"PS4 can render 1080p/60fps with room to spare” -Kojima



#351 miiiiv
Member since 2013 • 943 Posts
@SamiRDuran said:

The PS4 is weak, outdated hardware; it cannot render any visually impressive game at 1080p/60fps.

So far I'm not impressed at all by the "next gen" games either. Killzone SF is the best-looking PS4 game, and it has unimpressive scale and draw distance (backdrops are not draw distance), mediocre textures and an absence of tessellation, and it still only runs at 28-50 fps (about 38 fps average) in the campaign and 36-60 fps (about 45 fps average) in MP.
Yes, the PS4 is the most powerful console this gen, and newer games are bound to get better, but high-end PCs are already way ahead from the start of this gen.


#353  Edited By ronvalencia
Member since 2008 • 29612 Posts

@tormentos:

Rebellion is a known AMD Gaming Evolved developer team, and it would be unwise to dismiss their statements on AMD GCN-related topics.

Your mind could not grasp, and was unable to resolve, all of the developers' statements. Rebellion stated the specific conditions for their X1 vs PS4 conclusions. If you go outside of Rebellion's stated conditions, you get the current results.

Xbox One has a very narrow optimal performance path compared to iBuyPower's Steam Machine with 2 GB of 179 GB/s GDDR5 on an R9-270, i.e. 2 GB of GDDR5 has a wider optimal performance path than 32 MB of ESRAM.

5 GB to 6 GB of GDDR5 RAM gives the PS4 an even wider optimal performance path than 2 GB of GDDR5. There are other reasons why the PS4 gets beaten by higher GCNs with just 2 GB of GDDR5.

---------

The X1 has a T-intersection that splits the paths for CPU-to-GPU and CPU-to-main-memory traffic. Your debunking on this matter is BS.

PS: I now have an AMD Radeon R9-290X (44 CU), i.e. the X1 vs PS4 hardware wars are just LOL.

PS4's primary high-level graphics API is GNMX, which is claimed to be similar to Direct3D but minus the legacy issues. GNM is its low-level API, and PSSL is claimed to be similar to MS HLSL.

For the PS4, it would make more sense for AMD to recycle their Direct3D and Mantle software investments than OpenGL. Mantle still supports MS HLSL, and PS4's PSSL is similar to HLSL.

From the DICE APU13 lecture, PS4's Battlefield 4 rendering engine is similar to the Mantle version.


#354  Edited By deactivated-57d8401f17c55
Member since 2012 • 7221 Posts

Bwahaha, Xbox 720p strikes again.

MGS5


#355  Edited By GrenadeLauncher
Member since 2004 • 6843 Posts

Xbone confirmed dogshit.

Don't worry, lems, you can hide the pain by shitposting screencapped images of Infamous from Youtube using the 144p setting. Might want to be quick though, those will have a short lifespan.


#356 emgesp
Member since 2004 • 7848 Posts

@Chozofication:

Dis gun be good


#357 deactivated-57d8401f17c55
Member since 2012 • 7221 Posts
@GrenadeLauncher said:

Xbone confirmed dogshit.

Don't worry, lems, you can hide the pain by shitposting screencapped images of Infamous from Youtube using the 144p setting. Might want to be quick though, those will have a short lifespan.

MS tried their hardest to make a console where people could have all their stupid shit in one place: an even bigger push on advertisements, cable and other stupid shit having nothing to do with games. People will still buy it, because MS also counted on the people who want to buy it only caring about trash like CoD and FIFA anyway, but I can still get a kick out of it. The MS idiot box. :]


#358 ronvalencia
Member since 2008 • 29612 Posts

@Chozofication:

MS designed another Xbox 360-style memory model with 2012 parts and Xbox 360 TDP levels.

The big change is with Sony, which went for AMD's strongest APU at recent PS3 TDP levels.


#359 StormyJoe
Member since 2011 • 7806 Posts

@highking_kallor said:

@StormyJoe:

You're awesome

Umm... thanks?


#360  Edited By tormentos
Member since 2003 • 33784 Posts

@ronvalencia said:

@tormentos:

Rebellion is a known AMD Gaming Evolved developer team, and it would be unwise to dismiss their statements on AMD GCN-related topics.

Your mind could not grasp, and was unable to resolve, all of the developers' statements. Rebellion stated the specific conditions for their X1 vs PS4 conclusions. If you go outside of Rebellion's stated conditions, you get the current results.

Xbox One has a very narrow optimal performance path compared to iBuyPower's Steam Machine with 2 GB of 179 GB/s GDDR5 on an R9-270, i.e. 2 GB of GDDR5 has a wider optimal performance path than 32 MB of ESRAM.

5 GB to 6 GB of GDDR5 RAM gives the PS4 an even wider optimal performance path than 2 GB of GDDR5. There are other reasons why the PS4 gets beaten by higher GCNs with just 2 GB of GDDR5.

---------

The X1 has a T-intersection that splits the paths for CPU-to-GPU and CPU-to-main-memory traffic. Your debunking on this matter is BS.

PS: I now have an AMD Radeon R9-290X (44 CU), i.e. the X1 vs PS4 hardware wars are just LOL.

PS4's primary high-level graphics API is GNMX, which is claimed to be similar to Direct3D but minus the legacy issues. GNM is its low-level API, and PSSL is claimed to be similar to MS HLSL.

For the PS4, it would make more sense for AMD to recycle their Direct3D and Mantle software investments than OpenGL. Mantle still supports MS HLSL, and PS4's PSSL is similar to HLSL.

From the DICE APU13 lecture, PS4's Battlefield 4 rendering engine is similar to the Mantle version.

Your own diagram owned you: the only 30GB/s line on the Xbox One runs from DDR3 to the CPU, move engines and GPU, and that line will only allow small GPU data to pass. The other connection, the direct one from GPU to memory, will only be able to use 38GB/s, because the 68GB/s is shared with the system's 30GB/s.

The fact that all you make is pro-Xbox One arguments shows how biased you are. Even after being proven wrong, and after all your arguments fall to the ground, you still refuse to admit being wrong, and you hide behind the PC in topics where PCs aren't even included; so to save your Xbox One you need your PC.. lol

http://www.konami.jp/mgs5/gz/en/products/compare.html

Here's another crack in your argument: MGS GZ is 1080p/60fps on PS4 and 720p/60fps on Xbox One...

But but ESRAM, but but move engines, but but JIT compression, but but PRT will not work on PS4.. lol

Don't worry, maybe your theories will pay off some day. For now they haven't, and you can quote all the developers you like; hell, Kojima said the difference wasn't big, and now we see again how the PS4 version is superior.. hahahahaa


#361  Edited By ronvalencia
Member since 2008 • 29612 Posts

@tormentos:

LOL, the Hot Chips X1 diagram shows the following paths:

1. CPU/FSL to GPU

2. CPU/FSL to main memory

3. CPU to eSRAM, i.e. the black line.

101: Hardware-coherent memory = automatic updates to all affected memory data = keeps the memory data views the same for all processor nodes. Advantage: it minimises the plumbing code and programmer/software-side processing overheads.

IF the CPU makes a change, the changed data has to be updated in all locations, and the X1 has 30 GB/s to do it.

PS4 has lower coherent memory bandwidth, i.e.

1. CPU/NB updates TO memory = 20 GB/s

2. CPU/NB updates TO GPU = 10 GB/s. This would limit coherent memory bandwidth, since it is the lowest factor in this coherent memory update system. It is still faster than PCIe x16 version 2.0.

No other AMD APU patent supports your fan fiction, i.e. that the PS4 would be sporting a similar AMD coherent hardware setup.

Again, the CPU makes a change and the coherent hardware makes the required memory updates.

The 101 for memory coherency: http://en.wikipedia.org/wiki/Memory_coherence
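To put rough numbers on the slowest-link argument, here is a minimal, purely illustrative Python sketch that treats the usable coherent bandwidth as the minimum over the links on the path, using only the figures quoted in this thread; whether the real hardware degrades exactly this way is an assumption.

```python
# Toy model: effective coherent bandwidth is capped by the slowest
# link on the path. Figures are the ones quoted in this thread
# (X1: 30 GB/s coherent bus; PS4: 20 GB/s CPU<->memory and
# 10 GB/s CPU<->GPU links).

def coherent_ceiling(links_gbs):
    """Usable coherent bandwidth = the minimum link on the path."""
    return min(links_gbs)

x1_path = [30]       # single quoted figure for the X1 coherent bus
ps4_path = [20, 10]  # CPU/NB<->memory and CPU/NB<->GPU links

print("X1 ceiling :", coherent_ceiling(x1_path), "GB/s")
print("PS4 ceiling:", coherent_ceiling(ps4_path), "GB/s")
```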


#362  Edited By btk2k2
Member since 2003 • 440 Posts

@ronvalencia said:

Microsoft and I have done the math, and 16 ROPs are sufficient for 150 GB/s-level memory bandwidth, i.e. the bottlenecks are somewhere else. You are ignoring the fact that the 7950 BE's 32 ROPs at an 850 MHz base clock are superior to the 7850's 32 ROPs at 860 MHz, and that's mostly due to the 7950 BE's superior memory bandwidth.

The 7770 doesn't have the option to improve with a 32 MB ESRAM memory bandwidth booster and dual rasterizer units.

Rebellion's POV mirrors my POV. The prototype 7850 with 12 CUs and boosted 153.6 GB/s memory bandwidth shows what a 1.3 TFLOPS GCN can do. Remember, 153.6 GB/s limits 32 ROPs.

The main difference between the prototype 7850 and the X1 is the handling of non-tiling 3D engines, with the prototype 7850 handling legacy 3D engines better than the X1.

Part of the X1's unlocking package is to enable the X1's second rasterizer unit and better middleware support for eSRAM. Your 7790 is not equipped with 153.6 GB/s VRAM.

If you notice, the 7790's dual rasterizer units feed into the CU blocks, i.e. the 7790 has less of a bottleneck at this point than the 7770. The rasterizer unit is one of the important hardware blocks that makes a DSP-like** solution into a GPU.

**Also, AMD departed from the DSP's in-order processing model with an out-of-order (for wavefront/MIMD instruction) processing model. AMD's GCN wavefront is just MIMD instruction issue, which is an evolution over SIMD instruction issue.

In CU terms, the R9-290's 40 CUs vs the 7970's 32 CUs is minor, but AMD boosted the rasterizers (2 vs 4) and the ROPs to 64 (with a memory bandwidth increase), i.e. dual rasterizer units would have bottlenecked the 40 CUs.

To increase throughput, you have to minimise the bottlenecks at the front end and back end (memory writes), i.e. this is what the prototype 7850 shows.

In real-world scenarios the 7950, 7970 and 7970 GHz Edition are ROP bound, as can be shown by the Vantage pixel fill benchmark. Since the 7950 runs at 850MHz with 240GB/s of bandwidth, it is trivial to deduce that 16 ROPs at 850MHz with 120GB/s of bandwidth will also be ROP bound. It is also the reason why AMD doubled the ROPs to 64 on the 290-series video cards despite a very modest 11% increase in bandwidth (7970 GHz vs 290(X)).

The lack of ROPs in the X1 is going to hurt it when trying to achieve 1080p, as has been shown. The 32MB of ESRAM is not large enough to use for render targets in deferred rendering engines at 1080p. You can hit 1080p with forward rendering, but only without FSAA, assuming you are using 32 bits/pixel.

Maths for Forward Rendering

Front Buffer: 1920x1080(resolution)x32(bits per pixel) = 66355200b = 7.91MB.

Depth Buffer: 1920x1080x32 = 7.91MB

Back Buffer: 1920x1080x32 = 7.91MB

Total = 7.91x3 = 23.73MB.

At just 2x FSAA you would need:

Front Buffer: 1920x1080x32 = 7.91MB.

Depth Buffer: 1920x1080x32x2(FSAA) = 15.82MB

Back Buffer: 1920x1080x32x2(FSAA) = 15.82MB

Total = 7.91 + 15.82 + 15.82 = 39.55MB.

With deferred rendering it is different, because it puts the render targets into a G-buffer and then applies lighting. For BF3, a 1080p scene with 4xMSAA requires 158MB. If we divide by 4 we come out with a no-AA 1080p scene requiring around 39.5MB, assuming that maths is correct or close to it. This might be Frostbite 2, but I cannot imagine it is much different for Frostbite 3 or other deferred rendering engines.
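As a quick check of the buffer arithmetic above, a minimal Python sketch, assuming 32 bits per pixel throughout and ignoring any alignment or padding the real hardware may add:

```python
# Render-target size arithmetic from the post above.

def buffer_mb(width, height, bits_per_pixel=32, samples=1):
    """Size of one render target in MB."""
    return width * height * bits_per_pixel * samples / 8 / (1024 * 1024)

w, h = 1920, 1080

# Forward rendering, no AA: front + depth + back buffer.
no_aa = 3 * buffer_mb(w, h)                                 # ~23.73 MB

# Forward rendering, 2x FSAA: front buffer stays 1x, depth and
# back buffers are doubled.
fsaa_2x = buffer_mb(w, h) + 2 * buffer_mb(w, h, samples=2)  # ~39.55 MB

# Deferred estimate: BF3's 158 MB G-buffer at 4xMSAA, divided by 4
# for a rough no-AA figure.
deferred_no_aa = 158 / 4                                    # ~39.5 MB

ESRAM_MB = 32
for name, size in [("forward, no AA", no_aa),
                   ("forward, 2x FSAA", fsaa_2x),
                   ("deferred, no AA (BF3 est.)", deferred_no_aa)]:
    print(f"{name}: {size:.2f} MB -> fits in 32 MB ESRAM: {size <= ESRAM_MB}")
```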

So both the 16 ROPs and the ESRAM are a bottleneck against the Xbox One running games at 1080p. It does not mean it is impossible, but it does mean that for the X1 to achieve it a lot of compromises have to be made, and that is exactly the case with Forza 5. Of course, even if we do manage to get to 1080p, we then hit another bottleneck in the Xbox One: the reduced shader performance. At 1080p you are throwing more pixels around the screen, and that requires more GPU horsepower; if you want to include additional effects on top of that, you have fewer resources with which to do so, further compromising the graphics.

Ron, you are correct that the PS4 cannot fully utilise the 32 ROPs it has because the bandwidth is too low, but 16 ROPs would have been a large bottleneck. A balanced solution would have been 24 ROPs, but it is likely that the choices were 16 or 32, so they went with the overkill solution of 32, which is the smart move.
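For a rough sense of that balance, a sketch using the 12-bytes-per-pixel figure Microsoft quotes later in this thread (8 bytes written, 4 read per pixel drawn); the 800 MHz clock and 176 GB/s of bandwidth are the commonly reported PS4 numbers and are assumptions here:

```python
# Bandwidth needed to keep N ROPs busy at peak colour fill.

BYTES_PER_PIXEL = 12   # 32bpp target, no blending, 32bpp Z enabled
CLOCK_MHZ = 800        # assumed PS4 GPU clock
AVAILABLE_GBS = 176    # assumed PS4 memory bandwidth

for rops in (16, 24, 32):
    gpix = rops * CLOCK_MHZ * 1e6 / 1e9  # peak fill rate, GPixels/s
    need = gpix * BYTES_PER_PIXEL        # GB/s required to sustain it
    print(f"{rops} ROPs: {gpix:.1f} GPix/s needs ~{need:.0f} GB/s "
          f"(available: {AVAILABLE_GBS} GB/s)")
```

At these assumed numbers, 16 ROPs need about 154 GB/s, 24 need about 230 GB/s and 32 need about 307 GB/s, which is the sense in which 32 ROPs are overkill for the available bandwidth.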

Now let's talk about the Hot Chips diagram.

I am looking at it now, and the only mention of bandwidth to/from the CPU is from the main memory pool, and that is 30 GB/s. It does not mention the bandwidth from GPU -> CPU, and it does not mention the bandwidth from the ESRAM to the CPU. Because of that lack of data we cannot conclude whether the Xbox One has higher coherent memory bandwidth than the PS4, as we only have the speed of one of the three coherent memory buses.

The other thing to note in your post is that you claim the 10 GB/s bus between the CPU and the GPU in the PS4 is the limiting factor. It is not, as the GPU and CPU caches are tiny compared to that amount of bandwidth. That bus will not be saturated, because the amount of data flowing on it is going to be tiny. The 20 GB/s bus between the CPU and main memory is more likely to be an issue, but compute is really not bandwidth heavy, so I do not see cache-coherent bandwidth being an issue for either console. The issues for compute will be purely down to available resources, of which the Xbox One has fewer.


#363 tormentos
Member since 2003 • 33784 Posts

@ronvalencia said:

@tormentos:

LOL, the Hot Chips X1 diagram shows the following paths:

1. CPU/FSL to GPU

2. CPU/FSL to main memory

3. CPU to eSRAM, i.e. the black line.

101: Hardware-coherent memory = automatic updates to all affected memory data = keeps the memory data views the same for all processor nodes. Advantage: it minimises the plumbing code and programmer/software-side processing overheads.

IF the CPU makes a change, the changed data has to be updated in all locations, and the X1 has 30 GB/s to do it.

PS4 has lower coherent memory bandwidth, i.e.

1. CPU/NB updates TO memory = 20 GB/s

2. CPU/NB updates TO GPU = 10 GB/s. This would limit coherent memory bandwidth, since it is the lowest factor in this coherent memory update system. It is still faster than PCIe x16 version 2.0.

No other AMD APU patent supports your fan fiction, i.e. that the PS4 would be sporting a similar AMD coherent hardware setup.

Again, the CPU makes a change and the coherent hardware makes the required memory updates.

The 101 for memory coherency: http://en.wikipedia.org/wiki/Memory_coherence

So now you want to invalidate the very diagram you posted? Hahahahaaaaaaaaaaaa

That black line isn't 30GB/s; the only 30GB/s line is the one that connects the GPU, CPU and DMEs to the coherent cache..

PERIOD. It says so on the diagram very clearly; I couldn't care less about your theories..

MGS is 1080p/60fps on PS4 and 720p on Xbox One... hahahaa, more than a 100% pixel difference. Your arguments all fall apart; the Xbox One is performing badly.. lol

We all know your crazy theories on PRT.. hahahaaaaaaaaaaaaaa

Ron, all I care about is the end result; I couldn't care less about your theories. These are the real-life results of the gap:

BF4: 900p and 10fps faster on PS4, 720p on Xbox One.

Ghosts: 1080p on PS4, 720p on Xbox One.

AC4: 1080p on PS4, 900p on Xbox One, with better AA and better image quality on PS4.

Tomb Raider: up to 60fps on PS4, 30fps on Xbox One, with worse textures, worse effects, half-resolution effects and 900p cutscenes on Xbox One.

And now MGS: 1080p/60fps on PS4, 720p on Xbox One.

Now this ^^^ is real. You can use whatever excuse seems to fit your argument; it doesn't change this.. lol
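For reference, the resolution gaps listed above work out to these pixel-count ratios (a quick sketch; frame rate and effects differences come on top of this):

```python
# Pixel-count ratios for the multiplatform results listed above.

games = {
    # title: (PS4 resolution, X1 resolution)
    "BF4":    ((1600, 900),  (1280, 720)),
    "Ghosts": ((1920, 1080), (1280, 720)),
    "AC4":    ((1920, 1080), (1600, 900)),
    "MGS GZ": ((1920, 1080), (1280, 720)),
}

for title, ((pw, ph), (xw, xh)) in games.items():
    ratio = (pw * ph) / (xw * xh)
    print(f"{title}: PS4 renders {ratio:.2f}x the pixels of the X1 version")
```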


#364  Edited By ronvalencia
Member since 2008 • 29612 Posts

@btk2k2 said:


In real-world scenarios the 7950, 7970 and 7970 GHz Edition are ROP bound, as can be shown by the Vantage pixel fill benchmark. Since the 7950 runs at 850MHz with 240GB/s of bandwidth, it is trivial to deduce that 16 ROPs at 850MHz with 120GB/s of bandwidth will also be ROP bound. It is also the reason why AMD doubled the ROPs to 64 on the 290-series video cards despite a very modest 11% increase in bandwidth (7970 GHz vs 290(X)).

The lack of ROPs in the X1 is going to hurt it when trying to achieve 1080p, as has been shown. The 32MB of ESRAM is not large enough to use for render targets in deferred rendering engines at 1080p. You can hit 1080p with forward rendering, but only without FSAA, assuming you are using 32 bits/pixel.

Maths for Forward Rendering

Front Buffer: 1920x1080(resolution)x32(bits per pixel) = 66355200b = 7.91MB.

Depth Buffer: 1920x1080x32 = 7.91MB

Back Buffer: 1920x1080x32 = 7.91MB

Total = 7.91x3 = 23.73MB.

At just 2x FSAA you would need:

Front Buffer: 1920x1080x32 = 7.91MB.

Depth Buffer: 1920x1080x32x2(FSAA) = 15.82MB

Back Buffer: 1920x1080x32x2(FSAA) = 15.82MB

Total = 7.91 + 15.82 + 15.82 = 39.55MB.

With deferred rendering it is different, because it puts the render targets into a G-buffer and then applies lighting. For BF3, a 1080p scene with 4xMSAA requires 158MB. If we divide by 4 we come out with a no-AA 1080p scene requiring around 39.5MB, assuming that maths is correct or close to it. This might be Frostbite 2, but I cannot imagine it is much different for Frostbite 3 or other deferred rendering engines.

So both the 16 ROPs and the ESRAM are a bottleneck against the Xbox One running games at 1080p. It does not mean it is impossible, but it does mean that for the X1 to achieve it a lot of compromises have to be made, and that is exactly the case with Forza 5. Of course, even if we do manage to get to 1080p, we then hit another bottleneck in the Xbox One: the reduced shader performance. At 1080p you are throwing more pixels around the screen, and that requires more GPU horsepower; if you want to include additional effects on top of that, you have fewer resources with which to do so, further compromising the graphics.

Ron, you are correct that the PS4 cannot fully utilise the 32 ROPs it has because the bandwidth is too low, but 16 ROPs would have been a large bottleneck. A balanced solution would have been 24 ROPs, but it is likely that the choices were 16 or 32, so they went with the overkill solution of 32, which is the smart move.

Now let's talk about the Hot Chips diagram.

I am looking at it now, and the only mention of bandwidth to/from the CPU is from the main memory pool, and that is 30 GB/s. It does not mention the bandwidth from GPU -> CPU, and it does not mention the bandwidth from the ESRAM to the CPU. Because of that lack of data we cannot conclude whether the Xbox One has higher coherent memory bandwidth than the PS4, as we only have the speed of one of the three coherent memory buses.

The other thing to note in your post is that you claim the 10 GB/s bus between the CPU and the GPU in the PS4 is the limiting factor. It is not, as the GPU and CPU caches are tiny compared to that amount of bandwidth. That bus will not be saturated, because the amount of data flowing on it is going to be tiny. The 20 GB/s bus between the CPU and main memory is more likely to be an issue, but compute is really not bandwidth heavy, so I do not see cache-coherent bandwidth being an issue for either console. The issues for compute will be purely down to available resources, of which the Xbox One has fewer.

ROP-bound factors are also dependent on the color depth, i.e. a lower color depth enables more ROP units to be used.

For AMD GPUs, another case for increasing the ROP units is MSAA, but increasing ROP/MSAA units would also need a matching memory bandwidth increase. AMD's MSAA hardware is in the ROP unit.

As for the 16 vs 32 ROPs bound arguments, refer to http://www.guru3d.com/articles_pages/amd_radeon_r7_265_review,11.html

Radeon R7-265 = 1.894 TFLOPS with 179 GB/s (5600MHz GDDR5, 256-bit) VRAM and 32 ROPs.

Radeon R7-260X = 1.971 TFLOPS with 104 GB/s (6500MHz GDDR5, 128-bit) VRAM and 16 ROPs.

With roughly similar FLOPS, the difference between the R7-265 and the R7-260X is 10 fps, i.e. the R7-265 has faster memory / faster memory reads and writes.

The R7-265's 32 ROPs don't equal 2X over the R7-260X; in other words, it's an insignificant advantage without properly matched memory bandwidth, e.g. the R9-280X's 32 ROPs still enable it to reach 89 fps.

AMD Hawaii is almost a straight "copy and paste" of the 7770 scaled 4X. Future AMD GCNs may follow this type of scaling.

Radeon HD 7770

http://www.eurogamer.net/articles/digitalfoundry-microsoft-to-unlock-more-gpu-power-for-xbox-one-developers

ROPs are the elements of the GPU that physically write the final image from pixel, vector and texel information: PlayStation 4's 32 ROPs are generally acknowledged as overkill for a 1080p resolution (the underlying architecture from AMD was never designed exclusively for full HD but for other resolutions such as 2560x1440/2560x1600 too), while Xbox One's 16 ROPs could theoretically be overwhelmed by developers.

....

"For example, consider a typical game scenario where the render target is 32bpp [bits per pixel] and blending is disabled, and the depth/stencil surface is 32bpp with Z [depth] enabled. That amount to 12 bytes of bandwidth needed per pixel drawn (eight bytes write, four bytes read). At our peak fill-rate of 13.65GPixels/s that adds up to 164GB/s of real bandwidth that is needed which pretty much saturates our ESRAM bandwidth. In this case, even if we had doubled the number of ROPs, the effective fill-rate would not have changed because we would be bottlenecked on bandwidth

The prototype Radeon HD 7850's 32 ROPs would not be operating at full speed. Note that Microsoft stayed away from ROP MSAA topics.

----------------

On X1's eSRAM issue. http://gamingbolt.com/xbox-ones-esram-too-small-to-output-games-at-1080p-but-will-catch-up-to-ps4-rebellion-games

Bolcato stated that, “It was clearly a bit more complicated to extract the maximum power from the Xbox One when you’re trying to do that. I think eSRAM is easy to use. The only problem is…Part of the problem is that it’s just a little bit too small to output 1080p within that size. It’s such a small size within there that we can’t do everything in 1080p with that little buffer of super-fast RAM.

“It means you have to do it in chunks or using tricks, tiling it and so on. It’s a bit like the reverse of the PS3. PS3 was harder to program for than the Xbox 360. Now it seems like everything has reversed but it doesn’t mean it’s far less powerful – it’s just a pain in the ass to start with. We are on fine ground now but the first few months were hell.”

You're using X1's 32 MB eSRAM like a traditional fast/large VRAM, not with tiling and other tricks. Using AMD PRT/Tiled Resources + eSRAM enables the GPU's TMUs to use ESRAM instead of the slower DDR3 memory. Render targets need to be tiled.

The games you mentioned are not tile friendly. The maximum potential for X1's GCN solution would be similar to the prototype 7850 with 12 CUs and 153.6 GB/s, and that's with a very narrow or specific optimisation path. Going outside of Bolcato's narrow optimisation path will yield the current results.

There's a reason why the PS4 has a simple 8 GB GDDR5 setup, i.e. it's relatively easy to get a game title to run at the best possible frame rates. The PS4 is more forgiving than the X1.

----------------

http://www.neogaf.com/forum/showpost.php?p=80951633&postcount=195

Speaking of GPGPU - we have 3X the coherent bandwidth for GPGPU at 30gb/sec which significantly improves our ability for the CPU to efficiently read data generated by the GPU.

My point with PS4 vs X1 memory coherent bandwidth is that the X1 has superior memory coherent bandwidth over the PS4; it's not about its usefulness. The results speak for themselves, i.e. superior memory coherent bandwidth is a yawn feature at this time.


#365  Edited By btk2k2
Member since 2003 • 440 Posts

@ronvalencia said:


ROP-bound factors are also dependent on the color depth, i.e. a lower color depth enables more ROP units to be used.

For AMD GPUs, another case for increasing the ROP units is MSAA, but increasing ROP/MSAA units would also need a matching memory bandwidth increase.

As for the 16 vs 32 ROPs bound arguments, refer to http://www.guru3d.com/articles_pages/amd_radeon_r7_265_review,11.html

Radeon R7-265 = 1.894 TFLOPS with 179 GB/s (5600MHz GDDR5, 256-bit) VRAM and 32 ROPs.

Radeon R7-260X = 1.971 TFLOPS with 104 GB/s (6500MHz GDDR5, 128-bit) VRAM and 16 ROPs.

With roughly similar FLOPS, the difference between the R7-265 and the R7-260X is 10 fps, i.e. the R7-265 has faster memory / faster memory reads and writes.

The R7-265's 32 ROPs don't equal 2X over the R7-260X; in other words, it's an insignificant advantage without properly matched memory bandwidth, e.g. the R9-280X's 32 ROPs still enable it to reach 89 fps.

----------------

On X1's eSRAM issue. http://gamingbolt.com/xbox-ones-esram-too-small-to-output-games-at-1080p-but-will-catch-up-to-ps4-rebellion-games

Bolcato stated that, “It was clearly a bit more complicated to extract the maximum power from the Xbox One when you’re trying to do that. I think eSRAM is easy to use. The only problem is…Part of the problem is that it’s just a little bit too small to output 1080p within that size. It’s such a small size within there that we can’t do everything in 1080p with that little buffer of super-fast RAM.

“It means you have to do it in chunks or using tricks, tiling it and so on

You're using the 32 MB eSRAM like a traditional fast/large VRAM, not with tiling and other tricks. Using AMD PRT/Tiled Resources + eSRAM enables the GPU's TMUs to use ESRAM instead of the slower DDR3 memory. Render targets need to be tiled.

The games you mentioned are not tile friendly. The maximum potential for X1's GCN solution would be similar to the prototype 7850 with 12 CUs and 153.6 GB/s, and that's with a very narrow or specific optimisation path. Going outside of Bolcato's optimisation path will yield the current results.

----------------

http://www.neogaf.com/forum/showpost.php?p=80951633&postcount=195

Speaking of GPGPU - we have 3X the coherent bandwidth for GPGPU at 30gb/sec which significantly improves our ability for the CPU to efficiently read data generated by the GPU.

My point with PS4 vs X1 memory coherent bandwidth is that the X1 has superior memory coherent bandwidth over the PS4; it's not about its usefulness. The results speak for themselves, i.e. superior memory coherent bandwidth is a yawn feature at this time.

So what you have shown is that by improving just the back end, 68% more ROP throughput and 72% more memory bandwidth, we have increased the average frame rate by 25%. Looking at this review, it seems that on average the 265 is 29.5% faster than the 260X at 1080p, and that is down to the memory bandwidth and the ROPs on their own, as the rest of the card is actually slightly worse than the 260X (the 265 has fewer FLOPS, less texture fillrate and a lower triangle setup rate).
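A quick check of those percentages against the specs in the table quoted above, with ROP throughput approximated as ROPs x clock:

```python
# R7 265: 32 ROPs @ 925 MHz, 179.2 GB/s; R7 260X: 16 ROPs @ 1100 MHz,
# 104 GB/s (figures from the table quoted in this thread).

rop_gain = (32 * 925) / (16 * 1100) - 1  # ~0.68
bw_gain = 179.2 / 104.0 - 1              # ~0.72

print(f"ROP throughput gain: {rop_gain:.0%}")  # ~68%
print(f"Bandwidth gain:      {bw_gain:.0%}")   # ~72%
# Observed average frame rate gain at 1080p in the cited review: ~29.5%
```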

Tiled resources are a solution, but they have their own performance drawbacks as they are more computationally heavy. It will take a while for devs to figure out the best use of the ESRAM, and ultimately I think they will use it for render targets and stick with 900p. They might go for 1080p if the game is using a forward rendering engine.

This is the same Albert Panello who categorically stated that the Xbox One was not giving up a 30%+ performance deficit to the PS4, and we all know how accurate that statement was. This is also the same Albert Panello who said that he expected his detractors to apologise to him when the consoles were released and most games only showed a single-digit difference in framerate.

Your point with the comparison is flawed, as we only know the bandwidth of the CPU <-> memory bus, which is 30GB/s. This is indeed higher than on the PS4, which has a 20GB/s bus for CPU <-> memory, but the rest of the system on the Xbox One is unknown, so to say that the Xbox One has higher coherent memory bandwidth is incorrect unless you have other sources that show this to be true.


#366  Edited By ronvalencia
Member since 2008 • 29612 Posts
@btk2k2 said:


So what you have shown is that by improving just the back end, 68% more ROP throughput and 72% more memory bandwidth, we have increased the average frame rate by 25%. Looking at this review, it seems that on average the 265 is 29.5% faster than the 260X at 1080p, and that is down to the memory bandwidth and the ROPs on their own, as the rest of the card is actually slightly worse than the 260X (the 265 has fewer FLOPS, less texture fillrate and a lower triangle setup rate).

Tiled resources are a solution, but they have their own performance drawbacks as they are more computationally heavy. It will take a while for devs to figure out the best use of the ESRAM, and ultimately I think they will use it for render targets and stick with 900p. They might go for 1080p if the game is using a forward rendering engine.

This is the same Albert Panello who categorically stated that the Xbox One was not giving up a 30%+ performance deficit to the PS4, and we all know how accurate that statement was. This is also the same Albert Panello who said that he expected his detractors to apologise to him when the consoles were released and most games only showed a single-digit difference in framerate.

Your point with the comparison is flawed, as we only know the bandwidth of the CPU <-> memory bus, which is 30GB/s. This is indeed higher than on the PS4, which has a 20GB/s bus for CPU <-> memory, but the rest of the system on the Xbox One is unknown, so to say that the Xbox One has higher coherent memory bandwidth is incorrect unless you have other sources that show this to be true.

Do you claim Albert Panello's math on ROP bandwidth is wrong? The math is pretty simple.

Legit Reviews runs at a lower resolution than Guru3D, and that's not the best way to stress the GPU.

1. The Radeon HD 7950's 32 ROPs at 800MHz operate lower (i.e. less potential color fill rate) than the 7850's 32 ROPs at 860MHz, and the 7950-800MHz still delivers superior frame rate results.

2. The R7-265's TMUs (for texture fill rate) have more memory bandwidth (179 GB/s), hence less memory contention. TMUs don't operate in isolation.

3. The 7950 at 800MHz has a lower triangle rate than the 7850 at 860MHz, and the 7950-800 has better results. Triangle rate is not a big issue.

4. The R7-265 has larger CU cache/LDS SRAM pools, i.e. 16 CUs x (16 KB L1 + 64 KB LDS) vs 14 CUs x (16 KB L1 + 64 KB LDS). This equates to fewer trips to external memory.

For Xbox 360 console ports, my old 7950-900 was playing games at 5760x1080, which is 3X 1920x1080.

16 ROPs are enough for 1920x1080.

-----------------------

On X1/GCN, texture tiling is not compute heavy, since it's hardware accelerated via AMD PRT.

PS: AMD just re-enabled DX11.2/tiled resources for its GCN cards with the Catalyst 14.1 beta driver.

-----------------------

http://www.vgleaks.com/wp-content/uploads/2013/03/durango_memory.jpg

The 30 GB/s coherent link between the northbridge and the GPU MMU would limit the entire coherent system to 30 GB/s, since it's the lowest factor in the system, i.e.

1. The coherent hardware would tell the GPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it. Invalidating pages avoids flushing the entire GPU cache.

2. The coherent hardware would tell the CPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it.

3. The coherent hardware's purpose is to make all data views consistent for all processor nodes.

This is why MS can claim 3X over the PS4's CPU-to-GPU coherent link.

For reference, see AMD's FCL setup on the AMD Kaveri APU.


#367 btk2k2
Member since 2003 • 440 Posts

@ronvalencia said:

Do you claim Albert Panello's math on ROP bandwidth is wrong? The math is pretty simple.

Legit Reviews runs at a lower resolution than Guru3D, and that's not the best way to stress the GPU.

1. The Radeon HD 7950's 32 ROPs at 800MHz operate lower (i.e. less potential color fill rate) than the 7850's 32 ROPs at 860MHz, and the 7950-800MHz still delivers superior frame rate results.

2. The R7-265's TMUs (for texture fill rate) have more memory bandwidth (179 GB/s), hence less memory contention. TMUs don't operate in isolation.

3. The 7950 at 800MHz has a lower triangle rate than the 7850 at 860MHz, and the 7950-800 has better results. Triangle rate is not a big issue.

4. The R7-265 has larger CU cache/LDS SRAM pools, i.e. 16 CUs x (16 KB L1 + 64 KB LDS) vs 14 CUs x (16 KB L1 + 64 KB LDS). This equates to fewer trips to external memory.

For Xbox 360 console ports, my old 7950-900 was playing games at 5760x1080, which is 3X 1920x1080.

16 ROPs are enough for 1920x1080.

-----------------------

On X1/GCN, texture tiling is not compute heavy, since it's hardware accelerated via AMD PRT.

PS: AMD just re-enabled DX11.2/tiled resources for its GCN cards with the Catalyst 14.1 beta driver.

-----------------------

http://www.vgleaks.com/wp-content/uploads/2013/03/durango_memory.jpg

The 30 GB/s coherent link between the northbridge and the GPU MMU would limit the entire coherent system to 30 GB/s, since it's the lowest factor in the system, i.e.

1. The coherent hardware would tell the GPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it. Invalidating pages avoids flushing the entire GPU cache.

2. The coherent hardware would tell the CPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it.

3. The coherent hardware's purpose is to make all data views consistent for all processor nodes.

This is why MS can claim 3X over the PS4's CPU-to-GPU coherent link.

For reference, see AMD's FCL setup on the AMD Kaveri APU.

I responded to your post before it was edited, so I did not respond to your added information. The thing is, though, you are using Albert Panello's theory despite the fact that we have pixel fill rate benchmarks that contradict it. The math might be sound in a perfect scenario, but benchmarks and games are not perfect scenarios. 16 ROPs running at 850MHz are ROP limited with 120GB/s of memory bandwidth; this can be shown by benchmarks that are designed to extract the maximum pixel fillrate performance they can.

Legit Reviews runs at 1080p, which is the primary resolution the consoles are targeting (or at least the PS4 is). That makes using benchmarks at 1080p perfectly valid, as it compares different GCN GPU configurations at the resolution being targeted.

1) That comparison is nothing like the one I provided.

                    7950    R7 265   R7 260X
Clockspeed (MHz)    800     925      1100
TFlops              2.87    1.89     1.97
Bandwidth (GB/s)    240.0   179.2    104.0
ROPs                32      32       16
Vantage Pixel Fill  12.1    8.9      5.1
Vantage Texel Fill  81.83   54.1     55.5

As you can see, the only actual advantage the 265 has over the 260X is the increase in ROPs and the increase in memory bandwidth. Those two factors alone are enough for an average 29.5% increase in performance at 1080p over the 260X, despite the card being behind in all the other performance metrics. Pixel and texel fill numbers are from the Anandtech Bench utility; I used the 7950 results from 2012, as the newer ones are for the 7950 Boost Edition.

The 7950 is in another league despite the lower clock speed, because it has higher performance in every metric except triangle setup rate, which just goes to show how little triangles/s matter in the overall performance picture.
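To illustrate how far each card sits from its paper fill rate, a small sketch comparing theoretical colour fill (ROPs x clock) with the measured Vantage numbers from the table above:

```python
# Theoretical colour fill vs measured Vantage pixel fill (GPixels/s,
# figures from the table above).

cards = {
    #          ROPs   MHz  measured GPix/s
    "7950":    (32,  800, 12.1),
    "R7 265":  (32,  925,  8.9),
    "R7 260X": (16, 1100,  5.1),
}

for name, (rops, mhz, measured) in cards.items():
    peak = rops * mhz / 1000  # theoretical peak, GPixels/s
    print(f"{name}: peak {peak:.1f}, measured {measured:.1f} "
          f"({measured / peak:.0%} of peak)")
```

The 7950, with the most bandwidth per ROP, gets closest to its peak, which is consistent with the fill rate being bandwidth limited rather than ROP limited.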

I will edit this post later when I find that post where I predicted Vantage Pixel Fill numbers for the PS4 to see how accurate I was.

2) Well, considering the table above, it suggests that texture performance is not very bandwidth intensive; units + clockspeed seem to matter more.

3) Considering that statement, why is using the 7770 as a proxy for the X1 GPU such an issue for you? It is a great way to show what the X1 can do if the ESRAM is not being utilised at all, as the shader, texture, ROP and bandwidth figures are all very similar. The only outlier is the ESRAM effect, but it gives you a good minimum performance relative to the R7 265, which is almost exactly the same as the PS4 GPU in terms of the performance numbers.

4) You really think this is going to have an impact on performance that is anything other than margin of error? This is just as insignificant as triangle setup rate, if not more so, because in heavily tessellated scenes triangle rate does matter.

I never said that 16 ROPs were not enough for 1080p; I said they were a bottleneck at 1080p and would require compromises to be made, which is evidently true based on 1) the lack of 1080p games on the Xbox One and 2) the lack of certain features, or a big drop in frame rate, in those games that have hit 1080p.

-----

I never said compute heavy, I said it has a computational hit, which is true, as it needs to calculate what needs to be culled from the main texture and copy that over to the GPU. That is a performance hit; the question is whether the trade-off gives you a net performance gain, and if it does, whether it is a larger net gain than just putting the render targets into ESRAM. That will take experimentation to figure out, and it might very well depend on the game engine being used.

-----

That is an old diagram; was it even officially endorsed by MS like the Hot Chips one was? If not, then considering it shows blatant differences, like the lack of a direct CPU-ESRAM connection, I think the new information in the Hot Chips diagram supersedes it, and as such my comment regarding the other connections in the coherent memory architecture still holds.


#368 Mcspanky37
Member since 2010 • 1693 Posts

With room to spare? Couldn't that be considered a bad thing?


#369 tormentos
Member since 2003 • 33784 Posts

@ronvalencia said:

Do you claim Albert Panello's math on ROP bandwidth is wrong? The math is pretty simple.

Legit Reviews runs at a lower resolution than Guru3D, and that's not the best way to stress the GPU.

1. The Radeon HD 7950's 32 ROPs at 800MHz operate lower (i.e. less potential color fill rate) than the 7850's 32 ROPs at 860MHz, and the 7950-800MHz still delivers superior frame rate results.

2. The R7-265's TMUs (for texture fill rate) have more memory bandwidth (179 GB/s), hence less memory contention. TMUs don't operate in isolation.

3. The 7950 at 800MHz has a lower triangle rate than the 7850 at 860MHz, and the 7950-800 has better results. Triangle rate is not a big issue.

4. The R7-265 has larger CU cache/LDS SRAM pools, i.e. 16 CUs x (16 KB L1 + 64 KB LDS) vs 14 CUs x (16 KB L1 + 64 KB LDS). This equates to fewer trips to external memory.

For Xbox 360 console ports, my old 7950-900 was playing games at 5760x1080, which is 3X 1920x1080.

16 ROPs are enough for 1920x1080.

-----------------------

On X1/GCN, texture tiling is not compute heavy, since it's hardware accelerated via AMD PRT.

PS: AMD just re-enabled DX11.2/tiled resources for its GCN cards with the Catalyst 14.1 beta driver.

-----------------------

http://www.vgleaks.com/wp-content/uploads/2013/03/durango_memory.jpg

The 30 GB/s coherent link between the northbridge and the GPU MMU would limit the entire coherent system to 30 GB/s, since it's the lowest factor in the system, i.e.

1. The coherent hardware would tell the GPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it. Invalidating pages avoids flushing the entire GPU cache.

2. The coherent hardware would tell the CPU that its cache pages are invalid, at 30 GB/s. If a page is invalid, update it.

3. The coherent hardware's purpose is to make all data views consistent for all processor nodes.

This is why MS can claim 3X over the PS4's CPU-to-GPU coherent link.

For reference, see AMD's FCL setup on the AMD Kaveri APU.

1. The CPU bandwidth is 30GB/s; it comes from the DDR3 memory bank and is shared via the coherent cache by the CPU, GPU and move engines. That is the only 30GB/s; there is not a second one, only one, and your diagram was clear. It's a joke that you try to use old diagrams from VGLeaks which you yourself have disproven in other arguments, but now suddenly they are valid..

If the GPU doesn't use ESRAM, it only has 38GB/s that it can use for graphics, since the other 30GB/s is tied up by the CPU line. This is a FACT.

The black line isn't 30GB/s, and it doesn't even have a label, which is funny because it was even called into question for not being on the original leaks.

Also, are you for real? Panello is an MS employee who refuses to admit a graphical difference between PS4 and Xbox One games; he even downplayed the Tomb Raider difference, so yeah, he will lie, just like MS lied with the Xbox 360 256GB/s bandwidth crap.

I find that funny, because even the 7770 can achieve 1080p in most games under certain settings, while the Xbox One has a barrage of 720p games: even a fighting game is 720p, shooters are 720p, and MGS, a third-person shooter, is again 720p. Face it, none of your theories have held, dude, none.

And the gap keeps being there, getting bigger instead of smaller. Odd, isn't it?

Avatar image for btk2k2
btk2k2

440

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#370  Edited By btk2k2
Member since 2003 • 440 Posts

@btk2k2 said:

I responded to your post before it was edited so I did not respond to your added information. The thing is, though, you are using Albert Panello's theory despite the fact we have pixel fill rate benchmarks that contradict it; the math might be sound in a perfect scenario, but benchmarks and games are not perfect scenarios. 16 ROPS running at 850Mhz are ROP limited with 120GB/s of memory bandwidth. This can be shown by benchmarks that are designed to extract the maximum pixel fillrate performance that they can.

Legit Reviews runs at 1080p, which is the primary resolution the consoles are targeting (or at least the PS4 is). That makes using benchmarks at 1080p perfectly valid as it is comparing different GCN GPU configurations at the resolution that is being targeted.

1) That comparison is nothing like the one I provided.

                     7950     R7 265    R7 260X
Clockspeed (MHz)     800      925       1100
TFlops               2.87     1.89      1.97
Bandwidth (GB/s)     240.0    179.2     104.0
ROPs                 32       32        16
Vantage Pixel Fill   12.1     8.9       5.1
Vantage Texel Fill   81.83    54.1      55.5

As you can see, the only actual advantages the 265 has over the 260X are the increase in ROPs and the increase in memory bandwidth. Those 2 factors alone are enough for an average 29.5% increase in performance at 1080p over the 260X despite being behind in all the other performance metrics. Pixel and Texel Fill numbers are from the Anandtech Bench utility; I used the 7950 results from 2012 as the newer ones are for the 7950 Boost edition.

The 7950 is in another league despite the lower clock speed because it has higher performance in every metric except triangle setup rate, which just goes to show how little triangles/s matter in the overall performance picture.
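To make the ratios in that table concrete, here is a quick sketch in Python (the numbers are simply copied from the table above, so treat it as an illustration rather than new data):

# R7 265 vs R7 260X, figures from the table above.
r265  = {"tflops": 1.89, "bw": 179.2, "rops": 32, "pixel_fill": 8.9}
r260x = {"tflops": 1.97, "bw": 104.0, "rops": 16, "pixel_fill": 5.1}

for metric in r265:
    ratio = r265[metric] / r260x[metric]
    print(f"{metric}: 265/260X = {ratio:.2f}x")

# tflops 0.96x, bw 1.72x, rops 2.00x, pixel_fill 1.75x:
# the 265 trails in raw shader throughput yet wins on fillrate.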

I will edit this post later when I find that post where I predicted Vantage Pixel Fill numbers for the PS4 to see how accurate I was.

2) Well, considering the table above, it suggests that texture performance is not very bandwidth intensive and that units + clockspeed matter more.

3) Considering that statement, why is using the 7770 as a proxy for the X1 GPU such an issue for you? It is a great way to show what the X1 can do if the ESRAM is not being utilised at all, as the shader, texture, ROP and bandwidth figures are all very similar. The only outlier is the ESRAM effect, but it gives you a good minimum performance relative to the R7 265, which is almost exactly the same as the PS4 GPU in terms of the performance numbers.

4) You really think this is going to have an impact on performance that is anything other than margin of error? This is just as insignificant as triangle setup rate, if not more so because in heavily tessellated scenes triangle rate does matter.

I never said that 16 ROPS was not enough for 1080p, I said it was a bottleneck at 1080p and would require compromises to be made, which is evidently true based on 1) the lack of 1080p games on the Xbox 1 and 2) the lack of certain features or a big drop in frame rate on those games that have hit 1080p.

-----

I never said compute heavy, I said it has a computational hit, which is true as it needs to calculate what needs to be culled from the main texture and copy that over to the GPU. That is a performance hit; the question is whether the trade-off gives you a net performance gain, and if it does, whether it is a larger net performance gain than just putting the render targets into ESRAM. That will take experimentation to figure out and it might very well depend on the game engine that is being used.

-----

That is an old diagram; was it even officially endorsed by MS like the HotChips one was? If not, then considering it shows blatant differences like the lack of a direct CPU - ESRAM connection, I think the new information in the HotChips diagram supersedes this one, and as such my comment regarding the other connections in the coherent memory architecture still holds.

I said I would edit the above post when I found my old post regarding PS4 pixel fillrate relating to the bandwidth. I did find it but I decided to reply to myself rather than edit the post to make sure it does not get missed.

I said in post 231 of this thread that the PS4's maximum pixel fill rate would be around 9.05 Gpixels/s. Now we have a 32 ROP card with 179GB/s of memory bandwidth, so we can compare this number to a real benchmark. The R7 265 scored 8.9 Gpixels/s, as did the R9 270. The R9 270X, despite having the same bandwidth as the 265 and the 270, managed to score 9.0 Gpixels/s (pixel fillrate source). That makes me think that the 9.05 Gpixels/s figure I calculated 6 months ago is within the margin of error and is very close to the actual maximum figure. Further, you will see that the 7850 score there is 7.8 Gpixels/s; when I did the calculations I had a score of 7.9 Gpixels/s. If I use the same formula but with the more recent 7.8 Gpixels/s figure then I get 176/153.2 * 7.8 = 8.9 Gpixels/s. That is bang on, which shows my pixel fillrate scaling formula when bandwidth limited is accurate. I was able to use it to make a prediction 6 months ago and that prediction has turned out to be correct.

What can we deduce from this new information then?

Well, with the 265 vs the 260X we can see that by just increasing ROP performance by 75% and memory bandwidth by 72% we can get an average 29.5% increase in framerate at 1080p. I also calculated the maximum and minimum pixel fillrate performance of the Xbox 1. Back then I had a score of 3.7 Gpixels/s for the 7770, which was bandwidth limited; here you can see a more up-to-date score for the 7770 of 3.8 Gpixels/s, so to get the worst case we can do 68/72 * 3.8 = 3.6 Gpixels/s as the minimum for the Xbox 1 if the ESRAM is not used by the ROPS at all. That gives the PS4 an advantage of 147% over the Xbox 1. The best case scenario for the Xbox 1 is that it is ROP bound, so using a figure of 120GB/s as the maximum bandwidth required (based on the 7950 being ROP bound at 240GB/s) we get a score of 120/68 * 3.6 = 6.4 Gpixels/s. That still gives the PS4 a 40% fillrate advantage over the Xbox 1.

That gives us a pixel fillrate advantage of between 147% and 40%. Let's assume that the ESRAM is useful 80% of the time; that would give the PS4 an average advantage of 61.4% for pixel fillrate and a similar figure for bandwidth. That is not far off the 265 vs 260X difference, so the PS4 will have a 20-25% performance advantage at 1080p due to the ROPs and memory bandwidth alone. Now when you add in the shader and texturing advantage of the PS4 as well, it makes it easy to understand why the PS4 can do 1080p and the Xbox simply cannot without making sacrifices.
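For anyone who wants to reproduce the arithmetic, here is the whole estimate as a small Python sketch. The linear bandwidth scaling rule is my working assumption (valid only while the ROPs are bandwidth limited), and the GB/s and Gpixels/s figures are the ones quoted above:

# Bandwidth-limited fillrate scaling: fill scales linearly with
# memory bandwidth until the ROPs themselves become the limit.
def scale_fill(measured_gpix, measured_bw, target_bw):
    return measured_gpix * target_bw / measured_bw

ps4    = scale_fill(7.8, 153.2, 176.0)      # from the 7850 result, ~8.96
x1_min = scale_fill(3.8, 72.0, 68.0)        # 7770 result, DDR3 only, ~3.59
x1_max = scale_fill(x1_min, 68.0, 120.0)    # ESRAM case, ROP bound, ~6.34

print(f"PS4 ~{ps4:.1f} Gpixels/s, X1 {x1_min:.1f}-{x1_max:.1f} Gpixels/s")
print(f"PS4 advantage: {ps4 / x1_max - 1:.0%} to {ps4 / x1_min - 1:.0%}")
# Prints ~41% to ~150%; the 40% and 147% in the text come from the
# same sums done with rounded intermediate values.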

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#371  Edited By ronvalencia
Member since 2008 • 29612 Posts

@btk2k2 said:

@btk2k2 said:

I responded to your post before it was edited so I did not respond to your added information. The thing is, though, you are using Albert Panello's theory despite the fact we have pixel fill rate benchmarks that contradict it; the math might be sound in a perfect scenario, but benchmarks and games are not perfect scenarios. 16 ROPS running at 850Mhz are ROP limited with 120GB/s of memory bandwidth. This can be shown by benchmarks that are designed to extract the maximum pixel fillrate performance that they can.

Legit Reviews runs at 1080p, which is the primary resolution the consoles are targeting (or at least the PS4 is). That makes using benchmarks at 1080p perfectly valid as it is comparing different GCN GPU configurations at the resolution that is being targeted.

1) That comparison is nothing like the one I provided.

                     7950     R7 265    R7 260X
Clockspeed (MHz)     800      925       1100
TFlops               2.87     1.89      1.97
Bandwidth (GB/s)     240.0    179.2     104.0
ROPs                 32       32        16
Vantage Pixel Fill   12.1     8.9       5.1
Vantage Texel Fill   81.83    54.1      55.5

As you can see, the only actual advantages the 265 has over the 260X are the increase in ROPs and the increase in memory bandwidth. Those 2 factors alone are enough for an average 29.5% increase in performance at 1080p over the 260X despite being behind in all the other performance metrics. Pixel and Texel Fill numbers are from the Anandtech Bench utility; I used the 7950 results from 2012 as the newer ones are for the 7950 Boost edition.

The 7950 is in another league despite the lower clock speed because it has higher performance in every metric except triangle setup rate, which just goes to show how little triangles/s matter in the overall performance picture.

I will edit this post later when I find that post where I predicted Vantage Pixel Fill numbers for the PS4 to see how accurate I was.

2) Well, considering the table above, it suggests that texture performance is not very bandwidth intensive and that units + clockspeed matter more.

3) Considering that statement, why is using the 7770 as a proxy for the X1 GPU such an issue for you? It is a great way to show what the X1 can do if the ESRAM is not being utilised at all, as the shader, texture, ROP and bandwidth figures are all very similar. The only outlier is the ESRAM effect, but it gives you a good minimum performance relative to the R7 265, which is almost exactly the same as the PS4 GPU in terms of the performance numbers.

4) You really think this is going to have an impact on performance that is anything other than margin of error? This is just as insignificant as triangle setup rate, if not more so because in heavily tessellated scenes triangle rate does matter.

I never said that 16 ROPS was not enough for 1080p, I said it was a bottleneck at 1080p and would require compromises to be made, which is evidently true based on 1) the lack of 1080p games on the Xbox 1 and 2) the lack of certain features or a big drop in frame rate on those games that have hit 1080p.

-----

I never said compute heavy, I said it has a computational hit, which is true as it needs to calculate what needs to be culled from the main texture and copy that over to the GPU. That is a performance hit; the question is whether the trade-off gives you a net performance gain, and if it does, whether it is a larger net performance gain than just putting the render targets into ESRAM. That will take experimentation to figure out and it might very well depend on the game engine that is being used.

-----

That is an old diagram; was it even officially endorsed by MS like the HotChips one was? If not, then considering it shows blatant differences like the lack of a direct CPU - ESRAM connection, I think the new information in the HotChips diagram supersedes this one, and as such my comment regarding the other connections in the coherent memory architecture still holds.

I said I would edit the above post when I found my old post regarding PS4 pixel fillrate relating to the bandwidth. I did find it but I decided to reply to myself rather than edit the post to make sure it does not get missed.

I said in post 231 of this thread that the PS4's maximum pixel fill rate would be around 9.05 Gpixels/s. Now we have a 32 ROP card with 179GB/s of memory bandwidth, so we can compare this number to a real benchmark. The R7 265 scored 8.9 Gpixels/s, as did the R9 270. The R9 270X, despite having the same bandwidth as the 265 and the 270, managed to score 9.0 Gpixels/s (pixel fillrate source). That makes me think that the 9.05 Gpixels/s figure I calculated 6 months ago is within the margin of error and is very close to the actual maximum figure. Further, you will see that the 7850 score there is 7.8 Gpixels/s; when I did the calculations I had a score of 7.9 Gpixels/s. If I use the same formula but with the more recent 7.8 Gpixels/s figure then I get 176/153.2 * 7.8 = 8.9 Gpixels/s. That is bang on, which shows my pixel fillrate scaling formula when bandwidth limited is accurate. I was able to use it to make a prediction 6 months ago and that prediction has turned out to be correct.

What can we deduce from this new information then?

Well, with the 265 vs the 260X we can see that by just increasing ROP performance by 75% and memory bandwidth by 72% we can get an average 29.5% increase in framerate at 1080p. I also calculated the maximum and minimum pixel fillrate performance of the Xbox 1. Back then I had a score of 3.7 Gpixels/s for the 7770, which was bandwidth limited; here you can see a more up-to-date score for the 7770 of 3.8 Gpixels/s, so to get the worst case we can do 68/72 * 3.8 = 3.6 Gpixels/s as the minimum for the Xbox 1 if the ESRAM is not used by the ROPS at all. That gives the PS4 an advantage of 147% over the Xbox 1. The best case scenario for the Xbox 1 is that it is ROP bound, so using a figure of 120GB/s as the maximum bandwidth required (based on the 7950 being ROP bound at 240GB/s) we get a score of 120/68 * 3.6 = 6.4 Gpixels/s. That still gives the PS4 a 40% fillrate advantage over the Xbox 1.

That gives us a pixel fillrate advantage of between 147% and 40%. Let's assume that the ESRAM is useful 80% of the time; that would give the PS4 an average advantage of 61.4% for pixel fillrate and a similar figure for bandwidth. That is not far off the 265 vs 260X difference, so the PS4 will have a 20-25% performance advantage at 1080p due to the ROPs and memory bandwidth alone. Now when you add in the shader and texturing advantage of the PS4 as well, it makes it easy to understand why the PS4 can do 1080p and the Xbox simply cannot without making sacrifices.

Vantage's color fill rate test "draws frames by filling the screen multiple times. The color and alpha of each corner of the screen is animated with the interpolated color written directly to the target using alpha blending". This is not quite a straight write to memory.

3DMark Vantage pixel fill rate (from AnandTech and TechReport)

Radeon HD 7850 (32 ROPS at 860 Mhz) = 8 Gpixel = 153.6 GB/s

Radeon HD 7870 (32 ROPS at 1Ghz) = 8 Gpixel = 153.6 GB/s (7.9 Gpixel from techreport)

Radeon HD 7950-800 (32 ROPS at 800 Mhz) = 12.1 Gpixel = 240 GB/s

Radeon HD 7970-925 (32 ROPS at 925 Mhz) = 13.2 Gpixel = 260 GB/s (from techreport)

----

The increase from 153.6 GB/s to 240 GB/s is 1.56X.

The increase from 8 Gpixel to 12.1 Gpixel is 1.51X.

----

The increase from 153.6 GB/s to 260 GB/s is 1.69X.

The increase from 8 Gpixel to 13.2 Gpixel is 1.65X.

----

There's a near linear relationship between increased memory bandwidth and pixel fill rate.

Let's assume 7950-800 represents the top 32 ROP capability. For 16 ROPS, dividing 7950-800's 240 GB/s by 2 yields ~120 GB/s of write bandwidth.

Let's assume 7970-925 represents the top 32 ROP capability. For 16 ROPS, dividing 7970-925's 260 GB/s by 2 yields ~130 GB/s of write bandwidth.

Both the 120 GB/s and 130 GB/s values exceed R7-260X's 104 GB/s memory bandwidth.

Microsoft's ROPs math includes both read and write, which will saturate X1's ESRAM bandwidth i.e. "eight bytes write, four bytes read". You're not comparing apples with apples.
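A quick sanity check of both points in Python (the Gpixel and GB/s figures are the benchmark numbers quoted above; the per-ROP byte counts take the quoted "eight bytes write, four bytes read" at face value, and the 853MHz X1 GPU clock is an assumption on my part):

# 32-ROP cards: does pixel fill track memory bandwidth?
cards = {
    "HD 7850":     (8.0,  153.6),   # (Vantage Gpixels/s, GB/s)
    "HD 7950-800": (12.1, 240.0),
    "HD 7970-925": (13.2, 260.0),
}
base_fill, base_bw = cards["HD 7850"]
for name, (fill, bw) in cards.items():
    print(f"{name}: bandwidth x{bw / base_bw:.2f} -> fill x{fill / base_fill:.2f}")

# What 16 ROPs could demand per clock at 8B write + 4B read:
demand = 16 * (8 + 4) * 0.853            # GB/s at 853MHz
print(f"16 ROPs read+write demand: ~{demand:.0f} GB/s")   # ~164 GB/s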

Isolated synthetic benchmarks don't reflect real workloads, which have memory bus contention from other bandwidth-consuming units.

Forza 5 shows X1's 16 ROP units are sufficient for 1920x1080p/~60fps.

Avatar image for xdluffy
xdluffy

25

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#372  Edited By xdluffy
Member since 2013 • 25 Posts

@GrenadeLauncher: And I suppose that's how a deprived cow acts after too much jerking off to Twitch

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#373  Edited By ronvalencia
Member since 2008 • 29612 Posts
@tormentos said:

@ronvalencia said:

Do you claim Albert Panello's math on ROP bandwidth is wrong? The math is pretty simple.

Legitreviews runs at a lower resolution than Guru3d's and it's not the best way to stress the GPU.

1. Radeon HD 7950's 32 ROPs at 800Mhz operate at a lower clock (i.e. less potential color fill rate) than 7850's 32 ROPs at 860Mhz, and the 7950-800Mhz still delivers superior frame rate results.

2. R7-265's TMUs (for texture fill rate) have more memory bandwidth (179 GB/s), hence less memory contention. TMUs don't operate in isolation.

3. 7950 at 800Mhz has a lower triangle rate than 7850 at 860Mhz and 7950-800 has better results. Triangle rate is not a big issue.

4. R7-265 has larger CU cache/LDS SRAM pools i.e. 16 CU x (16 KB L1 + 64 KB LDS) vs 14 CU x (16 KB L1 + 64 KB LDS). This equates to fewer trips to external memory.

For Xbox 360 console ports, my old 7950-900 was playing games at 5760x1080p, which is 3X of 1920x1080p.

16 ROPS is enough for 1920x1080p.

-----------------------

On X1/GCN, texture tiling is not compute heavy since it's hardware accelerated via AMD PRT.

PS: AMD just re-enabled DX11.2/tiled resources for its GCN with driver Catalyst 14.1 beta.

-----------------------

http://www.vgleaks.com/wp-content/uploads/2013/03/durango_memory.jpg

The 30 GB/s coherent link between the Northbridge and the GPU MMU would limit the entire coherent system to 30 GB/s since it's the lowest-bandwidth link in the system i.e.

1. coherent hardware would tell the GPU that its cache pages are invalid at 30 GB/s. If a page is invalid, update the memory page. Invalidating individual pages avoids flushing the entire GPU cache.

2. coherent hardware would tell the CPU that its cache pages are invalid at 30 GB/s. If a page is invalid, update the memory page.

3. the coherent hardware's purpose is to make all data views consistent for all processor nodes.

This is why MS can claim 3X over PS4's CPU-to-GPU coherent links.

For reference, see AMD's FCL setup on the Kaveri APU.

1- The CPU bandwidth is 30GB/s. It comes from the DDR3 memory bank and is shared as coherent bandwidth by the CPU, GPU and move engines. That is the only 30GB/s; there is not a second one, only one, and your diagram was clear. It is a joke that you try to use old diagrams from VGleaks, which you yourself have disproved in other arguments, but now they are valid..

If the GPU doesn't use ESRAM it only has 38GB/s it can use for graphics, since the other 30GB/s is tied up by the CPU line. This is a FACT.

The black line isn't 30GB/s and it doesn't even have an identification, which is funny because it was even called into question for not being on the original leaks.

Also, are you for real? Panello is an MS employee who refuses to admit a graphical difference between PS4 and Xbox One games; he even downplayed the Tomb Raider difference, so yeah, he will lie, just like MS lied with the Xbox 360 256GB/s bandwidth crap.

I find that funny because even the 7770 can achieve 1080p in most games under certain settings, yet the Xbox One has a barrage of 720p games. Even a fighting game is 720p, shooters are 720p, and MGS, which is a 3rd person shooter, is again 720p. Face it, none of your theories have held, dude, none.

And the gap has been there and is getting bigger instead of smaller. Odd, isn't it?

On the black line issue, hotchip.org's diagram should supersede the original leaks. LOL

You misapplied the coherent memory hardware's purpose, i.e. the 30 GB/s bandwidth is for keeping the data views the same for all processor nodes. E.g. when the CPU writes to a cache page or memory page at address $1000, all other cache pages that hold address $1000 become invalid. The coherent memory hardware uses its 30 GB/s bandwidth to keep all data views the same. The AMD Fusion Control Link (FCL) is used to tell the GPU (and vice versa) that the $1000 memory page is invalid and needs to be updated, and this is 30 GB/s for the X1.
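As a toy model of that invalidate-then-update scheme in Python (illustrative only; this is a sketch of generic page-level coherency, not X1's actual protocol):

# Each node caches pages; a write on one node marks the peer's
# cached copy invalid, and only invalid pages are re-fetched over
# the coherent link (the 30 GB/s path in the X1 diagram).
class Node:
    def __init__(self, name):
        self.name = name
        self.pages = {}                     # addr -> (data, valid)

    def read(self, addr, memory):
        data, valid = self.pages.get(addr, (None, False))
        if not valid:                       # stale or missing: re-fetch
            data = memory[addr]
            self.pages[addr] = (data, True)
        return data

    def write(self, addr, data, memory, peers):
        memory[addr] = data
        self.pages[addr] = (data, True)
        for peer in peers:                  # invalidate, don't flush all
            if addr in peer.pages:
                peer.pages[addr] = (peer.pages[addr][0], False)

memory = {0x1000: "old"}
cpu, gpu = Node("CPU"), Node("GPU")
gpu.read(0x1000, memory)                    # GPU caches $1000
cpu.write(0x1000, "new", memory, [gpu])     # GPU's copy marked invalid
print(gpu.read(0x1000, memory))             # re-fetches -> "new"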

Do these "barrage of 720p games" follow Rebellion's optimization path?

My theories are in line with Rebellion's POV and, unlike you, I can reconcile both Rebellion's statements and 7770-like results. I don't discard any MS employee's statements, but seek their real context. Xbox 360's 256 GB/s had its own context and usage factors.

The 7770 can do 1920x1080p rendering (i.e. I have an 8870M at 32 watts + 2GB GDDR5 at 72 GB/s), but don't overcommit its capabilities, i.e. I usually lower the shadow and shader details for 1080p, or reduce the render resolution while keeping high details. My 8870M doesn't have the option of dual rasterizer units, dual tessellation units and a 32MB ESRAM booster, i.e. Rebellion's optimization path would not be available for it.

Avatar image for tormentos
tormentos

33784

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#374 tormentos
Member since 2003 • 33784 Posts

@ronvalencia said:

On the black line issue, hotchip.org's diagram should supersede the original leaks. LOL

You misapplied the coherent memory hardware's purpose, i.e. the 30 GB/s bandwidth is for keeping the data views the same for all processor nodes. E.g. when the CPU writes to a cache page or memory page at address $1000, all other cache pages that hold address $1000 become invalid. The coherent memory hardware uses its 30 GB/s bandwidth to keep all data views the same. The AMD Fusion Control Link (FCL) is used to tell the GPU (and vice versa) that the $1000 memory page is invalid and needs to be updated, and this is 30 GB/s for the X1.

Do these "barrage of 720p games" follow Rebellion's optimization path?

My theories are in line with Rebellion's POV and, unlike you, I can reconcile both Rebellion's statements and 7770-like results.

The 7770 can do 1920x1080p rendering (i.e. I have an 8870M at 32 watts + 2GB GDDR5 at 72 GB/s), but don't overcommit its capabilities, i.e. I usually lower the shadow and shader details for 1080p, or reduce the render resolution while keeping high details. My 8870M doesn't have the option of dual rasterizer units, dual tessellation units and a 32MB ESRAM booster, i.e. Rebellion's optimization path would not be available for it.

No I didn't. That 30GB/s is for the system and CPU and it also connects to the GPU. Regardless of the GPU being able to use that line, what it can use is little because it is shared; that memory bank doesn't have 68GB/s in all directions, it is shared.

Rebellion was being polite and they claimed "THE XBOX ONE WILL CATCH UP WITH THE PS4".

So tell me, Ronvalencia, do you BELIEVE that a console with a 12 CU, 1.31 TF GPU will catch up to a stronger 18 CU, 1.84 TF console?

Now don't run away and evade my question, because you yourself have stated here for days that the prototype 7850 with 12 CUs will not surpass an 18 CU GPU, yet they say the Xbox One will catch up.

You need to learn to read between the lines. You quoted many developers saying the difference wasn't much, but when launch time came and the difference showed, all those developers looked bad. Hell, even Kojima claimed the difference wasn't much, yet showed a 1080p version of his game running at 60 FPS and the Xbox One version at 720p..

Tomb Raider is the biggest gap, and if developers like Crytek could barely hit 900p at 20-something frames, don't expect the gap to change.

We will see what the excuse will be 1 year from now when the PS4 version continues to dominate and the Xbox One fails to catch up. Tomb Raider is 1080p, so it caught up with the PS4 resolution-wise, but it had to give up something.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#375  Edited By ronvalencia
Member since 2008 • 29612 Posts

@tormentos said:

@ronvalencia said:

On the black line issue, hotchip.org's diagram should supersede the original leaks. LOL

You misapplied the coherent memory hardware's purpose, i.e. the 30 GB/s bandwidth is for keeping the data views the same for all processor nodes. E.g. when the CPU writes to a cache page or memory page at address $1000, all other cache pages that hold address $1000 become invalid. The coherent memory hardware uses its 30 GB/s bandwidth to keep all data views the same. The AMD Fusion Control Link (FCL) is used to tell the GPU (and vice versa) that the $1000 memory page is invalid and needs to be updated, and this is 30 GB/s for the X1.

Do these "barrage of 720p games" follow Rebellion's optimization path?

My theories are in line with Rebellion's POV and, unlike you, I can reconcile both Rebellion's statements and 7770-like results.

The 7770 can do 1920x1080p rendering (i.e. I have an 8870M at 32 watts + 2GB GDDR5 at 72 GB/s), but don't overcommit its capabilities, i.e. I usually lower the shadow and shader details for 1080p, or reduce the render resolution while keeping high details. My 8870M doesn't have the option of dual rasterizer units, dual tessellation units and a 32MB ESRAM booster, i.e. Rebellion's optimization path would not be available for it.

No I didn't. That 30GB/s is for the system and CPU and it also connects to the GPU. Regardless of the GPU being able to use that line, what it can use is little because it is shared; that memory bank doesn't have 68GB/s in all directions, it is shared.

Rebellion was being polite and they claimed "THE XBOX ONE WILL CATCH UP WITH THE PS4".

So tell me, Ronvalencia, do you BELIEVE that a console with a 12 CU, 1.31 TF GPU will catch up to a stronger 18 CU, 1.84 TF console?

Now don't run away and evade my question, because you yourself have stated here for days that the prototype 7850 with 12 CUs will not surpass an 18 CU GPU, yet they say the Xbox One will catch up.

You need to learn to read between the lines. You quoted many developers saying the difference wasn't much, but when launch time came and the difference showed, all those developers looked bad. Hell, even Kojima claimed the difference wasn't much, yet showed a 1080p version of his game running at 60 FPS and the Xbox One version at 720p..

Tomb Raider is the biggest gap, and if developers like Crytek could barely hit 900p at 20-something frames, don't expect the gap to change.

We will see what the excuse will be 1 year from now when the PS4 version continues to dominate and the Xbox One fails to catch up. Tomb Raider is 1080p, so it caught up with the PS4 resolution-wise, but it had to give up something.

No, 30 GB/s is for memory coherency workloads as stated in hotchip's diagram.

Again, the prototype 7850 with 12 CUs already shows slightly inferior performance to the retail 7850 with 16 CUs. There's a larger difference between the prototype 7850 and the R7-265 (the closest to PS4's GCN).

Rebellion is a known AMD Gaming Evolved PC developer team and may see X1 vs PS4 differences as insignificant.

Does Kojima's new game employ Rebellion's optimization path? Unlike Kojima's statements, Rebellion's statements are pretty specific on the optimization path, i.e. it's not for non-tiling 3D engines. There would be no excuse with Rebellion's statements since they have nailed down the specific optimization path.

Avatar image for btk2k2
btk2k2

440

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#376  Edited By btk2k2
Member since 2003 • 440 Posts

@ronvalencia said:

Vantage's color fill rate test "draws frames by filling the screen multiple times. The color and alpha of each corner of the screen is animated with the interpolated color written directly to the target using alpha blending". This is not quite a straight write to memory.

3DMark Vantage pixel fill rate (from AnandTech and TechReport)

Radeon HD 7850 (32 ROPS at 860 Mhz) = 8 Gpixel = 153.6 GB/s

Radeon HD 7870 (32 ROPS at 1Ghz) = 8 Gpixel = 153.6 GB/s (7.9 Gpixel from techreport)

Radeon HD 7950-800 (32 ROPS at 800 Mhz) = 12.1 Gpixel = 240 GB/s

Radeon HD 7970-925 (32 ROPS at 925 Mhz) = 13.2 Gpixel = 260 GB/s (from techreport)

----

The increase from 153.6 GB/s to 240 GB/s is 1.56X.

The increase from 8 Gpixel to 12.1 Gpixel is 1.51X.

----

The increase from 153.6 GB/s to 260 GB/s is 1.69X.

The increase from 8 Gpixel to 13.2 Gpixel is 1.65X.

----

There's a near linear relationship between increased memory bandwidth and pixel fill rate.

Let's assume 7950-800 represents the top 32 ROP capability. For 16 ROPS, dividing 7950-800's 240 GB/s by 2 yields ~120 GB/s of write bandwidth.

Let's assume 7970-925 represents the top 32 ROP capability. For 16 ROPS, dividing 7970-925's 260 GB/s by 2 yields ~130 GB/s of write bandwidth.

Both the 120 GB/s and 130 GB/s values exceed R7-260X's 104 GB/s memory bandwidth.

Microsoft's ROPs math includes both read and write, which will saturate X1's ESRAM bandwidth i.e. "eight bytes write, four bytes read". You're not comparing apples with apples.

Isolated synthetic benchmarks don't reflect real workloads, which have memory bus contention from other bandwidth-consuming units.

Forza 5 shows X1's 16 ROP units are sufficient for 1920x1080p/~60fps.

A benchmark is closer to reality than the theoretical peak numbers and a game is closer to reality than a benchmark. We now have both.

Yes, the ESRAM bandwidth exceeds the bandwidth in the 260X, but I said the 260X was bandwidth limited, which is why it only has 16 ROPS. Once you get above 120GB/s with 16 ROPS at 800Mhz you start becoming ROP limited.

The most recent benchmarks show the 7850 at 7.8 Gpixels/s; it is probable that the 7.8 - 8.0 discrepancy is caused by run-to-run variance, which is to be expected as a reasonable margin of error. Given that, my predictions from 6+ months ago are well within that margin of error.

Now I am going to ask you a few questions.

1) What do you think is the most important performance differentiator between the R7 265 and the R7 260x?

2) You brought up Forza 5; do you consider the compromises they made to hit 1080p/60 to be due to its nature as a launch title, or do you think they are due to the Xbox One hardware configuration?

3) What do you think the primary reason for the Xbox One running a large number of multiplats at 720p or 900p is?

4) You said "Triangle rate is not a big issue". If that is the case, then do you think using the 7770 -> R7 265 difference in performance, where tessellation is not a limiting factor, as a rough guide to the worst case scenario for the Xbox One is valid? If you do not, then why?

I am really interested in your responses to the above questions.

Avatar image for tormentos
tormentos

33784

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#377 tormentos
Member since 2003 • 33784 Posts

@btk2k2 said:

A benchmark is closer to reality than the theoretical peak numbers and a game is closer to reality than a benchmark. We now have both.

Yes, the ESRAM bandwidth exceeds the bandwidth in the 260X, but I said the 260X was bandwidth limited, which is why it only has 16 ROPS. Once you get above 120GB/s with 16 ROPS at 800Mhz you start becoming ROP limited.

The most recent benchmarks show the 7850 at 7.8 Gpixels/s; it is probable that the 7.8 - 8.0 discrepancy is caused by run-to-run variance, which is to be expected as a reasonable margin of error. Given that, my predictions from 6+ months ago are well within that margin of error.

Now I am going to ask you a few questions.

1) What do you think is the most important performance differentiator between the R7 265 and the R7 260x?

2) You brought up Forza 5; do you consider the compromises they made to hit 1080p/60 to be due to its nature as a launch title, or do you think they are due to the Xbox One hardware configuration?

3) What do you think the primary reason for the Xbox One running a large number of multiplats at 720p or 900p is?

4) You said "Triangle rate is not a big issue". If that is the case, then do you think using the 7770 -> R7 265 difference in performance, where tessellation is not a limiting factor, as a rough guide to the worst case scenario for the Xbox One is valid? If you do not, then why?

I am really interested in your responses to the above questions.

Bringing up Forza 5 is like admitting he lost. Racing games are not as demanding as FPS or other games.

http://www.anandtech.com/bench/product/536

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#378  Edited By ronvalencia
Member since 2008 • 29612 Posts

@btk2k2 said:

@ronvalencia said:

Vantage's color fill rate test "draws frames by filling the screen multiple times. The color and alpha of each corner of the screen is animated with the interpolated color written directly to the target using alpha blending". This is not quite a straight write to memory.

3DMark Vantage pixel fill rate (from AnandTech and TechReport)

Radeon HD 7850 (32 ROPS at 860 Mhz) = 8 Gpixel = 153.6 GB/s

Radeon HD 7870 (32 ROPS at 1Ghz) = 8 Gpixel = 153.6 GB/s (7.9 Gpixel from techreport)

Radeon HD 7950-800 (32 ROPS at 800 Mhz) = 12.1 Gpixel = 240 GB/s

Radeon HD 7970-925 (32 ROPS at 925 Mhz) = 13.2 Gpixel = 260 GB/s (from techreport)

----

The increase from 153.6 GB/s to 240 GB/s is 1.56X.

The increase from 8 Gpixel to 12.1 Gpixel is 1.51X.

----

The increase from 153.6 GB/s to 260 GB/s is 1.69X.

The increase from 8 Gpixel to 13.2 Gpixel is 1.65X.

----

There's a near linear relationship between increased memory bandwidth and pixel fill rate.

Let's assume 7950-800 represents the top 32 ROP capability. For 16 ROPS, dividing 7950-800's 240 GB/s by 2 yields ~120 GB/s of write bandwidth.

Let's assume 7970-925 represents the top 32 ROP capability. For 16 ROPS, dividing 7970-925's 260 GB/s by 2 yields ~130 GB/s of write bandwidth.

Both the 120 GB/s and 130 GB/s values exceed R7-260X's 104 GB/s memory bandwidth.

Microsoft's ROPs math includes both read and write, which will saturate X1's ESRAM bandwidth i.e. "eight bytes write, four bytes read". You're not comparing apples with apples.

Isolated synthetic benchmarks don't reflect real workloads, which have memory bus contention from other bandwidth-consuming units.

Forza 5 shows X1's 16 ROP units are sufficient for 1920x1080p/~60fps.

A benchmark is closer to reality than the theoretical peak numbers and a game is closer to reality than a benchmark. We now have both.

Yes, the ESRAM bandwidth exceeds the bandwidth in the 260X, but I said the 260X was bandwidth limited, which is why it only has 16 ROPS. Once you get above 120GB/s with 16 ROPS at 800Mhz you start becoming ROP limited.

The most recent benchmarks show the 7850 at 7.8 Gpixels/s; it is probable that the 7.8 - 8.0 discrepancy is caused by run-to-run variance, which is to be expected as a reasonable margin of error. Given that, my predictions from 6+ months ago are well within that margin of error.

Now I am going to ask you a few questions.

1) What do you think is the most important performance differentiator between the R7 265 and the R7 260x?

2) You brought up Forza 5; do you consider the compromises they made to hit 1080p/60 to be due to its nature as a launch title, or do you think they are due to the Xbox One hardware configuration?

3) What do you think the primary reason for the Xbox One running a large number of multiplats at 720p or 900p is?

4) You said "Triangle rate is not a big issue". If that is the case, then do you think using the 7770 -> R7 265 difference in performance, where tessellation is not a limiting factor, as a rough guide to the worst case scenario for the Xbox One is valid? If you do not, then why?

I am really interested in your responses to the above questions.

1. R7-265 is faster than R7-260X in most games. R7-260X is slightly faster than R7-265 in several GPGPU desktop apps.

Higher memory bandwidth equals less bus contention, but this factor has to match the GPU's I/O capability.

Higher cache/LDS from a higher CU count = fewer trips to external memory.

Higher CU count = more wavefront queue slots.

Higher CU count = more out-of-order wavefront dispatchers.

Higher CU count = more registers for complex shaders.

R7-265 would be the better option.
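Rough capacity arithmetic behind those CU-count points, using the public GCN per-CU limits (4 SIMDs per CU, 10 wavefronts per SIMD, 64 lanes per wavefront, 16 KB L1 + 64 KB LDS per CU); treat it as back-of-envelope, not a performance prediction:

# Per-CU GCN limits: 4 SIMDs x 10 wavefronts x 64 lanes, 16+64 KB SRAM.
for cus in (12, 14, 16, 18):
    wavefronts = cus * 4 * 10
    print(f"{cus} CU: {wavefronts} wavefront slots, "
          f"{wavefronts * 64} threads in flight, "
          f"{cus * (16 + 64)} KB L1+LDS")
# 12 CU (X1): 480 slots; 18 CU (PS4): 720 slots - more latency hiding.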

2. With my 8870M (32 watts), I usually reduce shadows (memory bandwidth), disable MSAA (memory bandwidth), reduce complex shaders (ALU) OR reduce the render resolution. For non-tiling engines, my laptop is nearly a portable X1.

3. If these games depart from Rebellion's stated narrow optimization path, you'll get 7770-like results.

4. It's not a big issue at the moment, i.e. crazy tessellation on brick walls is just politics and there are only a few such cases.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#379  Edited By ronvalencia
Member since 2008 • 29612 Posts

@tormentos said:

@btk2k2 said:

A benchmark is closer to reality than the theoretical peak numbers and a game is closer to reality than a benchmark. We now have both.

Yes, the ESRAM bandwidth exceeds the bandwidth in the 260X, but I said the 260X was bandwidth limited, which is why it only has 16 ROPS. Once you get above 120GB/s with 16 ROPS at 800Mhz you start becoming ROP limited.

The most recent benchmarks show the 7850 at 7.8 Gpixels/s; it is probable that the 7.8 - 8.0 discrepancy is caused by run-to-run variance, which is to be expected as a reasonable margin of error. Given that, my predictions from 6+ months ago are well within that margin of error.

Now I am going to ask you a few questions.

1) What do you think is the most important performance differentiator between the R7 265 and the R7 260x?

2) You brought up Forza 5; do you consider the compromises they made to hit 1080p/60 to be due to its nature as a launch title, or do you think they are due to the Xbox One hardware configuration?

3) What do you think the primary reason for the Xbox One running a large number of multiplats at 720p or 900p is?

4) You said "Triangle rate is not a big issue". If that is the case, then do you think using the 7770 -> R7 265 difference in performance, where tessellation is not a limiting factor, as a rough guide to the worst case scenario for the Xbox One is valid? If you do not, then why?

I am really interested in your responses to the above questions.

Bringing up Forza 5 is like admitting he lost. Racing games are not as demanding as FPS or other games.

http://www.anandtech.com/bench/product/536

You are still converting user's POV data into color.

Avatar image for I_can_haz
I_can_haz

6511

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#380  Edited By I_can_haz
Member since 2013 • 6511 Posts

Lems can't catch a break. This is what happens when you buy a $500 console with a gimped GPU that runs most of its games at 720p with frame drops.

Avatar image for megaspiderweb09
megaspiderweb09

3686

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#381 megaspiderweb09
Member since 2009 • 3686 Posts

It would be funny if, 5 years down the line, we find out that the Xbox One actually is on par with the PS4. I wonder what would happen then.

Avatar image for btk2k2
btk2k2

440

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#382  Edited By btk2k2
Member since 2003 • 440 Posts

@ronvalencia said:

1. R7-265 is faster than R7-260X in most games. R7-260X is slightly faster than R7-265 in several GPGPU desktop apps.

Higher memory bandwidth equals less bus contention, but this factor has to match the GPU's I/O capability.

Higher cache/LDS from a higher CU count = fewer trips to external memory.

Higher CU count = more wavefront queue slots.

Higher CU count = more out-of-order wavefront dispatchers.

Higher CU count = more registers for complex shaders.

R7-265 would be the better option.

2. With my 8870M (32 watts), I usually reduce shadows (memory bandwidth), disable MSAA (memory bandwidth), reduce complex shaders (ALU) OR reduce the render resolution. For non-tiling engines, my laptop is nearly a portable X1.

3. If these games depart from Rebellion's stated narrow optimization path, you'll get 7770-like results.

4. It's not a big issue at the moment, i.e. crazy tessellation on brick walls is just politics and there are only a few such cases.

1) I agree that the main differentiator is memory bandwidth and it would be interesting to see how the 265 performed with the same bandwidth as the 260X.

The higher cache and CU count arguments do not have any meaningful performance bearing at all though as I will now show you.

The Windforce 290 OC has 2560 shaders which can run at 1040Mhz all the time thanks to the cooling setup. That gives it 5.3TFlops of shader performance while also having the same memory bandwidth as a 290X.

The 290X quiet mode is prone to throttling, and based on this data we can see it averages a clock speed of 947Mhz. With 2816 shaders that gives it an average of 5.3TFlops of shader performance, and it has the same memory bandwidth as the Windforce 290 OC.

If we then look at this review of the Windforce 290 OC, we can compare the average frame rates between it and the 290X quiet mode. I compared 10 games, and the cumulative average frame rate for the Windforce 290 was 58.4 FPS; for the 290X quiet mode it was 58.8 FPS. That is a margin-of-error difference and shows that, as long as the shader performance is the same, having more CUs does not provide a performance advantage over a higher clocked but reduced CU card when all else is equal.
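The shader-throughput sums behind that comparison, for anyone checking (sustained clocks as quoted above; FLOPs = shaders x 2 ops per clock for FMA x clock):

# Two different CU/clock mixes landing on the same throughput.
def tflops(shaders, clock_mhz):
    return shaders * 2 * clock_mhz / 1e6

print(f"Windforce 290 OC (2560 @ 1040MHz): {tflops(2560, 1040):.2f} TFlops")
print(f"290X quiet, avg  (2816 @  947MHz): {tflops(2816, 947):.2f} TFlops")
# 5.32 vs 5.33 TFlops - effectively identical shader performance.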

While the differences you mentioned are true they do not have a bearing on the overall performance of a graphics card as the above shows.

2) This is kind of what they did with Forza 5, so correct me if I am wrong, but you are saying in a roundabout way that the compromises are due to the hardware, but you feel that with tiling based engines they might be able to work around it because of the ESRAM. Is that correct?

3) Again, correct me if I am wrong, but you are saying that the sub-1080p games on the Xbox 1 are due to a lack of optimisation by the devs? Possibly due to the API not having great support for the ESRAM, meaning the devs have to do more work to optimise for it?

4) I agree with this entirely, and I take the lack of a counter argument, along with the response to my 3rd question, to mean that you feel the performance delta from the 7770 to the R7-265 is roughly comparable to the performance delta between the Xbox One and the PS4 when the developers have not optimised the use of ESRAM in the Xbox 1? You also feel that if the devs were to optimise for the ESRAM (or had API tools that made this easier) then that performance delta would close somewhat, but not entirely, as you do keep stating that the 7850 prototype shows a 12 CU part cannot beat a 16 CU part when everything else is equal.

I think we both generally agree, but we like to argue semantics and have different views on what the important performance differentiators are. All we can do is wait and see, I suppose.

My prediction is that in the next 12-18 months we should see a slight closing of the gap between the Xbox One and the PS4 as the devs get used to handling the ESRAM and the API tools around using it improve. I also predict that beyond that, once the devs stop making cross generation games and start pushing GPGPU, the gap will widen again, and this will be most evident in the immersion and reactivity of the worlds in PS4 exclusives.

Avatar image for KungfuKitten
KungfuKitten

27389

Forum Posts

0

Wiki Points

0

Followers

Reviews: 42

User Lists: 0

#383 KungfuKitten
Member since 2006 • 27389 Posts

Does he mean the main menu or games?