@Gargus said:
Hidden power?
It's a racing game, of course it will look pretty good. The roads, background and everything except the cars are just static images that move and rotate in relation to the car to give a sense of movement. They really only have to focus on the cars themselves, rather than rendering entire landscapes as in games where you can run anywhere you want; in driving games you are stuck on a track.
And while we're on the topic, if graphics are what make something good, then play a racing game on PS4 or Xbox One. There, that just single-handedly ruined the "hidden power" of the Wii U's racing game.
There is no "hidden power".
Optimizing for a GPU with a VLIW architecture is harder than for SIMD-based GPUs, e.g. AMD Graphics Core Next (X1/PS4) or AMD Xenos (Xbox 360).
Wii U's VLIW5 GPU with 320 stream processors has the potential to process 320 shader instructions per cycle, i.e. every 5 scalar instructions (1 data element each) are bundled together into 1 very wide instruction issue, hence "VLIW5".
On Xbox 360, it's just 1 scalar (1 data element) and 1 SIMD4 (4 data elements) instruction issue per pipeline. Xbox 360's GPU can process up to 96 shader instructions per cycle. Its 48 pipelines are divided into 3 blocks of 16 pipelines each, i.e. per block, 16 SIMD4 units yield 64 stream processors, plus 16 for scalar.
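To make the arithmetic above easier to check, here is a rough sketch in Python. It uses only the figures quoted in this post; the variable names are mine, and the "2 issues per pipeline" reading of the Xbox 360 number is my assumption (48 pipelines x 2 = 96).

```python
# Back-of-the-envelope issue-width arithmetic from the figures in this post.
# All names are made up for illustration; nothing here is measured data.

# Wii U: VLIW5 with 320 stream processors
wiiu_stream_processors = 320
vliw_width = 5                      # 5 scalar slots per VLIW5 bundle
wiiu_bundles_per_cycle = wiiu_stream_processors // vliw_width
wiiu_peak_instr_per_cycle = wiiu_bundles_per_cycle * vliw_width

# Xbox 360: 3 blocks of 16 pipelines, each pipeline = 1 SIMD4 + 1 scalar
x360_blocks = 3
x360_pipes_per_block = 16
x360_pipelines = x360_blocks * x360_pipes_per_block
x360_issues_per_pipe = 2            # assumption: 1 SIMD4 issue + 1 scalar issue
x360_peak_instr_per_cycle = x360_pipelines * x360_issues_per_pipe
x360_simd_lanes_per_block = x360_pipes_per_block * 4   # SIMD4 = 4 data elements
x360_scalar_lanes_per_block = x360_pipes_per_block * 1

print("Wii U bundles/cycle:", wiiu_bundles_per_cycle)
print("Wii U peak instr/cycle:", wiiu_peak_instr_per_cycle)
print("Xbox 360 pipelines:", x360_pipelines)
print("Xbox 360 peak instr/cycle:", x360_peak_instr_per_cycle)
print("Xbox 360 SIMD lanes per block:", x360_simd_lanes_per_block)
```

These are peak numbers only. The Wii U figure assumes every one of the 5 slots in every bundle is filled, which is the hard part.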
For X1/PS4, each AMD CU block in Graphics Core Next has 64 stream processors from 4 SIMD units of 16 lanes each, plus 1 independent scalar unit (a big difference from Xbox 360, whose scalar units are not independent of its SIMD units). Xbox 360 is roughly equivalent to a 3 CU equipped Radeon HD at 500 MHz.
GPUs with a VLIW architecture have much higher instruction-per-cycle potential than their SIMD counterparts.
Read http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/3 on the Radeon HD 6970 (Cayman)'s VLIW4 design vs the Radeon HD 7970 (Southern Islands)'s SIMD design:
Because the smallest unit of work is the SIMD and a CU has 4 SIMDs, a CU works on 4 different wavefronts at once. As wavefronts are still 64 operations wide, each cycle a SIMD will complete ¼ of the operations on their respective wavefront, and after 4 cycles the current instruction for the active wavefront is completed.
Cayman by comparison would attempt to execute multiple instructions from the same wavefront in parallel, rather than executing a single instruction from multiple wavefronts. This is where Cayman got bursty – if the instructions were in any way dependent, Cayman would have to let some of its ALUs go idle. GCN on the other hand does not face this issue, because each SIMD handles single instructions from different wavefronts they are in no way attempting to take advantage of ILP, and their performance will be very consistent.
...
There are other aspects of GCN that influence its performance – the scalar unit plays a huge part – but in comparison to Cayman, this is the single biggest difference. By not taking advantage of ILP, but instead taking advantage of Thread Level Parallelism (TLP) in the form of executing more wavefronts at once, GCN will be able to deliver high compute performance and to do so consistently.
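The point in the quoted passage can be illustrated with a toy model. This is only a sketch under my own assumptions, not how AMD's shader compiler actually schedules: a greedy packer fills VLIW bundles with instructions whose dependencies have already completed, and the GCN side is reduced to the 64-wide wavefront over a 16-lane SIMD implied by "¼ of the operations" per cycle. The function names are hypothetical.

```python
from typing import Dict, List, Set


def pack_vliw(deps: Dict[int, Set[int]], width: int = 5) -> List[List[int]]:
    """Greedily pack instructions into bundles of at most `width` slots.

    `deps` maps an instruction index to the indices it depends on. An
    instruction can only go into a bundle once all of its dependencies
    have finished in an EARLIER bundle.
    """
    done: Set[int] = set()
    remaining = sorted(deps)
    bundles: List[List[int]] = []
    while remaining:
        ready = [i for i in remaining if deps[i] <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]          # unfilled slots are idle ALUs
        bundles.append(bundle)
        done.update(bundle)
        remaining = [i for i in remaining if i not in done]
    return bundles


def vliw_utilization(deps: Dict[int, Set[int]], width: int = 5) -> float:
    """Fraction of VLIW slots that did useful work."""
    bundles = pack_vliw(deps, width)
    return len(deps) / (len(bundles) * width)


def gcn_cycles_per_instruction(wavefront_size: int = 64, simd_lanes: int = 16) -> int:
    """Cycles for one SIMD to finish one instruction for a whole wavefront."""
    return wavefront_size // simd_lanes


# 10 instructions with no dependencies: every slot can be filled.
independent = {i: set() for i in range(10)}
# 10 instructions where each one needs the result of the previous one.
chain = {i: ({i - 1} if i > 0 else set()) for i in range(10)}

print(vliw_utilization(independent))   # 1.0
print(vliw_utilization(chain))         # 0.2
print(gcn_cycles_per_instruction())    # 4
```

With fully dependent code the VLIW5 model keeps 1 slot in 5 busy, while the GCN-style cost per instruction stays at 4 cycles no matter how the instructions depend on each other. That is the "bursty" vs "consistent" difference in a nutshell.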
---------------------------
Wii U uses an older "VLIW5" type Radeon HD. It's too bad Wii U wasn't equipped with a Radeon HD 8570M (AMD Graphics Core Next with 6 CUs) with 64-bit DDR3-2000.
I have laptops with a Radeon HD 4650M (320 stream processors/4 CU, VLIW5) + 128-bit GDDR3-1400 and a Radeon HD 8570M (384 stream processors/6 CU, Graphics Core Next) + 64-bit DDR3-2000. The Radeon HD 8570M (~15 watts, Ultrabook) has better performance than the Radeon HD 4650M (15 to 20 watts).