NVIDIA's Tamasi on DirectX 12 vs AMD's promise of full DirectX 12 compatibility for GCN


From TechReport: http://techreport.com/news/26210/directx-12-will-also-add-new-features-for-next-gen-gpus

"However, Tamasi explained that DirectX 12 will introduce a set of new features in addition to the lower-level abstraction, and those features will require new hardware."

Mr Tamasi is from NVIDIA and has no authority to speak for non-NVIDIA hardware.

TechReport's "new blend modes" and "conservative rasterization" refer to DirectX 12's new rendering features:

1. "Programmable blend and efficient OIT with pixel ordered UAV".

2. "Better collision and culling with Conservative Rasterization".

Compare point 1 with http://software.intel.com/en-us/blogs/2013/07/18/order-independent-transparency-approximation-with-pixel-synchronization

The feature is based on Intel's PixelSync. The main element of this Intel API is the pixel shader wait function: the pipeline enforces the pixel shader's read/modify/write ordering per pixel, which removes the need for per-pixel "linked list" storage (see the sketch after the reference links below).

Again, NVIDIA's Mr Tamasi DOES NOT have any authority over non-NVIDIA hardware.

For reference

DirectX 11's linked-list based OIT, from http://www.docstoc.com/docs/106125562/Order-Independent-Transparency-Using-DirectX-11-Linked-Lists

OpenGL 4.0+'s linked-list based OIT, from http://blog.icare3d.org/2010/07/opengl-40-abuffer-v20-linked-lists-of.html
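
To make the contrast concrete, here is a minimal CPU-side sketch in C++ (illustrative only: the fixed four-node budget, the merge rule and all names are my assumptions, not Intel's or Microsoft's actual API). Path (a) is the DX11-style linked list: unbounded per-pixel storage plus a sort in a resolve pass. Path (b) is the PixelSync-style idea: each fragment is folded into a small fixed-size per-pixel array inside what the hardware would run as an ordered read/modify/write section, so no list build and no resolve-pass sort are needed.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Fragment { float depth, color, alpha; };  // one colour channel for brevity

    // (a) Linked-list OIT: accumulate every fragment, sort at resolve time.
    float resolveLinkedList(std::vector<Fragment> list) {
        std::sort(list.begin(), list.end(),
                  [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
        float out = 0.0f;
        for (const Fragment& f : list)               // back-to-front "over" blend
            out = f.color * f.alpha + out * (1.0f - f.alpha);
        return out;
    }

    // (b) PixelSync-style: bounded per-pixel storage, updated fragment by fragment.
    // On hardware the ordered section serialises these read/modify/write steps for
    // each pixel; here a plain function stands in for that critical section.
    constexpr int K = 4;
    struct PixelNodes { Fragment node[K]; int count = 0; };

    void insertOrdered(PixelNodes& px, const Fragment& f) {
        if (px.count < K) {                          // free slot: insert, keep sorted
            px.node[px.count++] = f;
            std::sort(px.node, px.node + px.count,
                      [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
        } else {
            // Full: fold the new fragment into the farthest node so memory stays
            // bounded; this lossy merge is the "approximation" in schemes like AOIT.
            Fragment& farthest = px.node[0];
            farthest.color = 0.5f * (farthest.color * farthest.alpha + f.color * f.alpha);
            farthest.alpha = 1.0f - (1.0f - farthest.alpha) * (1.0f - f.alpha);
        }
    }

    int main() {
        std::vector<Fragment> frags = { {2.0f, 1.0f, 0.5f}, {1.0f, 0.0f, 0.5f}, {3.0f, 0.5f, 0.5f} };
        PixelNodes px;
        for (const Fragment& f : frags) insertOrdered(px, f);
        std::printf("resolve: %.3f, nodes stored: %d\n", resolveLinkedList(frags), px.count);
    }

Note how (b) never needs more than K nodes per pixel, whereas (a) grows with depth complexity; that is exactly what the pixel shader wait function buys.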

--------

For conservative rasterization, read http://www.google.com/patents/WO2013101167A1?cl=en

Intel's "Five-dimensional rasterization with conservative bounds" patent

"We present a method for computing conservative depth over a motion blurred triangle over a tile of pixels. This is useful for occlusion culling (zmin/zma¾-culling) and conservative rasterization, as well as a number of other techniques, such as dynamic collision detection on the graphics processor, and caustics rendering"

-----------------

From http://www.amd.com/us/press-releases/Pages/amd-demonstrates-2014mar20.aspx

"Full DirectX 12 compatibility promised for the award-winning Graphics Core Next architecture"

Cheap gaming PC build beating next-gen consoles


http://www.guru3d.com/articles_pages/battlefield_4_vga_graphics_performance_benchmark,7.html

"monitor resolution of 1600x1200 - again all tests have DX11 / Ultra Quality mode with 2xMSAA enabled and HBAO"

Note the 7870 GE's 1600x1200 result vs the PS4's 1600x900: 1600x1200 is about 1.92 million pixels per frame against 1.44 million at 1600x900, i.e. the PC is pushing roughly 33 per cent more pixels.

PCPartPicker part list: http://pcpartpicker.com/p/1VRF7

CPU: AMD Athlon II X4 740 3.2GHz Quad-Core Processor ($67.00 @ Amazon)

Motherboard: Biostar A55MD2 Micro ATX FM2 Motherboard ($38.25 @ Newegg)

Memory: G.Skill Ripjaws X Series 8GB (2 x 4GB) DDR3-2133 Memory ($63.75 @ Newegg)

Storage: Toshiba 500GB 3.5" 7200RPM Internal Hard Drive ($44.99 @ Newegg)

Video Card: PowerColor Radeon HD 7870 XT 2GB Video Card ($163.98 @ Newegg)

Case: Sentey CS1-1420 PLUS ATX Mid Tower Case ($19.99 @ Newegg)

Power Supply: CoolMax 500W 80 PLUS Certified ATX12V / EPS12V Power Supply ($19.99 @ Newegg)

Optical Drive: Lite-On iHAS124-04 DVD/CD Writer ($17.98 @ OutletPC)

Total: $435.93 (Prices include shipping, taxes, and discounts when available.) (Generated by PCPartPicker 2013-11-01 10:18 EDT-0400)

---------

PS: 7870 XT > 7870 GE

http://www.guru3d.com/articles_pages/battlefield_4_vga_graphics_performance_benchmark,11.html

  • GeForce cards use the latest 331.65 WHQL Beta driver.
  • AMD Radeon graphics cards we used the latest 13.11 Beta build 8 driver.

---------------------------------------------

CPU benchmarks

http://www.techspot.com/review/734-battlefield-4-benchmarks/page6.html

Crytek's view on X1 vs PS4.


http://www.eurogamer.net/articles/digitalfoundry-crytek-the-next-generation

Digital Foundry: Crytek is at the bleeding edge of rendering tech and your demands from hardware are very high, so how satisfied are you with the final designs of the Xbox One and PlayStation 4?

Cevat Yerli: Both consoles have a DX 11.1+ capable GPU with full compute shader support which allows us to come up with new creative rendering techniques that were not possible before. The GPUs are very efficient in performing math operations and the CPUs, in contrast to the previous PowerPC-based architectures, have standard PC features like out-of-order execution and branch prediction. All this reduces the need for micro-optimisation and allows us to focus more on the high-level algorithms which is usually the more rewarding part of development. We are also looking forward to seeing what the PlayStation 4 will offer in regards to online compute capacities, and the strategy rolled out for Xbox One cloud support is certainly going to be very interesting in terms of compute power for next-gen games. I also think Microsoft's decision to include Kinect as standard is a positive one, as it avoids fragmenting the market and allows developers to treat its functionality as a given.

Digital Foundry: On the face of it the bigger GPU and wider bandwidth of the PlayStation 4 makes it far more powerful than the Xbox One. Yet developers like John Carmack - and even PC benchmarks on equivalent hardware - suggest that the two platforms may be closer than the specs suggest. What's your assessment?

Cevat Yerli: Both next-gen platforms have excellent specs and provide wins against each other in a variety of areas. But, in essence, both of them are going to run next-generation games in more or less the same quality due to the diminishing returns of optimising for these little differences. That being said, platform-exclusive titles might be able to take advantage of these slight variations on both Xbox One and PlayStation 4.

tormentos being stupid


Tormentos' memory bandwidth stupidity

http://au.gamespot.com/forums/topic/29451500/xbox-one--7790-confirmed-by-xbox-one-architec.

[Image: forum screenshot of tormentos' math errors]

X1 can add its memory bandwidth figures because it has two separate memory pools (think of L-shaped multi-memory-controller setups), while PS4 has a single memory pool. On paper that's 68 GB/s of DDR3 plus 109 GB/s of ESRAM on Xbox One, against a single 176 GB/s GDDR5 pool on PS4.

----------------------------------------------

Me: "From TC's link, X1 is capable to drive superior (against PS4) raytracing write results"

tormentos wrote: The same developer on xbox one = True. Double Standard FTW.

-----------

You fool, X1's superior write performance was backed by a secondary source from Gaijin:

http://www.edge-online.com/news/gaijin-games-on-why-war-thunder-isnt-coming-to-xbox-one/

How much more powerful?

AY: It depends what you're doing. GPU, like 40 per cent more powerful. DDR5 is basically 50 per cent more powerful than DDR3, but the memory write [performance] is bigger on Xbox One so it depends on what you're doing.

How is that going to translate to on-screen results for the kinds of games you want to make? So to optimise War Thunder on both consoles you could hypothetically make a better, prettier version on PS4?

AY: Yep.

KY: Probably yes. But again, that's not a very big deal.

----------------------------------------------------------

http://asia.gamespot.com/forums/topic/29447777/edge.com-ps4-1080p-30fps--xb1900p-sub-20fps.-linky---o-boy.dave-?msg_id=341535592#341535592

I don't think you get it, MS combined bandwidth on xbox 360 to claim superiority over the PS3, as if bandwidth could be add like apples, in fact the EDRAM mean nothing in the end the PS4 with far lower bandwidth got the job done, and exclusive actually outshine MS ones

tormentos

Microsoft didn't show the bandwidth link between the 360's EDRAM and the GPU, while X1's Hot Chips reveal shows the bandwidth link between the ESRAM and the GPU.

http://en.wikipedia.org/wiki/File:X360bandwidthdiagram.jpg

http://media.teamxbox.com/dailyposts/xbox360/hardware/ati_xenos_02.jpg

https://semiaccurate.com/assets/uploads/2013/08/XBox_One_SoC_diagram.jpg

----------------------------------------------------------

http://au.gamespot.com/forums/topic/29449078/xbox-one-vs-playstation-4-game-developers-say-its-a-draw?msg_id=341538645#341538645

Dude Ronvalencia quoted 133GB/s bandwidth that was from a nameless developer on DF article

tormentos

http://www.eurogamer.net/articles/digitalfoundry-xbox-one-memory-better-in-production-hardware

Microsoft techs have found that the hardware is capable of reading and writing simultaneously. Apparently, there are spare processing cycle "holes" that can be utilised for additional operations. Theoretical peak performance is one thing, but in real-life scenarios it's believed that 133GB/s throughput has been achieved with alpha transparency blending operations (FP16 x4).
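
As a back-of-the-envelope reading of that figure (my interpretation of the numbers, not Microsoft's published methodology): an FP16x4 render target is 8 bytes per pixel, and alpha blending both reads and writes the destination, so roughly 16 bytes move through the ESRAM per blended pixel.

    #include <cstdio>

    int main() {
        const double esram_throughput = 133e9;                // bytes/s, from the DF quote
        const double bytes_per_pixel  = 4 * 2;                // FP16 x4 = 8 bytes
        const double bytes_per_blend  = 2 * bytes_per_pixel;  // read dst + write dst
        std::printf("~%.1f Gpixels/s of FP16x4 blending\n",
                    esram_throughput / bytes_per_blend / 1e9);  // prints ~8.3
    }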

The "nameless developer" behind the 133GB/s claim was Microsoft itself.

tormentos is a liar.

AMD GCN CU hardware scheduler with out-of-order wavefront processing.


http://www.techpowerup.com/178129/amd-to-fix-gcn-latency-issues-with-driver-updates.html

He continued: "Additionally, when we switched from the old VLIW architecture to the GCN core, a significant update to all parts of the driver was needed. Although not really spoken about, the entire memory management on GCN is different to prior GPUs, and the initial software management for that was primarily driven by schedule. In the meantime we've been rewriting it, and we have discovered that the new version has also improved frame latency in a number of cases, so we are accelerating the QA and implementation of that."

http://www.tomshardware.com/reviews/radeon-hd-7970-benchmark-tahiti-gcn,3104-2.html

The CU has its own hardware scheduler that's able to assign wavefronts to available VUs, with limited out-of-order capability to avoid dependency bottlenecks.

[Images: GCN compute unit block diagrams]

AMD GCN's CU can process multiple kernels at once.

This is the key to better compute performance: each VU can switch to a different wavefront when a dependency stalls the one it is working on (see the scheduling sketch below).

[Image: compute unit dependency handling diagram]
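
A minimal sketch of that scheduling idea (a simplified model assumed for illustration, not AMD's actual scheduler): each cycle the CU issues the next wavefront that is ready, skipping any wavefront whose outstanding dependency has not resolved, so wavefronts from other work-groups or kernels fill the stall.

    #include <cstdio>
    #include <vector>

    struct Wavefront {
        int id;
        int stall_until;  // cycle at which its outstanding dependency resolves
    };

    int pickReady(const std::vector<Wavefront>& wfs, int cycle, int start) {
        for (size_t i = 0; i < wfs.size(); ++i) {       // round-robin scan
            const Wavefront& w = wfs[(start + i) % wfs.size()];
            if (w.stall_until <= cycle) return w.id;    // first non-stalled wavefront
        }
        return -1;                                      // every wavefront is stalled
    }

    int main() {
        // Four resident wavefronts; two are waiting on outstanding memory results.
        std::vector<Wavefront> wfs = { {0, 3}, {1, 0}, {2, 5}, {3, 0} };
        int next = 0;
        for (int cycle = 0; cycle < 6; ++cycle) {
            int id = pickReady(wfs, cycle, next);
            std::printf("cycle %d: issue wavefront %d\n", cycle, id);
            if (id >= 0) next = (id + 1) % (int)wfs.size();
        }
    }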

http://www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf

In GCN, each SIMD unit is assigned its own 40-bit program counter and instruction buffer for 10 wavefronts. The whole CU can thus have 40 wavefronts in flight, each potentially from a different work-group or kernel, which is substantially more flexible than previous designs. This means that a GCN GPU with 32 CUs, such as the AMD Radeon HD 7970, can be working on up to 81,920 work items at a time.
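
The whitepaper's arithmetic checks out; in this quick check the 4 SIMDs per CU and 64 work-items per wavefront are standard GCN figures, and the rest comes straight from the quote:

    #include <cstdio>

    int main() {
        const int cus                 = 32;  // Radeon HD 7970
        const int simds_per_cu        = 4;
        const int wavefronts_per_simd = 10;  // per-SIMD instruction buffer depth
        const int lanes_per_wavefront = 64;  // work-items per wavefront

        int wavefronts_per_cu = simds_per_cu * wavefronts_per_simd;
        std::printf("wavefronts per CU: %d\n", wavefronts_per_cu);  // 40
        std::printf("work items in flight: %d\n",
                    cus * wavefronts_per_cu * lanes_per_wavefront); // 81920
    }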

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/4

However the CU and SIMDs can select a different wavefront to work on; this can be another wavefront spawned by the same task (e.g. a different group of pixels/values) or it can be a wavefront from a different task entirely.