@04dcarraher said:
Doom was released on the latest version of OpenGL, while AMD GPUs at the time didn't support it well because of AMD's focus on Mantle. AMD suffered from poor utilization until they switched to Vulkan to fix that issue. Nvidia tends to maximize GPU resources from the get-go.
Also, you're assuming that the X1X is running all the same settings and tweaks as the PC versions, which it's not. They are using a slew of more refined settings, including dynamic settings, and tricks to upscale and save resources so the game fits the hardware better. The X1X GPU is not as fast as you are suggesting it is. For all intents and purposes it's basically a slightly modified RX 480/580.
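The quoted post mentions dynamic settings and upscaling as ways console versions fit the hardware. As a rough illustration only, here is a minimal sketch of a dynamic resolution controller, assuming GPU frame time scales with pixel count (so with the square of the render scale) and a 60 fps budget of 16.7 ms. The function name, gain, and clamp limits are hypothetical; this is not how any particular X1X title implements it.

```python
def next_render_scale(scale, gpu_ms, target_ms=16.7,
                      min_scale=0.6, max_scale=1.0, gain=0.5):
    """Return the render-resolution scale to use for the next frame.

    Hypothetical proportional controller: shrink the scale when the last
    frame blew the budget, grow it back when there is headroom.
    """
    # Assumption: GPU cost ~ pixel count ~ scale**2, so the scale that would
    # just fit the budget is scale * sqrt(target / actual).
    ideal = scale * (target_ms / gpu_ms) ** 0.5
    # Move only part of the way toward the ideal to avoid oscillating.
    new_scale = scale + gain * (ideal - scale)
    # Keep the scale inside the allowed range.
    return max(min_scale, min(max_scale, new_scale))


# Example: a heavy 33.4 ms frame at native resolution pulls the scale down.
print(next_render_scale(1.0, 33.4))
```

A real engine would typically smooth frame times over several frames before reacting; the single-frame input here keeps the sketch short.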
Issue 1
NVIDIA's DX11 driver already farms out asynchronous tasks to worker threads (DX11 MT), while AMD's PC DX11 drivers have nothing equivalent...
https://developer.nvidia.com/dx12-dos-and-donts
Don’ts
- Don’t rely on the driver to parallelize any Direct3D12 works in driver threads
- On DX11 the driver does farm off asynchronous tasks to driver worker threads where possible – this doesn’t happen anymore under DX12
- While the total cost of work submission in DX12 has been reduced, the amount of work measured on the application’s thread may be larger due to the loss of driver threading. The more efficiently one can use parallel hardware cores of the CPU to submit work in parallel, the more benefit in terms of draw call submission performance can be expected.
Under Vulkan or DirectX 12, NVIDIA and AMD are on a level playing field, since it's the game programmer's job to create worker threads and async tasks.
This issue wasn't a large factor in AMD's strong showing in Call of Duty: Infinite Warfare (DX11), Titanfall 2 (DX11) and Resident Evil 7 (DX11).
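Under DX12/Vulkan the application, not the driver, has to spread command recording across CPU cores. A minimal sketch of that pattern: each worker records its own command list for a slice of the draw calls, and the main thread submits the lists in a fixed order. The tuples are hypothetical stand-ins for real API calls (no actual D3D12 or Vulkan functions are used), and in CPython the GIL means this shows the structure, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draws):
    """Record one hypothetical command list for a slice of the scene.

    Each worker owns its list, so recording needs no locking, mirroring how
    DX12/Vulkan apps give each thread its own command list/buffer.
    """
    commands = []
    for mesh_id in draws:
        commands.append(("set_mesh", mesh_id))
        commands.append(("draw", mesh_id))
    return commands


def build_frame(draw_ids, workers=4):
    """Record command lists in parallel, then 'submit' them in order."""
    # Split the draw calls into contiguous chunks (ceiling division).
    chunk = max(1, -(-len(draw_ids) // workers))
    slices = [draw_ids[i:i + chunk] for i in range(0, len(draw_ids), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so submission is deterministic.
        command_lists = list(pool.map(record_command_list, slices))
    # Single-threaded submission: concatenate the lists in their original order.
    queue = []
    for cl in command_lists:
        queue.extend(cl)
    return queue


print(len(build_frame(list(range(10)), workers=3)))
```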
Issue 2
How a game moves data through memory can influence the effective TFLOPS.
Killer Instinct is not a memory-bandwidth-heavy game, which lets the RX-580 reach its TFLOPS potential against NVIDIA's counterpart with similar TFLOPS.
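One standard way to reason about "effective TFLOPS" is the roofline model (a swapped-in simplification, not something the post names): attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The spec figures below are approximate reference-card numbers and are assumptions; the model ignores caches and color compression, which matter a lot on real GPUs.

```python
def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to get TFLOPS.
    memory_bound = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, memory_bound)


# Approximate reference specs (assumed): FP32 peak at boost clock,
# both cards on a 256-bit bus with 8 Gbps GDDR5 (256 GB/s).
cards = {"RX 580": (6.2, 256), "GTX 1070": (6.5, 256)}
for name, (peak, bw) in cards.items():
    for intensity in (10, 50):  # low vs high FLOPs per byte
        print(name, intensity, attainable_tflops(peak, bw, intensity))
```

At low intensity both cards hit the same bandwidth ceiling; only a game that does more math per byte, as the post argues Killer Instinct does, lets each card approach its own peak.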
Another factor would be following AMD's guidance on software-based tiled cache rendering via the compute path, since on the RX-580 that path is connected to the 2 MB L2 cache. The RX-580 and X1X are about equal here, since both connect the compute shader path to a 2 MB L2 cache. The compute shader path uses the TMU write function as a ROPS replacement (without the ROPS fixed-function effects).
There are several methods for blending effect layers, e.g. the compute shader path or the pixel (RBE/ROPS) path.
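To see why a cache-resident tile helps when blending many layers, here is a toy model comparing per-layer blending, where every layer reads and writes the framebuffer in memory, against tiled blending, where a tile is loaded once, all layers are blended on-chip, and the tile is written once. The function names, tile size, and the "DRAM access" counter are hypothetical; only destination (framebuffer) accesses are counted, since source-layer reads are the same in both. This is not AMD's guidance code.

```python
def blend(dst, src, alpha=0.5):
    """Simple 'over' blend of one scalar channel."""
    return src * alpha + dst * (1.0 - alpha)


def per_layer_blend(framebuffer, layers):
    """Pixel-path style without a tile cache: each layer reads and writes
    every framebuffer pixel in (simulated) DRAM."""
    fb = list(framebuffer)
    dram_accesses = 0
    for layer in layers:
        for i, src in enumerate(layer):
            fb[i] = blend(fb[i], src)  # one read + one write per layer
            dram_accesses += 2
    return fb, dram_accesses


def tiled_blend(framebuffer, layers, tile_size=4):
    """Compute-path style: load a tile once into on-chip storage, blend all
    layers there, then write the tile back once."""
    fb = list(framebuffer)
    dram_accesses = 0
    for start in range(0, len(fb), tile_size):
        tile = fb[start:start + tile_size]  # one read per pixel
        dram_accesses += len(tile)
        for layer in layers:
            for j in range(len(tile)):
                tile[j] = blend(tile[j], layer[start + j])
        fb[start:start + tile_size] = tile  # one write per pixel
        dram_accesses += len(tile)
    return fb, dram_accesses


layers = [[1.0] * 16, [0.5] * 16, [0.25] * 16]
print(per_layer_blend([0.0] * 16, layers)[1], tiled_blend([0.0] * 16, layers)[1])
```

Both functions apply the layers to each pixel in the same order, so the images match; only the framebuffer traffic differs, and the gap grows with the number of layers.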
On NVIDIA Maxwell/Pascal ("Paxwell") GPUs, the compute and pixel paths have similar read/write performance, hence more consistent performance. This is because both the TMUs and the ROPS are connected to the L2 cache.
AMD's other reasons for pushing the compute shader/async compute shader path are mostly:
1. more TMUs relative to ROPS
2. a multi-MB L2 cache connected to the TMUs.
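The TMU-to-ROPS imbalance can be made concrete with peak texel and pixel fill rates (one texel per TMU per clock, one pixel per ROP per clock, the usual simplified spec-sheet formula). The unit counts and reference boost clocks below are assumed from public spec sheets, not taken from the post.

```python
# Unit counts and reference boost clocks (assumed from public spec sheets).
GPUS = {
    "RX 580":   {"tmus": 144, "rops": 32, "boost_ghz": 1.340},
    "GTX 1070": {"tmus": 120, "rops": 64, "boost_ghz": 1.683},
}


def fill_rates(tmus, rops, boost_ghz):
    """Peak texel rate (GTexels/s) and pixel fill rate (GPixels/s)."""
    return tmus * boost_ghz, rops * boost_ghz


for name, spec in GPUS.items():
    texel, pixel = fill_rates(**spec)
    ratio = spec["tmus"] / spec["rops"]
    print(f"{name}: {texel:.1f} GT/s, {pixel:.1f} GP/s, TMU:ROP = {ratio:.2f}")
```

Texel rates come out close, but the RX-580's pixel fill rate is well under half the GTX 1070's, which is why routing writes through the TMU-backed compute path is attractive on AMD.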
My main reason for selecting the GTX 980 Ti over the R9-390X is that the NVIDIA GPU is good at both the compute and pixel paths, hence better consistency.
Btw, the RX-580's rasterization rate is slightly higher than the GTX 1070's 5.0 G triangles/s.
RX Vega 32 is effectively an improved RX-580 with the RBE/ROPS connected to a multi-MB L2 cache.
NVIDIA has enjoyed an RBE/ROPS tiled render cache advantage since the Maxwell era, and has been profiting from it with the GTX 1070, whose BOM cost is close to the RX-580's.
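The rasterization comparison is simple arithmetic: peak triangle rate is primitives per clock times clock speed. The RX-580 inputs here are assumptions (Polaris 10's four geometry engines at one primitive per clock each, at the 1.340 GHz reference boost clock); the GTX 1070 figure is the one quoted in the post.

```python
def triangle_rate_g(prims_per_clock, clock_ghz):
    """Peak triangle setup rate in billions of triangles per second."""
    return prims_per_clock * clock_ghz


# Assumed: 4 geometry engines x 1 primitive/clock at 1.340 GHz.
rx580 = triangle_rate_g(4, 1.340)
gtx1070 = 5.0  # G triangles/s, as quoted in the post

print(f"RX-580: {rx580:.2f} G tri/s, {100 * (rx580 / gtx1070 - 1):.1f}% above 5.0")
```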
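NVIDIA has not published the details of its tiled caching, so the following is only a generic sketch of the binning step that tile-based approaches share: each screen-space triangle is assigned to every tile its bounding box overlaps, so a tile's triangles can later be processed together while that tile's color/depth data stay in on-chip cache. The function name, the 16-pixel tile size, and the conservative bounding-box test are all assumptions.

```python
import math


def bin_triangles(triangles, width, height, tile=16):
    """Assign each screen-space triangle to every tile its bounding box overlaps.

    triangles: list of three (x, y) vertex tuples in pixel coordinates.
    Returns {(tile_x, tile_y): [triangle indices]}.
    """
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile indices.
        x0 = max(0, math.floor(min(xs))) // tile
        x1 = min(width - 1, math.floor(max(xs))) // tile
        y0 = max(0, math.floor(min(ys))) // tile
        y1 = min(height - 1, math.floor(max(ys))) // tile
        # Fully off-screen triangles give empty ranges and are skipped.
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins


tris = [[(1, 1), (10, 1), (1, 10)], [(20, 20), (40, 20), (20, 40)]]
print(bin_triangles(tris, 64, 64))
```

Bounding-box binning is conservative (a thin diagonal triangle lands in tiles it never touches); it trades some extra per-tile work for a very cheap binning pass.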