Everyone can relax, Maxwell supports Async Compute

This topic is locked from further discussion.


#1  Edited By 04dcarraher
Member since 2004 • 23832 Posts

Here is a quote from Oxide talking with Nvidia

In Maxwell

"The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there’s a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It’s quite a few different “parts” working together. A software and a hardware component if you will.

With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine (for Async compute) or Graphic Command Processor (Graphic tasks but can also handle compute), DMA Engines (Copy). The queues, for pending Async work, are held within the ACEs (8 deep each)… and ACEs handle assigning Async tasks to available compute units."

Simplified…

Maxwell : Queues in Software, work distributor in software, Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.

GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware."
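The two pipelines in that summary can be sketched as a toy model. Everything below is illustrative only (made-up function names, simplified numbers): Maxwell queues and distributes work in the driver before it reaches the hardware schedulers, while GCN holds pending async work in fixed-depth hardware queues on the ACEs.

```python
from collections import deque

def maxwell_dispatch(tasks, num_schedulers=4):
    """Software path (toy model): a driver-side Grid Management Queue
    feeds a software work distributor, which hands tasks to the
    hardware Asynchronous Warp Schedulers."""
    grid_management_queue = deque(tasks)              # lives in the driver (software)
    schedulers = [[] for _ in range(num_schedulers)]  # hardware AWSs per SMM
    i = 0
    while grid_management_queue:
        task = grid_management_queue.popleft()
        schedulers[i % num_schedulers].append(task)   # work distributor (software)
        i += 1
    return schedulers

def gcn_dispatch(tasks, num_aces=8, queue_depth=8):
    """Hardware path (toy model): pending async work sits in per-ACE
    queues, 8 deep each; the ACEs hand tasks straight to compute units."""
    aces = [deque() for _ in range(num_aces)]
    stalled = []
    for i, task in enumerate(tasks):
        q = aces[i % num_aces]
        if len(q) < queue_depth:
            q.append(task)
        else:
            stalled.append(task)  # would wait for a hardware queue slot to free up
    return aces, stalled

schedulers = maxwell_dispatch(range(16))
aces, stalled = gcn_dispatch(range(16))
print(sum(len(s) for s in schedulers))  # 16: all tasks reach the AWSs
print(sum(len(q) for q in aces))        # 16: all tasks fit in the ACE queues
```

The only structural difference the model captures is where the queuing and distribution steps live; in both cases the final execution units are hardware.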



#3 topgunmv
Member since 2003 • 10880 Posts

Sounds like a software workaround (less efficient) for a hardware shortcoming.

#4  Edited By BassMan
Member since 2002 • 17835 Posts

It certainly sounds less than ideal. I wonder if they will go the full hardware route for Pascal?


#5  Edited By 04dcarraher
Member since 2004 • 23832 Posts

It's still hardware-based; the CPU just sets up the queue for the GPU to process. AMD's route just has the GPU's ACE units set up the queue of what needs to be processed. In the end the difference will be unnoticeable. Also of note: the fiery storm that came from the Ashes benchmarks also raised a brow about AMD's async latency, so Nvidia's CPU/software-based queuing is not really an issue since AMD has insane latency using async. In the end both sides are on par.

#6  Edited By BassMan
Member since 2002 • 17835 Posts

I have never been too worried about it. Nvidia and AMD both worked closely with Microsoft when developing the DX12 API. Each side knows what is needed. Nvidia has smart engineers and I doubt they would allow AMD to get the upper hand in DX12 performance.


#7 glez13
Member since 2006 • 10310 Posts

Even AMD already admitted that there is no card on the market with full DX12 support right now. Don't know why fanboys are still going on with this.


#8 horgen  Moderator
Member since 2006 • 127517 Posts

@glez13 said:

Even AMD already admitted that there is no card on the market with full DX12 support right now. Don't know why fanboys are still going on with this.

Because they are fanboys?

Anyhow, I get the impression that async compute is far from optimal for both of them (nVidia/AMD).


#9 Coseniath
Member since 2004 • 3183 Posts

That means Nvidia needs to work with drivers while AMD doesn't need to...

NVIDIA Will Fully Implement Async Compute Via Driver Support, Oxide Confirms

From overclock.net thread:

Regarding Async compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn’t hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn’t hold Ashes up as the premier example of this feature.

We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We’ll keep everyone posted as we learn more.

:P


#10 elessarGObonzo
Member since 2008 • 2677 Posts
@glez13 said:

Even AMD already admitted that there is no card on the market with full DX12 support right now. Don't know why fanboys are still going on with this.

I have seen plenty of these remarks, usually by a consolite, but never an actual quote from AMD or Nvidia confirming that the current GPUs out there are lacking any DX12 feature support.


#11 glez13
Member since 2006 • 10310 Posts

@elessarGObonzo said:
@glez13 said:

Even AMD already admitted that there is no card on the market with full DX12 support right now. Don't know why fanboys are still going on with this.

I have seen plenty of these remarks, usually by a consolite, but never an actual quote from AMD or Nvidia confirming that the current GPUs out there are lacking any DX12 feature support.

AMD Admits It Too Doesn't Support All DirectX 12 Features

AMD caught a lucky break earlier this week when it was discovered that NVIDIA's Maxwell GPUs don't support DirectX 12's Async Compute feature natively. Today, AMD admitted that its latest GCN GPUs are missing some DirectX 12 features as well.

Replying to a Reddit thread, AMD’s Robert Hallock admitted that there are no graphics cards with "full support" for DirectX 12 in the market today.

"I think gamers are learning an important lesson: there's no such thing as "full support" for DX12 on the market today.

There have been many attempts to distract people from this truth through campaigns that deliberately conflate feature levels, individual untiered features and the definition of "support." This has been confusing, and caused so much unnecessary heartache and rumor-mongering.

Here is the unvarnished truth: Every graphics architecture has unique features, and no one architecture has them all. Some of those unique features are more powerful than others.

Yes, we're extremely pleased that people are finally beginning to see the game of chess we've been playing with the interrelationship of GCN, Mantle, DX12, Vulkan and LiquidVR."

When asked to name the DirectX 12 features not supported by AMD hardware, Hallock listed "Raster Ordered Views and Conservative Raster. Thankfully, the techniques that these enable (like global illumination) can already be done in other ways at high framerates (see: DiRT Showdown)."
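Hallock's point boils down to a small table. The sketch below is a summary of only what's reported in this thread (Maxwell 2 exposes ROVs and conservative raster but schedules async compute through the driver; GCN does async compute in hardware but lacks those two raster features); the dictionary keys and the "full support" test are illustrative, not an authoritative capability matrix.

```python
# DX12 feature picture as reported in this thread (circa 2015).
# A summary sketch, not an authoritative capability table.
dx12_support = {
    "Maxwell 2": {
        "async_compute": "driver/software scheduling",
        "raster_ordered_views": True,
        "conservative_raster": True,
    },
    "GCN": {
        "async_compute": "native (hardware ACEs)",
        "raster_ordered_views": False,
        "conservative_raster": False,
    },
}

def has_full_support(features):
    """'Full support' here means every feature is native hardware."""
    return all(v is True or v == "native (hardware ACEs)" for v in features.values())

full = [arch for arch, f in dx12_support.items() if has_full_support(f)]
print(full)  # []: "no such thing as 'full support' for DX12 on the market today"
```

Neither architecture passes the check, which is exactly Hallock's "unvarnished truth": every architecture has unique features, and no one architecture has them all.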


#12 elessarGObonzo
Member since 2008 • 2677 Posts

@glez13: so Nvidia has worked around its Async "problem" and AMD can handle "Raster Ordered Views and Conservative Raster" in other ways at high frame rates. So what features are not supported?


#13 04dcarraher
Member since 2004 • 23832 Posts

@glez13: Dirt Showdown? lol, the performance hit is enormous and the graphical difference is minimal. A 7970 GHz back then saw more than a 20 fps drop enabling it.


#14  Edited By ronvalencia
Member since 2008 • 29612 Posts

@04dcarraher said:

Here is a quote from Oxide talking with Nvidia

In Maxwell

"The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there’s a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It’s quite a few different “parts” working together. A software and a hardware component if you will.

With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine (for Async compute) or Graphic Command Processor (Graphic tasks but can also handle compute), DMA Engines (Copy). The queues, for pending Async work, are held within the ACEs (8 deep each)… and ACEs handle assigning Async tasks to available compute units."

Simplified…

Maxwell : Queues in Software, work distributor in software, Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.

GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware."

It amounts to software emulation of said feature, hence the statement that Maxwell v2 itself doesn't support it still stands.

@Coseniath said:

That means Nvidia needs to work with drivers while AMD doesn't need to...

NVIDIA Will Fully Implement Async Compute Via Driver Support, Oxide Confirms

From overclock.net thread:

Regarding Async compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn’t hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn’t hold Ashes up as the premier example of this feature.

We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We’ll keep everyone posted as we learn more.

:P

It's a software-side construct.



#17 ronvalencia
Member since 2008 • 29612 Posts

@BassMan said:

I have never been too worried about it. Nvidia and AMD both worked closely with Microsoft when developing the DX12 API. Each side knows what is needed. Nvidia has smart engineers and I doubt they would allow AMD to get the upper hand in DX12 performance.

1. DX12 enables AMD GPUs to have MT scaling, while NVIDIA has had optional deferred MT command-list building since DX11_0, hence some MT scaling already. AMD sees the larger gain from DX12 since they didn't have any MT scaling to start with. For NVIDIA GPUs, DX11's MT scaling gets replaced by DX12's MT scaling, hence minor improvements. AMD didn't bother porting the DX11.X MT scaling driver from the XBO to the PC.

In the end, both AMD and NVIDIA are even with DX12 MT scaling.

2. DX12 enables low CPU overhead, fixing AMD's high CPU overhead issues. Under DX11, NVIDIA's drivers have slightly lower CPU overhead than AMD's.

In the end, both AMD and NVIDIA are even on CPU overhead.

@04dcarraher said:

It's still hardware-based; the CPU just sets up the queue for the GPU to process. AMD's route just has the GPU's ACE units set up the queue of what needs to be processed. In the end the difference will be unnoticeable. Also of note: the fiery storm that came from the Ashes benchmarks also raised a brow about AMD's async latency, so Nvidia's CPU/software-based queuing is not really an issue since AMD has insane latency using async. In the end both sides are on par.

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-23

The debate has progressed on Async compute latency.

For pure compute, AMD's compute latency rivals NVIDIA's counterpart.

http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1710#post_24368195

Post from Mahigan

Here's what I think they did at Beyond3D:

  1. They set the number of threads, per kernel, to 32 (they're CUDA programmers after all).
  2. They've bumped the Kernel count to up to 512 (16,384 Threads total).
  3. They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2

Here's why that's not how you code for GCN

Why?:

  1. Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
  2. That's 2,560 Threads total PER CU.
  3. An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.

If you load up GCN with kernels made up of 32 threads you're wasting resources. If you're not pushing GCN you're wasting compute potential. In slide number 4, it stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency, yet you can execute a ton of work on GCN without affecting that latency. With Maxwell 2, latency rises like a staircase the more work you throw at it. I'm not sure if the folks at Beyond3D are aware of this or not.

Conclusion:

I think they geared this test towards nVIDIAs CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single Latency results don't matter. This test is only good if you're checking on Async functionality.

GCN was built for parallelism, not serial workloads like nVIDIA's architectures. This is why you don't see GCN taking a hit with 512 kernels.

What did Oxide do? They built two paths: one with shaders optimized for CUDA and the other with shaders optimized for GCN. On top of that, GCN has async working. Therefore it is not hard to see why GCN performs so well in Oxide's engine. It's the better architecture if you push it and code for it. If you're only using light compute work, nVIDIA's architectures will be superior.
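Mahigan's occupancy figures are easy to check arithmetically; a quick sketch using the constants given in the quoted post:

```python
# GCN occupancy arithmetic from the quoted post (R9 290X as the example).
WAVEFRONT_SIZE = 64        # threads per wavefront on GCN
WAVEFRONTS_PER_CU = 40     # kernels "in flight" per compute unit
CUS_R9_290X = 44

threads_per_cu = WAVEFRONT_SIZE * WAVEFRONTS_PER_CU
total_threads = threads_per_cu * CUS_R9_290X
print(threads_per_cu)      # 2560 threads per CU
print(total_threads)       # 112640 threads across the whole GPU

# A 32-thread kernel fills only half of a 64-wide wavefront; the other
# 32 lanes are masked off, which is the wasted capacity Mahigan describes.
lane_utilization = 32 / WAVEFRONT_SIZE
print(lane_utilization)    # 0.5
```

By the same arithmetic, the Beyond3D test's 512 kernels of 32 threads is only 16,384 threads total, well under the 112,640 an R9 290X can keep in flight, which is Mahigan's point about the test not saturating GCN.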


#18 Xtasy26
Member since 2008 • 5582 Posts

Anything that is done in hardware will always be superior to doing it in software. It reminds me of when Eyefinity came out: it was done in hardware, which is why you only needed a single GPU, whereas nVidia's quick fix was done in software and required two GPUs. Only in later generations could you get nVidia Surround with a single GPU. What about users on Kepler GPUs, the GTX 600 series? It looks like they may get screwed, whereas people with HD 7900 series cards will not.


#19 ronvalencia
Member since 2008 • 29612 Posts

@Xtasy26 said:

Anything that is done in hardware will always be superior to doing it in software. It reminds me of when Eyefinity came out: it was done in hardware, which is why you only needed a single GPU, whereas nVidia's quick fix was done in software and required two GPUs. Only in later generations could you get nVidia Surround with a single GPU. What about users on Kepler GPUs, the GTX 600 series? It looks like they may get screwed, whereas people with HD 7900 series cards will not.

In terms of business model, making Keplers obsolete (i.e. "legacy" support status), which pushes NVIDIA fanboys/customers to buy another CUDA GPU, is a shareholder-value move.


#20  Edited By deactivated-579f651eab962
Member since 2003 • 5404 Posts

I wasn't worried anyway. Titan X's will be good for many a moon.

#21 BassMan
Member since 2002 • 17835 Posts

@klunt_bumskrint: Welcome back! :)


#22 deactivated-579f651eab962
Member since 2003 • 5404 Posts

@BassMan: :D