Nvidia announces 120 TFLOPS Volta GPU!

#1  Edited By loco145
Member since 2006 • 12226 Posts

Volta-Based Tesla V100 GPU Shatters the 100-Teraflop Barrier

NVIDIA today launched Volta™ -- the world's most powerful GPU computing architecture, created to drive the next wave of advancement in artificial intelligence and high performance computing.

Volta, NVIDIA's seventh-generation GPU architecture, is built with 21 billion transistors and delivers the equivalent performance of 100 CPUs for deep learning.

It provides a 5x improvement over Pascal™, the current-generation NVIDIA GPU architecture, in peak teraflops, and 15x over the Maxwell™ architecture, launched two years ago. This performance surpasses by 4x the improvements that Moore's law would have predicted.

Breakthrough Technologies

The Tesla V100 GPU leapfrogs previous generations of NVIDIA GPUs with groundbreaking technologies that enable it to shatter the 100 teraflops barrier of deep learning performance. They include:

  • Tensor Cores designed to speed AI workloads. Equipped with 640 Tensor Cores, V100 delivers 120 teraflops of deep learning performance, equivalent to the performance of 100 CPUs.
  • New GPU architecture with over 21 billion transistors. It pairs CUDA cores and Tensor Cores within a unified architecture, providing the performance of an AI supercomputer in a single GPU.
  • NVLink™ provides the next generation of high-speed interconnect linking GPUs, and GPUs to CPUs, with up to 2x the throughput of the prior generation NVLink.
  • 900 GB/sec HBM2 DRAM, developed in collaboration with Samsung, achieves 50 percent more memory bandwidth than previous generation GPUs, essential to support the extraordinary computing throughput of Volta.
  • Volta-optimized software, including CUDA, cuDNN and TensorRT™ software, which leading frameworks and applications can easily tap into to accelerate AI and research.

Source

The V100 will first appear inside Nvidia's bespoke compute servers. Eight of them will come packed inside the $150,000 (~£150,000) DGX-1 rack-mounted server, which ships in the third quarter of 2017. A 250W PCIe slot version of the V100 is also in the works (probably priced at around £10,000), as well as a half-height 150W card that's likely to feature a lower clock speed and disabled cores.

Source
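For anyone wondering where the 120-teraflop headline comes from, here's a rough back-of-envelope check. The core counts are from the press release above; the ~1455 MHz boost clock is my own assumption (it isn't stated there), so treat this as a sketch rather than an official breakdown:

```python
# Rough sanity check of the V100 headline number.
# NOTE: the boost clock below is assumed, not taken from the press release.

TENSOR_CORES = 640            # per the press release
FMAS_PER_CORE_PER_CLOCK = 64  # one 4x4x4 matrix multiply-add per clock
FLOPS_PER_FMA = 2             # each fused multiply-add counts as 2 FLOPs
ASSUMED_BOOST_CLOCK_HZ = 1.455e9

tensor_tflops = (TENSOR_CORES * FMAS_PER_CORE_PER_CLOCK *
                 FLOPS_PER_FMA * ASSUMED_BOOST_CLOCK_HZ) / 1e12
print(f"Tensor throughput: ~{tensor_tflops:.0f} TFLOPS")  # ~119, i.e. the quoted 120 TFLOPS
```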

Will the PS5 even be 1/10 as powerful?

#2 Ten_Pints
Member since 2014 • 4072 Posts

£10,000... I'll have 2.

#3 Xplode_games
Member since 2011 • 2540 Posts

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

#4  Edited By lundy86_4
Member since 2003 • 61509 Posts

So, a new GPU for high-end workstations? This is probably for things like rendering... Not for gaming.

Edit: Ahhh, and servers as well.

#5  Edited By appariti0n
Member since 2009 • 5013 Posts

@lundy86_4: Tesla is designed for large datacenters so that multiple AutoCAD users, or users of other workstation-class applications, don't each need a high-end card in their workstation. They all share one big graphics pool.

Most engineers I talk to still prefer having their own card though.

I wish they would release a consumer version. You could have one badass GPU in one system, allowing multiple gamers to use it at once.

#6  Edited By BassMan
Member since 2002 • 17834 Posts

@Xplode_games said:

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

I have a 1080 Ti and I can tell you straight up that it does not breeze by anything you throw at it at 4K. 4K/60 Ultra is very demanding, and the newer, more advanced AAA titles struggle. Even when you lower settings, it is still difficult to hit a steady 4K/60.

#7 Primorandomguy
Member since 2014 • 3368 Posts

Only 150,000 dollars! What a steal!

#8  Edited By ronvalencia
Member since 2008 • 29612 Posts

From http://www.anandtech.com/show/11367/nvidia-volta-unveiled-gv100-gpu-and-tesla-v100-accelerator-announced

Tensor Cores are a new type of core for Volta that can, at a high level, be thought of as a more rigid, less flexible (but still programmable) core geared specifically for Tensor deep learning operations. These cores are essentially a mass collection of ALUs for performing 4x4 Matrix operations; specifically a fused multiply add (A*B+C), multiplying two 4x4 FP16 matrices together, and then adding that result to an FP16 or FP32 4x4 matrix to generate a final 4x4 FP32 matrix.

The significance of these cores are that by performing a massive matrix-matrix multiplication operation in one unit, NVIDIA can achieve a much higher number of FLOPS for this one operation. A single Tensor Core performs the equivalent of 64 FMA operations per clock (for 128 FLOPS total), and with 8 such cores per SM, 1024 FLOPS per clock per SM. By comparison, even with pure FP16 operations, the standard CUDA cores in an SM only generate 256 FLOPS per clock. So in scenarios where these cores can be used, NV is slated to be able to deliver 4x the performance versus Pascal
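To make the quoted description concrete, here is a minimal NumPy sketch of the arithmetic a single Tensor Core performs each clock: multiply two 4x4 FP16 matrices and add an FP32 accumulator. This only emulates the math; on real hardware it is exposed through CUDA's warp-level matrix primitives, not written like this.

```python
import numpy as np

# One Tensor Core operation per clock (per the AnandTech description):
# D = A * B + C, with A and B in FP16 and C/D accumulated in FP32.
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.random.rand(4, 4).astype(np.float32)

# Multiplies use the FP16 inputs; accumulation happens at FP32 precision.
D = A.astype(np.float32) @ B.astype(np.float32) + C

# A 4x4x4 matrix multiply-add is 64 FMAs, i.e. 128 FLOPs per core per clock;
# 8 Tensor Cores per SM gives the 1024 FLOPs/clock/SM figure quoted above.
print(D.dtype, 4 * 4 * 4 * 2, "FLOPs per Tensor Core per clock")
```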

A GlobalFoundries 7 nm shrink of Vega 10 would reduce the die to roughly Polaris 10 territory (232 mm^2), i.e. about 260 mm^2, and that's about 25 TFLOPS FP16 / 12.5 TFLOPS FP32. In FLOPS terms, that's roughly 20 percent below GV100.

Scorpio's GPU die area is about 283 mm^2, hence any 7 nm GPU of that size would be faster than a 7 nm Vega 10 shrink.
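For what it's worth, the 12.5/25 TFLOPS figures fall out of simple shader math if you assume the shrunk part keeps Vega 10's 4096 ALUs at roughly a 1.5 GHz clock and adds double-rate packed FP16; that's my reading of the claim, not something AMD has announced.

```python
# Where the 12.5 / 25 TFLOPS guesses come from.
# Assumptions: 4096 ALUs, ~1.53 GHz clock, double-rate packed FP16 -- none confirmed.
ALUS = 4096
ASSUMED_CLOCK_HZ = 1.53e9

fp32_tflops = ALUS * 2 * ASSUMED_CLOCK_HZ / 1e12   # 2 FLOPs per FMA
fp16_tflops = fp32_tflops * 2                      # packed half2 doubles the rate
print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, FP16 ~{fp16_tflops:.1f} TFLOPS")
```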

@loco145:

Your argument would be better if you actually owned a GV100 instead of being a cheer squad.

GV100 is an 800+ mm^2 chip, hence very expensive compared to the 471 mm^2 GTX 1080 Ti/Titan Xp.

For a future 7 nm GPU, AMD could design 64-bit ALUs with double-rate FP32 (float2) and quad-rate FP16 (half4).

From https://msdn.microsoft.com/en-us/library/windows/desktop/mt733232(v=vs.85).aspx

The <type> parameter and return value for these functions implies the type of the expression, the supported types are those from the following list that are also present in the target shader model for your app:

  • half, half2, half3, half4
  • float, float2, float3, float4
  • double, double2, double3, double4
  • int, int2, int3, int4
  • uint, uint2, uint3, uint4
  • short, short2, short3, short4
  • ushort, ushort2, ushort3, ushort4
  • uint64_t, uint64_t2, uint64_t3, uint64_t4

Shader Model 6 has support for future double-rate FP32 (float2) and quad-rate FP16 (half4).

The GPU hardware roadmap is already indicated by Shader Model 6's specification.

A Tensor Core is effectively a simple 256-bit SIMD FMA unit, i.e. (256-bit packed x 256-bit packed + 256-bit packed).

A 256-bit SIMD register can hold sixteen 16-bit values.
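As a rough illustration of the packed-SIMD point (a sketch, not how any shipping GPU ISA actually works): sixteen 16-bit values fit in one 256-bit register, so one packed FMA retires 16 half-precision multiply-adds.

```python
import numpy as np

# Sixteen FP16 values = 16 * 16 bits = 256 bits, i.e. one packed register.
a = np.random.rand(16).astype(np.float16)
b = np.random.rand(16).astype(np.float16)
c = np.random.rand(16).astype(np.float16)

d = a * b + c  # one packed FMA: 16 half-precision multiply-adds at once
print(a.nbytes * 8, "bits per operand,", d.size, "FMAs per packed instruction")
```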

#9  Edited By appariti0n
Member since 2009 • 5013 Posts

@BassMan: I only have a scrubby 1080, and have to set games to ultra, then manually tweak settings down until I'm happy.

About the only modern games I can run at 4K totally maxed with a solid 60 fps are Doom and MGSV, both of which feature engines that are so optimized it's ridiculous. And Doom runs at 144 fps, the refresh rate of my monitor. I really hope Vulkan catches on with devs.

#10 loco145
Member since 2006 • 12226 Posts

@ronvalencia: That burn!

#11  Edited By Dibdibdobdobo
Member since 2008 • 6683 Posts

So it's one of these in the Scorpio?!

#12  Edited By HalcyonScarlet
Member since 2011 • 13668 Posts

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

#13  Edited By KungfuKitten
Member since 2006 • 27389 Posts

@Xplode_games said:

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

I'm looking into 1440p at 100+ fps and a 1080 Ti barely cuts it today... let alone in 3 years. I wouldn't say no to faster cards. (BTW, it's so funny to hear the idea of Scorpio running 4K smoothly when I can't find a card that comfortably runs future games. I guess that console efficiency is really going to start kicking in :b)

That price tho. Wow... It's not meant for games anyway... But the deep learning potential... Our cellphones will take over the world within 25 years...

#14 o0squishy0o
Member since 2007 • 2802 Posts

@HalcyonScarlet said:

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

Exactly, they are probably trying to build up their own hype; it's obviously not related to gaming, but it will still generate some excitement. AMD's Vega really does need to be something special. Even if it only sits just behind the 1080 Ti but is offered at, say, 70% of the price, that would be a huge benefit for gamers and the market.

#15 ronvalencia
Member since 2008 • 29612 Posts

@HalcyonScarlet said:

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

AMD is distracted by customization change requests from the game console vendors.

Avatar image for APiranhaAteMyVa
APiranhaAteMyVa

4160

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#16 APiranhaAteMyVa
Member since 2011 • 4160 Posts

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

#17 appariti0n
Member since 2009 • 5013 Posts

@ronvalencia said: (post #8 quoted in full — see above)

#18  Edited By Zaryia
Member since 2016 • 21607 Posts

@APiranhaAteMyVa said:

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

PC gets more multiplatform titles than any other system. Depending on the hardware, such as the one the OP mentioned, most of these games play noticeably better on PC. Playing most big/new games at 30 fps (or lower) is unacceptable to these people.

Your logic is as silly as bringing up Ferrari sales in a Civic vs. Ferrari performance thread. In other words, you lose.

#19 Elaisse
Member since 2012 • 648 Posts

Finally, Shovel Knight at 4K is a reality.

#20  Edited By navyguy21
Member since 2003 • 17443 Posts

....but can it max out Crysis?

#21  Edited By appariti0n
Member since 2009 • 5013 Posts

@zaryia said:
@APiranhaAteMyVa said:

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

PC gets more multiplatform titles than any other system. Depending on the hardware, such as the one the OP mentioned, most of these games play noticeably better on PC. Playing most big/new games at 30 fps (or lower) is unacceptable to these people.

Your logic is as silly as bringing up Ferrari sales in a Civic vs. Ferrari performance thread. In other words, you lose.

Not to mention three entire genres of games that are practically non-existent on console.

MOBAs, RTS (and many turn-based strategy) games, MMOs... but of course nobody cares about those games.

Oh, and good luck trying to play any RPGs in the Black Isle/BioWare style, like Torment, Tyranny, Pillars of Eternity, etc.

#23 ronvalencia
Member since 2008 • 29612 Posts

@appariti0n said:
@ronvalencia said: (post #8 quoted in full — see above)

1. Not Direct3D nor Vulkan

2. Road map.

#24 com2006
Member since 2006 • 900 Posts

I have four on pre-order, hoping for 4K@60fps

#25 deactivated-5f3ec00254b0d
Member since 2009 • 6278 Posts

Almost as powerful as the Scorpio.

#26  Edited By lamprey263
Member since 2006 • 44604 Posts

And sadly, the most ambitious thing any gamer could think of doing with it is playing Crysis on it.

#27  Edited By ButDuuude
Member since 2013 • 1907 Posts

No teraflops needed for this guy

#28 ShepardCommandr
Member since 2013 • 4939 Posts

It's gonna take years before that much power is available to the average consumer.

But yeah, 120 TF would be enough even for 8K resolutions. I give it 10 years before we achieve maximum photorealism.
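As a rough way to picture the headroom: at 8K/60 you push about 2 billion pixels a second, so 120 TFLOPS would leave on the order of 60,000 floating-point operations per pixel per frame. This is a crude budget that ignores what those FLOPS are actually suited for (tensor ops vs. general shading), so take it as a sketch only.

```python
# Crude FLOPs-per-pixel budget at 8K/60 with a 120 TFLOPS part.
width, height, fps = 7680, 4320, 60
pixels_per_second = width * height * fps   # ~2.0e9 pixels/s
flops = 120e12

print(f"~{flops / pixels_per_second:,.0f} FLOPs per pixel per frame")
```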

#29  Edited By mrbojangles25
Member since 2005 • 58382 Posts

@ShepardCommandr said:

It's gonna take years before that much power is available to the average consumer.

But yeah, 120 TF would be enough even for 8K resolutions. I give it 10 years before we achieve maximum photorealism.

You know, I never thought I'd see it in my lifetime, but I just think I might at this rate.

*Not sure how to phrase this, but is the rate of tech advancement, umm, exponential? It sure seems like it sometimes...

Seems like yesterday I was playing on my Pentium Pro with a Riva TNT2 32 MB card.