Nvidia announces 120 TFLOPS Volta GPU!

#1  Edited By loco145
Member since 2006 • 12226 Posts

Volta-Based Tesla V100 GPU Shatters the 100-Teraflop Barrier

NVIDIA today launched Volta™ -- the world's most powerful GPU computing architecture, created to drive the next wave of advancement in artificial intelligence and high performance computing.

Volta, NVIDIA's seventh-generation GPU architecture, is built with 21 billion transistors and delivers the equivalent performance of 100 CPUs for deep learning.

It provides a 5x improvement over Pascal™, the current-generation NVIDIA GPU architecture, in peak teraflops, and 15x over the Maxwell™ architecture, launched two years ago. This performance surpasses by 4x the improvements that Moore's law would have predicted.

Breakthrough Technologies

The Tesla V100 GPU leapfrogs previous generations of NVIDIA GPUs with groundbreaking technologies that enable it to shatter the 100 teraflops barrier of deep learning performance. They include:

  • Tensor Cores designed to speed AI workloads. Equipped with 640 Tensor Cores, V100 delivers 120 teraflops of deep learning performance, equivalent to the performance of 100 CPUs.
  • New GPU architecture with over 21 billion transistors. It pairs CUDA cores and Tensor Cores within a unified architecture, providing the performance of an AI supercomputer in a single GPU.
  • NVLink™ provides the next generation of high-speed interconnect linking GPUs, and GPUs to CPUs, with up to 2x the throughput of the prior generation NVLink.
  • 900 GB/sec HBM2 DRAM, developed in collaboration with Samsung, achieves 50 percent more memory bandwidth than previous generation GPUs, essential to support the extraordinary computing throughput of Volta.
  • Volta-optimized software, including CUDA, cuDNN and TensorRT™ software, which leading frameworks and applications can easily tap into to accelerate AI and research.

Source

The V100 will first appear inside Nvidia's bespoke compute servers. Eight of them will come packed inside the $150,000 (~£150,000) DGX-1 rack-mounted server, which ships in the third quarter of 2017. A 250W PCIe slot version of the V100 is also in the works (probably priced at around £10,000), as well as a half-height 150W card that's likely to feature a lower clock speed and disabled cores.

Source
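For anyone wondering where the 120-teraflop headline comes from, here's a rough back-of-envelope check. The core counts are from the press release above; the ~1455 MHz boost clock is my own assumption (it isn't stated there), so treat this as a sketch rather than an official breakdown:

```python
# Rough sanity check of the V100 headline number.
# NOTE: the boost clock below is assumed, not taken from the press release.

TENSOR_CORES = 640            # per the press release
FMAS_PER_CORE_PER_CLOCK = 64  # one 4x4x4 matrix multiply-add per clock
FLOPS_PER_FMA = 2             # each fused multiply-add counts as 2 FLOPs
ASSUMED_BOOST_CLOCK_HZ = 1.455e9

tensor_tflops = (TENSOR_CORES * FMAS_PER_CORE_PER_CLOCK *
                 FLOPS_PER_FMA * ASSUMED_BOOST_CLOCK_HZ) / 1e12
print(f"Tensor throughput: ~{tensor_tflops:.0f} TFLOPS")  # ~119, i.e. the quoted 120 TFLOPS
```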

Will the PS5 even be 1/10 as powerful?

#2 Ten_Pints
Member since 2014 • 4072 Posts

£10,000... I'll have 2.

#3 Xplode_games
Member since 2011 • 2540 Posts

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

#4  Edited By lundy86_4
Member since 2003 • 61509 Posts

So, a new GPU for high-end workstations? This is probably for things like rendering... Not for gaming.

Edit: Ahhh, and servers as well.

#5  Edited By appariti0n
Member since 2009 • 5013 Posts

@lundy86_4: Tesla is designed for large datacenters so that multiple AutoCAD users, or users of other workstation-class applications, don't each need a high-end card in their workstation. They all share one big graphics pool.

Most engineers I talk to still prefer having their own card though.

I wish they would release a consumer version. You could have one badass GPU in one system, allowing multiple gamers to use it at once.

#6  Edited By BassMan
Member since 2002 • 17834 Posts

@Xplode_games said:

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

I have a 1080 Ti and I can tell you straight up that it does not breeze by anything you throw at it at 4K. 4K/60 Ultra is very demanding, and the newer, more advanced AAA titles struggle. Even when you lower settings, it is still difficult to hit a steady 4K/60.

#7 Primorandomguy
Member since 2014 • 3368 Posts

Only 150,000 dollars! What a steal!

#8  Edited By ronvalencia
Member since 2008 • 29612 Posts

From http://www.anandtech.com/show/11367/nvidia-volta-unveiled-gv100-gpu-and-tesla-v100-accelerator-announced

Tensor Cores are a new type of core for Volta that can, at a high level, be thought of as a more rigid, less flexible (but still programmable) core geared specifically for Tensor deep learning operations. These cores are essentially a mass collection of ALUs for performing 4x4 Matrix operations; specifically a fused multiply add (A*B+C), multiplying two 4x4 FP16 matrices together, and then adding that result to an FP16 or FP32 4x4 matrix to generate a final 4x4 FP32 matrix.

The significance of these cores are that by performing a massive matrix-matrix multiplication operation in one unit, NVIDIA can achieve a much higher number of FLOPS for this one operation. A single Tensor Core performs the equivalent of 64 FMA operations per clock (for 128 FLOPS total), and with 8 such cores per SM, 1024 FLOPS per clock per SM. By comparison, even with pure FP16 operations, the standard CUDA cores in an SM only generate 256 FLOPS per clock. So in scenarios where these cores can be used, NV is slated to be able to deliver 4x the performance versus Pascal
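To make the quoted description concrete, here is a minimal NumPy sketch of the arithmetic a single Tensor Core performs each clock: multiply two 4x4 FP16 matrices and add an FP32 accumulator. This only emulates the math; on real hardware it is exposed through CUDA's warp-level matrix primitives, not written like this.

```python
import numpy as np

# One Tensor Core operation per clock (per the AnandTech description):
# D = A * B + C, with A and B in FP16 and C/D accumulated in FP32.
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.random.rand(4, 4).astype(np.float32)

# Multiplies use the FP16 inputs; accumulation happens at FP32 precision.
D = A.astype(np.float32) @ B.astype(np.float32) + C

# A 4x4x4 matrix multiply-add is 64 FMAs, i.e. 128 FLOPs per core per clock;
# 8 Tensor Cores per SM gives the 1024 FLOPs/clock/SM figure quoted above.
print(D.dtype, 4 * 4 * 4 * 2, "FLOPs per Tensor Core per clock")
```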

A GlobalFoundries 7 nm shrink of Vega 10 would reduce the die to roughly Polaris 10 territory (232 mm^2), i.e. about 260 mm^2, and that's about 25 TFLOPS FP16 / 12.5 TFLOPS FP32. In FLOPS terms, that's roughly 20 percent below GV100.

Scorpio's GPU die area is about 283 mm^2, hence any 7 nm GPU of that size would be faster than a 7 nm Vega 10 shrink.
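For what it's worth, the 12.5/25 TFLOPS figures fall out of simple shader math if you assume the shrunk part keeps Vega 10's 4096 ALUs at roughly a 1.5 GHz clock and adds double-rate packed FP16; that's my reading of the claim, not something AMD has announced.

```python
# Where the 12.5 / 25 TFLOPS guesses come from.
# Assumptions: 4096 ALUs, ~1.53 GHz clock, double-rate packed FP16 -- none confirmed.
ALUS = 4096
ASSUMED_CLOCK_HZ = 1.53e9

fp32_tflops = ALUS * 2 * ASSUMED_CLOCK_HZ / 1e12   # 2 FLOPs per FMA
fp16_tflops = fp32_tflops * 2                      # packed half2 doubles the rate
print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, FP16 ~{fp16_tflops:.1f} TFLOPS")
```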

@loco145:

Your argument would be better if you actually owned a GV100 instead of being a cheer squad.

GV100 is an 800+ mm^2 chip, hence very expensive compared to the 471 mm^2 GTX 1080 Ti/Titan Xp.

For a future 7 nm GPU, AMD could design 64-bit ALUs with double-rate FP32 (float2) and quad-rate FP16 (half4).

From https://msdn.microsoft.com/en-us/library/windows/desktop/mt733232(v=vs.85).aspx

The <type> parameter and return value for these functions implies the type of the expression, the supported types are those from the following list that are also present in the target shader model for your app:

  • half, half2, half3, half4
  • float, float2, float3, float4
  • double, double2, double3, double4
  • int, int2, int3, int4
  • uint, uint2, uint3, uint4
  • short, short2, short3, short4
  • ushort, ushort2, ushort3, ushort4
  • uint64_t, uint64_t2, uint64_t3, uint64_t4

Shader Model 6 has support for future double-rate FP32 (float2) and quad-rate FP16 (half4).

The GPU hardware roadmap is already indicated by Shader Model 6's specification.

A Tensor Core is effectively a simple 256-bit SIMD FMA unit, i.e. (256-bit packed x 256-bit packed + 256-bit packed).

A 256-bit SIMD register can hold sixteen 16-bit values.
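As a rough illustration of the packed-SIMD point (a sketch, not how any shipping GPU ISA actually works): sixteen 16-bit values fit in one 256-bit register, so one packed FMA retires 16 half-precision multiply-adds.

```python
import numpy as np

# Sixteen FP16 values = 16 * 16 bits = 256 bits, i.e. one packed register.
a = np.random.rand(16).astype(np.float16)
b = np.random.rand(16).astype(np.float16)
c = np.random.rand(16).astype(np.float16)

d = a * b + c  # one packed FMA: 16 half-precision multiply-adds at once
print(a.nbytes * 8, "bits per operand,", d.size, "FMAs per packed instruction")
```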

#9  Edited By appariti0n
Member since 2009 • 5013 Posts

@BassMan: I only have a scrubby 1080, and have to set games to ultra, then manually tweak settings down until I'm happy.

About the only modern games I can run at 4K totally maxed with a solid 60 fps are Doom and MGSV, both of which feature engines that are so optimized it's ridiculous. And Doom runs at 144 fps, the refresh rate of my monitor. I really hope Vulkan catches on with devs.

#10 loco145
Member since 2006 • 12226 Posts

@ronvalencia: That burn!

#11  Edited By Dibdibdobdobo
Member since 2008 • 6683 Posts

So it's one of these in the Scorpio?!

#12  Edited By HalcyonScarlet
Member since 2011 • 13668 Posts

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

#13  Edited By KungfuKitten
Member since 2006 • 27389 Posts

@Xplode_games said:

A 1080 Ti will breeze by anything you throw at it, even 4K. Even if you have to upgrade to the latest and greatest every year for the next 5 years, you're still nowhere near the price of this new Volta card.

It's beyond overpriced, but may work for millionaires.

I'm looking into 1440p at 100+ fps and a 1080 Ti barely cuts it today... let alone in 3 years. I wouldn't say no to faster cards. (BTW, it's so funny to hear the idea of Scorpio running 4K smoothly when I can't find a card that comfortably runs future games. I guess that console efficiency is really going to start kicking in :b)

That price tho. Wow... It's not meant for games anyway... But the deep learning potential... Our cellphones will take over the world within 25 years...

#14 o0squishy0o
Member since 2007 • 2802 Posts

@HalcyonScarlet said:

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

Exactly, they are probably trying to build up their own hype; it's obviously not related to gaming, but it will still generate some excitement. AMD's Vega really does need to be something special. Even if it only sits just behind the 1080 Ti but is offered at, say, 70% of the price, that would be a huge benefit for gamers and the market.

#15 ronvalencia
Member since 2008 • 29612 Posts

@HalcyonScarlet said:

Somehow I don't think Nvidia are concerned with Vega when they make the next round of GPUs. They're still progressing.

As for NVLink, I thought they were doing that with Pascal.

AMD is distracted by customization change requests from the game console vendors.

Avatar image for APiranhaAteMyVa
APiranhaAteMyVa

4160

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#16 APiranhaAteMyVa
Member since 2011 • 4160 Posts

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

#17 appariti0n
Member since 2009 • 5013 Posts

@ronvalencia said: (post #8 quoted in full — see above)

#18  Edited By Zaryia
Member since 2016 • 21607 Posts

@APiranhaAteMyVa said:

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

PC gets more multiplatform titles than any other system. Depending on the hardware, such as the one the OP mentioned, most of these games play noticeably better on PC. Playing most big/new games at 30 fps (or lower) is unacceptable to these people.

Your logic is as silly as bringing up Ferrari sales in a Civic vs. Ferrari performance thread. In other words, you lose.

#19 Elaisse
Member since 2012 • 648 Posts

Finally, Shovel Knight at 4K is a reality.

#20  Edited By navyguy21
Member since 2003 • 17443 Posts

....but can it max out Crysis?

#21  Edited By appariti0n
Member since 2009 • 5013 Posts

@zaryia said:
@APiranhaAteMyVa said:

Cool, PC gamers will be able to get another 100 fps out of CS:GO.

PC gets more multiplatform titles than any other system. Depending on the hardware, such as the one the OP mentioned, most of these games play noticeably better on PC. Playing most big/new games at 30 fps (or lower) is unacceptable to these people.

Your logic is as silly as bringing up Ferrari sales in a Civic vs. Ferrari performance thread. In other words, you lose.

Not to mention three entire genres of games that are practically non-existent on console.

MOBAs, RTS (and many turn-based strategy) games, MMOs... but of course nobody cares about those games.

Oh, and good luck trying to play any RPGs in the Black Isle/BioWare style, like Torment, Tyranny, Pillars of Eternity, etc.

#23 ronvalencia
Member since 2008 • 29612 Posts

@appariti0n said:
@ronvalencia said: (post #8 quoted in full — see above)

1. Not Direct3D nor Vulkan

2. Road map.

#24 com2006
Member since 2006 • 900 Posts

I have four on pre-order, hoping for 4K@60fps

#25 deactivated-5f3ec00254b0d
Member since 2009 • 6278 Posts

Almost as powerful as the Scorpio.

#26  Edited By lamprey263
Member since 2006 • 44604 Posts

And sadly, the most ambitious thing any gamer could think of doing with it is playing Crysis on it.

#27  Edited By ButDuuude
Member since 2013 • 1907 Posts

No teraflops needed for this guy

#28 ShepardCommandr
Member since 2013 • 4939 Posts

It's gonna take years before that much power is available to the average consumer.

But yeah, 120 TF would be enough even for 8K resolutions. I give it 10 years before we achieve maximum photorealism.
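As a rough way to picture the headroom: at 8K/60 you push about 2 billion pixels a second, so 120 TFLOPS would leave on the order of 60,000 floating-point operations per pixel per frame. This is a crude budget that ignores what those FLOPS are actually suited for (tensor ops vs. general shading), so take it as a sketch only.

```python
# Crude FLOPs-per-pixel budget at 8K/60 with a 120 TFLOPS part.
width, height, fps = 7680, 4320, 60
pixels_per_second = width * height * fps   # ~2.0e9 pixels/s
flops = 120e12

print(f"~{flops / pixels_per_second:,.0f} FLOPs per pixel per frame")
```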

#29  Edited By mrbojangles25
Member since 2005 • 58382 Posts

@ShepardCommandr said:

It's gonna take years before that much power is available to the average consumer.

But yeah, 120 TF would be enough even for 8K resolutions. I give it 10 years before we achieve maximum photorealism.

You know, I never thought I'd see it in my lifetime, but I just think I might at this rate.

*Not sure how to phrase this, but is the rate of tech advancement, umm, exponential? It sure seems like it sometimes...

Seems like yesterday I was playing on my Pentium Pro with a Riva TNT2 32 MB card.