"68GB/s+68GB=136GB/s" fanfiction from Tormentos.
"68GB/s+68GB=136GB/s" fanfiction from Tormentos.
AMD GCN's Folding@home results
"Some of the key highlights are:
-Up to 120,000 PPD on GTX Titan, and 110,000 PPD on HD 7970"
"monitor resolution of 1600x1200 - again all tests have DX11 / Ultra Quality mode with 2xMSAA enabled and HBAO"
Note the 7870 GE's 1600x1200 result vs. the PS4's 1600x900.
PCPartPicker part list: http://pcpartpicker.com/p/1VRF7
CPU: AMD Athlon II X4 740 3.2GHz Quad-Core Processor ($67.00 @ Amazon)
Motherboard: Biostar A55MD2 Micro ATX FM2 Motherboard ($38.25 @ Newegg)
Memory: G.Skill Ripjaws X Series 8GB (2 x 4GB) DDR3-2133 Memory ($63.75 @ Newegg)
Storage: Toshiba 500GB 3.5" 7200RPM Internal Hard Drive ($44.99 @ Newegg)
Video Card: PowerColor Radeon HD 7870 XT 2GB Video Card ($163.98 @ Newegg)
Case: Sentey CS1-1420 PLUS ATX Mid Tower Case ($19.99 @ Newegg)
Power Supply: CoolMax 500W 80 PLUS Certified ATX12V / EPS12V Power Supply ($19.99 @ Newegg)
Optical Drive: Lite-On iHAS124-04 DVD/CD Writer ($17.98 @ OutletPC)
Total: $435.93 (Prices include shipping, taxes, and discounts when available.) (Generated by PCPartPicker 2013-11-01 10:18 EDT-0400)
PS: 7870 XT > 7870 GE.
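The quoted total does check out; a quick sanity check summing the listed component prices:

```python
# Sum the component prices from the PCPartPicker list above and
# compare against the quoted $435.93 total.
prices = {
    "CPU (Athlon II X4 740)": 67.00,
    "Motherboard (Biostar A55MD2)": 38.25,
    "Memory (G.Skill 8GB DDR3-2133)": 63.75,
    "Storage (Toshiba 500GB)": 44.99,
    "Video Card (HD 7870 XT 2GB)": 163.98,
    "Case (Sentey CS1-1420)": 19.99,
    "Power Supply (CoolMax 500W)": 19.99,
    "Optical Drive (Lite-On iHAS124-04)": 17.98,
}
total = round(sum(prices.values()), 2)
print(total)  # 435.93 -- matches the list's stated total
```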
Digital Foundry: Crytek is at the bleeding edge of rendering tech and your demands from hardware are very high, so how satisfied are you with the final designs of the Xbox One and PlayStation 4?
Cevat Yerli: Both consoles have a DX 11.1+ capable GPU with full compute shader support which allows us to come up with new creative rendering techniques that were not possible before. The GPUs are very efficient in performing math operations and the CPUs, in contrast to the previous PowerPC-based architectures, have standard PC features like out-of-order execution and branch prediction. All this reduces the need for micro-optimisation and allows us to focus more on the high-level algorithms which is usually the more rewarding part of development. We are also looking forward to seeing what the PlayStation 4 will offer in regards to online compute capacities, and the strategy rolled out for Xbox One cloud support is certainly going to be very interesting in terms of compute power for next-gen games. I also think Microsoft's decision to include Kinect as standard is a positive one, as it avoids fragmenting the market and allows developers to treat its functionality as a given.
Digital Foundry: On the face of it the bigger GPU and wider bandwidth of the PlayStation 4 makes it far more powerful than the Xbox One. Yet developers like John Carmack - and even PC benchmarks on equivalent hardware - suggest that the two platforms may be closer than the specs suggest. What's your assessment?
Cevat Yerli: Both next-gen platforms have excellent specs and provide wins against each other in a variety of areas. But, in essence, both of them are going to run next-generation games in more or less the same quality due to the diminishing returns of optimising for these little differences. That being said, platform-exclusive titles might be able to take advantage of these slight variations on both Xbox One and PlayStation 4.
Tormentos' memory bandwidth stupidity
X1 can add memory bandwidth since it has two memory pools (think of L-shaped multi-memory controller setups) while PS4 has a single memory pool.
Me: "From TC's link, X1 is capable to drive superior (against PS4) raytracing write results"
tormentos wrote:The same developer on xbox one = True. Double Standard FTW.
You fool, X1's superior write performance was backed by a secondary source from Gaijin.
How much more powerful?
AY: It depends what you're doing. GPU, like 40 per cent more powerful. DDR5 is basically 50 per cent more powerful than DDR3, but the memory write [performance] is bigger on Xbox One so it depends on what you're doing.
How is that going to translate to on-screen results for the kinds of games you want to make? So to optimise War Thunder on both consoles you could hypothetically make a better, prettier version on PS4?
KY: Probably yes. But again, that's not a very big deal.
I don't think you get it, MS combined bandwidth on xbox 360 to claim superiority over the PS3, as if bandwidth could be add like apples, in fact the EDRAM mean nothing in the end the PS4 with far lower bandwidth got the job done, and exclusive actually outshine MS ones
They didn't show the bandwidth link between the 360's EDRAM and its GPU, while X1's Hot Chips reveal shows the bandwidth link between the ESRAM and the GPU.
Dude Ronvalencia quoted 133GB/s bandwidth that was from a nameless developer on DF article
Microsoft techs have found that the hardware is capable of reading and writing simultaneously. Apparently, there are spare processing cycle "holes" that can be utilised for additional operations. Theoretical peak performance is one thing, but in real-life scenarios it's believed that 133GB/s throughput has been achieved with alpha transparency blending operations (FP16 x4).
The nameless developer for the 133GB/s claim was Microsoft.
tormentos is a liar.
He continued: "Additionally, when we switched from the old VLIW architecture to the GCN core, significant updates to all parts of the driver were needed. Although not really spoken about, the entire memory management on GCN is different to prior GPUs, and the initial software management for that was primarily driven by schedule. In the meantime we've been rewriting it again, and we have discovered that the new version has also improved frame latency in a number of cases, so we are accelerating the QA and implementation of that."
The CU has its own hardware scheduler that's able to assign wavefronts to available VUs with limited out-of-order capability to avoid dependency bottlenecks.
AMD GCN's CU can process multiple Kernels at once.
This is the key to better compute performance because it gives each VU the ability to work on different wavefronts if a dependency exists in the queue.
In GCN, each SIMD unit is assigned its own 40-bit program counter and instruction buffer for 10 wavefronts. The whole CU can thus have 40 wavefronts in flight, each potentially from a different work-group or kernel, which is substantially more flexible than previous designs. This means that a GCN GPU with 32 CUs, such as the AMD Radeon HD 7970, can be working on up to 81,920 work items at a time.
However, the CU and its SIMDs can select a different wavefront to work on; this can be another wavefront spawned by the same task (e.g. a different group of pixels/values) or it can be a wavefront from a different task entirely.
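The 81,920 figure above follows directly from the per-SIMD buffering arithmetic: 4 SIMDs per CU x 10 buffered wavefronts x 64 work-items per wavefront, across 32 CUs.

```python
# GCN occupancy arithmetic from the passage above: each CU has
# 4 SIMD units, each SIMD buffers 10 wavefronts, and a wavefront
# is 64 work-items wide.
simds_per_cu = 4
wavefronts_per_simd = 10
work_items_per_wavefront = 64
cus = 32  # e.g. Radeon HD 7970 (Tahiti)

wavefronts_per_cu = simds_per_cu * wavefronts_per_simd
work_items_in_flight = cus * wavefronts_per_cu * work_items_per_wavefront
print(wavefronts_per_cu, work_items_in_flight)  # 40 81920
```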
On existing AMD GCN, each AMD ACE unit can handle a parallel stream of commands.
"Each compute unit can execute instructions from multiple kernels at once."
PowerPoint Slide for AMD Radeon HD 7xx0 GCN.
Address Translation Services (ATS) and Page Request Interface (PRI) extension is defined by PCI-SIG PCIe standards.
Memory expansion add-on cards via PCI-E
These boards expand the PC's system memory via PCI-E slots. HSA GPU add-on cards are similar to Intel's memory expansion boards, i.e. both add an additional memory pool to system memory via unified virtual memory addressing.
Example practical PCI-E latency with old PC hardware (reverting back to my NVIDIA mode).
About 11 microseconds (for device to host with display attached) and 6.5 microseconds (for host to device).
Hardware spec: Intel X5570 (Nehalem) + Tesla C2050 + Fedora 13 (x86_64, 126.96.36.199-61) + NVIDIA Driver 260.19.21/CUDA 3.2.16 + GCC 4.4.5
Intel Nehalem = 1st-gen Intel Core iX; it doesn't support PCI-E version 3.0, which arrived with Intel Ivy Bridge.
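Those latency numbers matter mostly for small transfers. A toy model (time = fixed latency + size / bandwidth) shows why: the ~6.5 us host-to-device latency quoted above dominates a small copy, while a large copy is mostly bandwidth-bound. The ~6 GB/s sustained PCI-E 2.0 x16 figure here is an assumption for illustration, not a measured value.

```python
# Toy transfer-time model: time = fixed latency + size / bandwidth.
# Latency is the measured host-to-device figure quoted above;
# the 6 GB/s sustained PCI-E 2.0 bandwidth is an assumed number.
def transfer_time_us(size_bytes, latency_us, bandwidth_gb_s=6.0):
    # bandwidth_gb_s * 1e3 converts GB/s to bytes per microsecond
    return latency_us + size_bytes / (bandwidth_gb_s * 1e3)

for size in (4 * 1024, 1024 * 1024):
    t = transfer_time_us(size, latency_us=6.5)
    print(f"{size} bytes: {t:.1f} us")
# A 4 KB copy is almost all latency; a 1 MB copy is mostly bandwidth.
```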
Both the 7870 GE and the 7970 have 32 ROPs.
7970 = 264 GB/s
7870 = 153.6 GB/s
7870 has 58.18 percent of the 7970's memory bandwidth.
7870 has 59.8 percent of the 7970's color fill rate.
The term fillrate usually refers to the number of pixels a video card can render and write to video memory in a second.
Notice the 5870's and 7870's fill rates are almost identical, i.e. nearly no change across their GDDR5 and ROP hardware.
AMD Radeon HD 5870 has a theoretical 153 GB/s memory bandwidth with a practical 108 GB/s, i.e. 70.5 percent efficiency.
If we apply that 70.5 percent efficiency to PS4's 176 GB/s, it yields about 124 GB/s.
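The bandwidth ratio and efficiency arithmetic above can be reproduced in a few lines (all figures copied from the post):

```python
# Reproduce the ratio/efficiency arithmetic above.
bw_7970 = 264.0    # GB/s (HD 7970)
bw_7870 = 153.6    # GB/s (HD 7870 GE)
print(round(bw_7870 / bw_7970 * 100, 2))   # 58.18 percent

bw_5870_peak = 153.0   # GB/s theoretical (HD 5870)
bw_5870_real = 108.0   # GB/s practical
eff = bw_5870_real / bw_5870_peak
print(round(eff * 100, 1))   # 70.6 (the post truncates to 70.5)
print(round(eff * 176.0))    # ~124 GB/s applied to PS4's 176 GB/s
```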
eight ROP partitions, each capable of outputting four colored pixels or 16 Z/stencil pixels per clock
A "GCN ROP module" can output 4 color pixels or 16 Z/stencil pixels.
A GCN ROP module with 1 color pixel + 4 Z/stencil pixels doesn't exist.
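For reference, peak color fill is just ROP count x core clock. At stock clocks (925 MHz Tahiti, 1000 MHz Pitcairn, both with 32 ROPs) the two parts are nearly equal on paper, which supports the earlier point that the measured ~59.8 percent fill-rate gap tracks memory bandwidth rather than ROP count. A quick sketch:

```python
# Peak color fill rate = ROPs x core clock, in Gpixels/s.
# Clocks are the stock reference clocks for each card.
def color_fill_gpix(rops, clock_mhz):
    return rops * clock_mhz / 1000.0  # MHz -> Gpixels/s

print(color_fill_gpix(32, 925))   # HD 7970 @ 925 MHz -> 29.6
print(color_fill_gpix(32, 1000))  # HD 7870 GE @ 1000 MHz -> 32.0
```

On paper the 7870 GE is actually slightly ahead here, so a measured fill rate near 60 percent of the 7970's points at the memory subsystem, not the ROPs.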