Nvidia Introduces 4nm GPU H100 With 80 Billion Transistors, PCIe 5.0 And HBM3

Nvidia has announced its H100 accelerator for data centers and HPC. This PCIe 5.0 GPU is produced on TSMC’s 4N node and features HBM3 memory with up to 3TB/s of bandwidth. The Nvidia H100 succeeds the current A100 GPU.

The Nvidia H100 GPU is based on Hopper, a GPU architecture aimed at data centers and HPC, and the successor to Ampere in this field. The H100 consists of 80 billion transistors and is produced on the TSMC 4N process, a version of the TSMC N4 process modified specifically for Nvidia. The Nvidia H100 is again a monolithic chip, just like the A100. Initially, it was rumored that Nvidia would offer a data center GPU with a multi-chip design consisting of several dies. AMD did just that last year with the Instinct MI200 series.

The current A100 is produced on a modified version of the 7 nm TSMC process and consists of 54.2 billion transistors. Nvidia claims that the H100 provides up to three times more computing power than the A100 in fp16, tf32 and fp64, and up to six times more in fp8. The H100 GPU has a die size of 814 mm². This is slightly smaller than the current GA100, which has an area of 826 mm².
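The transistor counts and die sizes quoted above can be turned into a rough density comparison. The following is a minimal sketch that uses only the figures in this article; the density values are derived here and are not numbers published by Nvidia.

```python
# Figures quoted in the article; the densities are derived, not published.
CHIPS = {
    "H100": {"transistors": 80e9, "area_mm2": 814},
    "GA100": {"transistors": 54.2e9, "area_mm2": 826},
}


def density_mtr_per_mm2(transistors, area_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


densities = {name: density_mtr_per_mm2(**spec) for name, spec in CHIPS.items()}
density_ratio = densities["H100"] / densities["GA100"]

for name, value in densities.items():
    print(f"{name}: {value:.1f} MTr/mm^2")
print(f"H100 vs GA100 density: {density_ratio:.2f}x")
```

On these numbers the H100 packs roughly one and a half times as many transistors into each square millimetre as the GA100, while the die itself is marginally smaller.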

Nvidia H100 SXM5 (left) and H100 PCIe

HBM3 for SXM5, HBM2e for PCIe variant

Nvidia offers two variants of the H100. The focus appears to be on the SXM5 variant, which has 132 streaming multiprocessors (SMs) for a total of 16,896 fp32 CUDA cores. This card gets 50MB of L2 cache and 80GB of HBM3 memory on a 5120-bit memory bus, for a maximum memory bandwidth of about 3TB/s, and has a TDP of 700W. Users can combine multiple H100 SXM GPUs with Nvidia’s NVLink interconnect. According to Nvidia, the fourth generation of NVLink offers bandwidth of up to 900GB/s.

There will also be a PCIe 5.0 x16 variant for more standard servers. This model gets 114 SMs and 14,592 CUDA cores. Furthermore, the PCIe variant gets a 40MB L2 cache, just like the current A100. Remarkably, the PCIe variant still uses slower HBM2e memory, according to the Hopper white paper that Nvidia published on Tuesday. At 80GB, the capacity equals that of the SXM model. The PCIe variant gets a TDP of 350W.
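The CUDA core counts of the two variants can be cross-checked against their SM counts. The sketch below assumes 128 fp32 CUDA cores per Hopper SM; that per-SM figure is an assumption of this example and is not stated in the article.

```python
# Assumption (not from the article): each Hopper SM holds 128 fp32 CUDA cores.
FP32_CORES_PER_SM = 128

# CUDA core totals as quoted for the two H100 variants.
CUDA_CORES = {"SXM5": 16896, "PCIe": 14592}


def sm_count(cuda_cores, cores_per_sm=FP32_CORES_PER_SM):
    """Derive the number of enabled SMs from a CUDA core total."""
    sms, remainder = divmod(cuda_cores, cores_per_sm)
    if remainder:
        raise ValueError("core count is not a whole multiple of cores per SM")
    return sms


sm_counts = {name: sm_count(cores) for name, cores in CUDA_CORES.items()}
print(sm_counts)
```

Under that assumption the core totals correspond to 132 SMs for the SXM5 card and 114 SMs for the PCIe card.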

Nvidia Hopper H100 GPU

New Hopper features: Transformer Engine, DPX instruction set

The Hopper architecture itself has also been changed in comparison to Ampere. Hopper and the H100 feature a new Transformer Engine, which combines a new type of Tensor core with a software suite for processing the fp8 and fp16 formats when training transformer networks. These are a kind of deep learning model.

For cloud computing, the H100 can be divided into up to seven instances. Ampere was already able to do this, but with Hopper the instances are completely isolated from each other. In addition, Hopper gets a new DPX instruction set dedicated to dynamic programming. Nvidia claims that in this use case the H100 performs up to seven times better than the A100, which lacks DPX.
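To illustrate what "dynamic programming" means in this context, here is a minimal sketch of one such algorithm in plain Python. Floyd-Warshall shortest paths is chosen only as a familiar example of the class; that choice is an assumption of this sketch, and the code runs on the CPU and does not use DPX instructions in any way.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """Return all-pairs shortest path lengths for a weighted adjacency matrix."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Core dynamic-programming step: keep the shorter of the best
                # known path and the path routed through intermediate node k.
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


# Hypothetical 4-node directed graph; INF marks a missing edge.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
for row in shortest:
    print(row)
```

The inner step is an addition followed by a minimum, repeated over a table of intermediate results. Algorithms in this class are dominated by that kind of small, repeated update.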

DGX Systems and SuperPods

Nvidia also offers the DGX H100 system with eight H100 GPUs. Together, these give the system 640GB of HBM3 memory with a total bandwidth of 24TB/s. Users can link up to 32 DGX systems via NVLink connections. Nvidia calls this the DGX SuperPod. Such a 32-node system should offer 1 exaflops of computing power, Nvidia claims. This figure refers to fp8 computing power. The company is building a supercomputer of its own, Eos, consisting of 18 DGX SuperPods with a total of 4,608 H100 GPUs.
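The DGX and SuperPod totals follow directly from the per-GPU figures. A minimal sketch of the arithmetic, using only numbers quoted in this article (the 3TB/s per-GPU bandwidth is the article's rounded figure):

```python
# Per-GPU and per-system figures as quoted in the article.
GPUS_PER_DGX = 8
HBM3_PER_GPU_GB = 80
BANDWIDTH_PER_GPU_TBS = 3  # "about 3TB/s", rounded
DGX_PER_SUPERPOD = 32
SUPERPODS_IN_EOS = 18

dgx_memory_gb = GPUS_PER_DGX * HBM3_PER_GPU_GB
dgx_bandwidth_tbs = GPUS_PER_DGX * BANDWIDTH_PER_GPU_TBS
gpus_per_superpod = GPUS_PER_DGX * DGX_PER_SUPERPOD
eos_gpus = gpus_per_superpod * SUPERPODS_IN_EOS

print(f"DGX H100 memory: {dgx_memory_gb} GB")
print(f"DGX H100 bandwidth: {dgx_bandwidth_tbs} TB/s")
print(f"GPUs per SuperPod: {gpus_per_superpod}")
print(f"GPUs in Eos: {eos_gpus}")
```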

Nvidia has yet to announce the price of the H100 GPU. It is also not yet clear what the DGX H100 systems or the DGX H100 SuperPods will cost. Hopper is not expected to be used in consumer GPUs. Nvidia is reported to be introducing a separate Lovelace architecture later this year for its new GeForce RTX graphics cards.

Nvidia Hopper compared with previous Nvidia HPC GPUs
Architecture	Hopper	Ampere	Volta
GPU	H100, TSMC 4 nm	GA100, TSMC 7 nm	GV100, TSMC 12 nm
Die area	814 mm²	826 mm²	815 mm²
Transistors	80 billion	54.2 billion	21.1 billion
CUDA cores (fp32)	SXM: 16,896; PCIe: 14,592	6912	5120
Tensor cores	SXM: 528; PCIe: 456	432	640
Memory	SXM: 80 GB HBM3; PCIe: 80 GB HBM2e	40 GB / 80 GB HBM2e	16 GB / 32 GB HBM2
FP32 vector	SXM: 60 Tflops; PCIe: 48 Tflops	19.5 Tflops	15.7 Tflops
FP64 vector	SXM: 30 Tflops; PCIe: 24 Tflops	9.7 Tflops	7.8 Tflops
FP16 tensor	SXM: 1000 Tflops; PCIe: 800 Tflops	312 Tflops	125 Tflops
TF32 tensor	SXM: 500 Tflops; PCIe: 400 Tflops	156 Tflops	Not available
FP64 tensor	SXM: 60 Tflops; PCIe: 48 Tflops	19.5 Tflops	Not available
INT8 tensor	SXM: 2000 Tops; PCIe: 1600 Tops	624 Tops	Not available
TDP	Up to 700 watts	Up to 400 watts	Up to 300 watts
Form factor	SXM5 / PCIe 5.0	SXM4 / PCIe 4.0	SXM2 / PCIe 3.0
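
The "up to three times" claim can be checked against the table. The sketch below reads the table's throughput figures as Tflops (Tops for INT8), which is an interpretation of the listed values, and compares the H100 SXM with the A100.

```python
# Peak throughput from the table: (H100 SXM, A100), in Tflops or Tops.
THROUGHPUT = {
    "FP32 vector": (60, 19.5),
    "FP64 vector": (30, 9.7),
    "FP16 tensor": (1000, 312),
    "TF32 tensor": (500, 156),
    "FP64 tensor": (60, 19.5),
    "INT8 tensor": (2000, 624),
}

# Ratio of H100 SXM throughput to A100 throughput for each format.
speedups = {name: h100 / a100 for name, (h100, a100) in THROUGHPUT.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

Every ratio lands slightly above three, in line with Nvidia's claim for fp16, tf32 and fp64. The six-fold fp8 figure cannot be checked this way because the table lists no fp8 values.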
