For the first time, Nvidia activates the sixth memory controller in a Hopper GPU. This increases the transfer rate to an enormous 3.9 TB/s.
Four H100 NVL dual packs in one server. (Image: Nvidia)
Nvidia is launching a new version of its Hopper GPU H100 with more and faster memory: the H100 NVL uses six HBM3 memory stacks instead of five. A single H100 NVL has a memory capacity of 94 GB and a transfer rate of 3.9 TB/s. Two cards in a dual pack are connected to each other via NVLink and together provide 188 GB.
For comparison: the current top desktop model, the GeForce RTX 4090, makes do with 24 GB of GDDR6X memory, which transfers a little over 1 TB/s.
The previous H100 variants already carried six physical memory stacks, HBM3 on the SXM5 module and HBM2e on the PCIe card, but one stack always remained unused in order to increase the yield of functional chips. On the H100 NVL, only 2 GB of the 96 GB provided by the six 16 GB HBM3 stacks is deactivated for this purpose, as Nvidia confirmed to us on request. Apparently, a single memory layer in one HBM3 stack is inactive.
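The capacity figures can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the article; the variable names are illustrative, and the eight layers per stack are an assumption of ours that Nvidia has not confirmed.

```python
# Memory figures quoted in the article; all names here are illustrative.
STACKS_PER_GPU = 6   # HBM3 stacks on one H100 NVL
GB_PER_STACK = 16    # capacity of each stack
DISABLED_GB = 2      # deactivated to improve yield
GPUS_PER_PACK = 2    # two cards linked via NVLink

# Assumption, not confirmed by Nvidia: each stack consists of 8 memory layers.
ASSUMED_LAYERS_PER_STACK = 8

physical_gb = STACKS_PER_GPU * GB_PER_STACK        # installed memory per GPU
usable_gb = physical_gb - DISABLED_GB              # memory available per GPU
pack_gb = usable_gb * GPUS_PER_PACK                # memory of a dual pack
gb_per_layer = GB_PER_STACK / ASSUMED_LAYERS_PER_STACK

print(f"physical: {physical_gb} GB, usable: {usable_gb} GB, dual pack: {pack_gb} GB")
print(f"one layer under the 8-layer assumption: {gb_per_layer:g} GB")
```

Under that assumption, one layer holds exactly the 2 GB that is switched off, which would fit the single inactive layer.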
AI training in focus
Nvidia explicitly advertises the H100 NVL for training large AI models, citing ChatGPT as an example. Accordingly, the manufacturer is focusing on the integrated tensor cores: in FP16 format they are said to achieve 1979 teraflops, and with FP8, where precision is halved again, twice that value, i.e. 3958 teraflops. In classic workloads, the shader cores deliver 34 teraflops at FP64 or 67 teraflops at FP32, the same as in the SXM5 version.
Specifications of the H100 NVL compared to its sister versions. Caution: Nvidia specifies the values for the H100 NVL for a dual-GPU pack. (Image: Nvidia)
This points to the same chip configuration as in the SXM5 version of the H100 GPU, with 132 streaming multiprocessors (SMs), 16,896 shader cores, 528 tensor cores, and a compute clock of around 1.8 GHz. Contract chipmaker TSMC manufactures the huge 814 mm² GPU in its 4-nanometer 4N process.
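As a quick plausibility check, the core counts should divide evenly across the multiprocessors. The following Python sketch uses only the figures quoted above; the variable names are illustrative.

```python
# Chip configuration as quoted in the article; names are illustrative.
SM_COUNT = 132
SHADER_CORES = 16_896
TENSOR_CORES = 528

# divmod returns (quotient, remainder); a remainder of 0 means an even split.
shaders_per_sm, shader_rest = divmod(SHADER_CORES, SM_COUNT)
tensors_per_sm, tensor_rest = divmod(TENSOR_CORES, SM_COUNT)

print(f"shader cores per multiprocessor: {shaders_per_sm} (remainder {shader_rest})")
print(f"tensor cores per multiprocessor: {tensors_per_sm} (remainder {tensor_rest})")
```

Both divisions come out without a remainder, at 128 shader cores and 4 tensor cores per multiprocessor.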
Depending on the application, however, the H100 NVL is likely to be slower than the previous H100 SXM5 under continuous load, since Nvidia lowers the maximum power consumption from 700 watts to between 350 and 400 watts. This in turn should benefit efficiency.
The H100 NVL comes only as a PCI Express 5.0 card. As a GPU accelerator, the model has no display outputs; it also lacks a fan of its own, so it depends entirely on strong case airflow. Nvidia expects partner systems for servers and data centers that use up to eight H100 NVL GPUs. Nvidia has not yet commented on pricing.