NVIDIA, a leading technology company known for its advanced computing solutions, has announced the development of the NVIDIA DGX GH200 AI supercomputer. Powered by NVIDIA GH200 Grace Hopper Superchips and the NVIDIA NVLink Switch System, this groundbreaking system is poised to advance the field of Artificial Intelligence (AI). NVIDIA has equipped the DGX GH200 with unprecedented capabilities, aiming to enable the creation of enormous next-generation models. Use cases include generative AI language applications, recommender systems, and data analytics workloads.
Unleashing Unprecedented Performance
The NVIDIA DGX GH200 supercomputer provides a massive shared memory space by leveraging NVLink interconnect technology and the NVLink Switch System. These technologies combine 256 GH200 Superchips so that they operate as a single GPU, delivering an astounding one exaflop of performance and a staggering 144 terabytes of shared memory. That is nearly 500 times more memory than its predecessor, the NVIDIA DGX A100, unveiled in 2020.
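As a back-of-the-envelope check on the 144-terabyte figure, the per-superchip memory implied by the total can be worked out. The 576 GB per superchip used below is an assumption drawn from published GH200 specifications (480 GB of CPU LPDDR5X plus 96 GB of GPU HBM3), not a figure stated in this article:

```python
# Sanity check: 256 GH200 Superchips x ~576 GB each should give ~144 TB.
# Assumption (not from this article): each superchip contributes 576 GB
# of combined CPU (480 GB LPDDR5X) + GPU (96 GB HBM3) memory.
superchips = 256
memory_per_superchip_gb = 576  # assumed per-superchip total

total_memory_tb = superchips * memory_per_superchip_gb / 1024
print(total_memory_tb)  # -> 144.0, matching the announced shared memory
```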
Eliminating Bottlenecks with NVLink Technology
The GH200 Superchips integrated into the DGX GH200 system eliminate the need for a traditional CPU-to-GPU PCIe connection. Each superchip combines an Arm-based NVIDIA Grace CPU with an NVIDIA H100 Tensor Core GPU in a single package, connected by the NVIDIA NVLink-C2C chip interconnect. This integration delivers a 7x increase in bandwidth between the CPU and GPU compared with the latest PCIe technology, reduces interconnect power consumption by more than 5x, and provides a 600GB Hopper-architecture GPU building block for the DGX GH200 supercomputer.
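The 7x figure can be roughly reproduced from published link rates. Both numbers below are assumptions taken from NVIDIA and PCI-SIG specifications rather than from this article: NVLink-C2C is rated at 900 GB/s of total bandwidth, and a PCIe Gen 5 x16 link at roughly 128 GB/s bidirectional:

```python
# Rough reproduction of the ~7x CPU-to-GPU bandwidth claim.
# Assumed figures (from public specs, not from this article):
nvlink_c2c_gb_s = 900      # NVLink-C2C total bandwidth, GB/s
pcie_gen5_x16_gb_s = 128   # PCIe Gen 5 x16 bidirectional, GB/s

speedup = nvlink_c2c_gb_s / pcie_gen5_x16_gb_s
print(f"{speedup:.1f}x")  # -> 7.0x
```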
The Power of Collaboration: Grace Hopper Superchips and NVLink Switch System
DGX GH200 is the first supercomputer to pair Grace Hopper Superchips with the NVIDIA NVLink Switch System. This new interconnect technology enables all GPUs within the DGX GH200 system to collaborate seamlessly as one unified entity. In contrast, the previous-generation system allowed only eight GPUs to be combined with NVLink while maintaining optimal performance. The DGX GH200 architecture provides a 48x increase in NVLink bandwidth, offering the simplicity of programming a single GPU alongside the power of a massive AI supercomputer.
Exploring New Frontiers
Prominent industry players, including Google Cloud, Meta, and Microsoft, are among the first to gain access to the DGX GH200 supercomputer, allowing them to explore the immense potential of this technology for their generative AI workloads. NVIDIA also plans to offer the DGX GH200 design as a blueprint to cloud service providers and hyperscalers, empowering them to customize it further according to their infrastructure requirements.
Pioneering AI Research
The DGX GH200 has been eagerly anticipated as a research tool by AI pioneers. Google Cloud, Meta, and Microsoft recognize this innovation’s significance for advancing generative models. The enhanced NVLink scale and the shared memory of the Grace Hopper Superchips address critical bottlenecks in large-scale AI, allowing researchers to tackle their most significant challenges in AI infrastructure and in training massive AI models.
NVIDIA Helios Supercomputer: Advancing R&D
NVIDIA’s commitment to pushing the boundaries of AI research and development is evident in its plan to build its own AI supercomputer, NVIDIA Helios, based on the DGX GH200 architecture. This dedicated supercomputer will link four DGX GH200 systems with NVIDIA Quantum-2 InfiniBand networking, accelerating data throughput for training large AI models. Helios is expected to come online by the end of the year and to feature a staggering 1,024 Grace Hopper Superchips.
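The Helios superchip count follows directly from figures already in the article, four DGX GH200 systems of 256 superchips each:

```python
# Helios: four DGX GH200 systems linked with Quantum-2 InfiniBand.
systems = 4
superchips_per_system = 256  # one DGX GH200, per the article

total_superchips = systems * superchips_per_system
print(total_superchips)  # -> 1024 Grace Hopper Superchips
```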
NVIDIA DGX GH200: A Complete Solution for Giant Models
DGX GH200 supercomputers offer a turnkey, full-stack solution for the most demanding AI and data analytics workloads. NVIDIA software, including NVIDIA Base Command and NVIDIA AI Enterprise, provides comprehensive AI workflow management, enterprise-grade cluster management, accelerated compute libraries, and optimized system software. This integration streamlines the development and deployment of production AI across applications such as generative AI, computer vision, and speech AI.