NVIDIA and Microsoft Collaborate to Create a Powerful Cloud AI Computer
NVIDIA and Microsoft have announced a multi-year collaboration to help enterprises train, deploy, and scale AI, including large, state-of-the-art models. The partnership will build one of the most powerful AI supercomputers in the world, combining Microsoft Azure’s advanced supercomputing infrastructure with NVIDIA GPUs, networking, and full-stack AI software.
Azure’s cloud-based AI supercomputer is built on robust, scalable ND- and NC-series virtual machines designed for distributed AI training and inference. The platform incorporates hundreds of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite, making Azure the first public cloud to integrate NVIDIA’s full AI stack.
NVIDIA will also work with Azure to research and accelerate advances in generative AI, a rapidly developing area of artificial intelligence in which foundation models such as Megatron-Turing NLG 530B serve as the basis for unsupervised, self-learning algorithms that generate new text, code, digital images, video, or audio.
Additionally, the two companies will collaborate to optimize Microsoft’s DeepSpeed deep learning optimization library. Azure enterprise customers will also gain access to NVIDIA’s full suite of AI workflows and software development kits, optimized for Azure.
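For context, DeepSpeed plugs into ordinary PyTorch training code. The sketch below shows the general shape of that integration; it is a minimal illustrative example, and the toy model and configuration values are assumptions, not details from the announcement.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed (illustrative only).
import torch
import deepspeed

# A toy model standing in for a real transformer (assumption for illustration).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

# Example DeepSpeed config: ZeRO stage 2 partitioning with bf16 mixed precision.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that manages the optimizer,
# mixed precision, and ZeRO partitioning across the available GPUs.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

In practice, a script like this would be started with the `deepspeed` launcher (or `torchrun`), which sets up the distributed environment across the GPUs on each node.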
“AI technology advances, as well as industry adoption, are accelerating. The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups, and enabled new enterprise applications. Our collaboration with Microsoft will provide researchers and companies with state-of-the-art AI infrastructure and software to capitalize on the transformative power of AI,” said Manuvir Das, vice president of enterprise computing at NVIDIA.
Peak Scalability on Azure Using NVIDIA Compute and Quantum-2 InfiniBand
Microsoft Azure’s AI-optimized virtual machine instances are built with NVIDIA’s most advanced data center GPUs and are the first public cloud instances to incorporate NVIDIA Quantum-2 400Gb/s InfiniBand networking.
Customers can deploy hundreds of GPUs in a single cluster to:
- Train even the most massive large language models,
- Build the most complex recommender systems at scale, and
- Enable generative AI at scale.
Current Azure instances feature NVIDIA A100 GPUs with NVIDIA Quantum 200Gb/s InfiniBand networking; future instances will feature NVIDIA H100 GPUs with NVIDIA Quantum-2 400Gb/s InfiniBand networking. Combined with Azure’s advanced cloud compute, networking, and storage infrastructure, these AI-optimized offerings will deliver scalable performance for AI training and deep learning inference workloads of any size.
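To make the scaling model concrete, the sketch below shows how multi-node, multi-GPU training is typically driven on GPU instances of this kind, using PyTorch’s DistributedDataParallel over the NCCL backend (which uses InfiniBand transport when available). The model, batch sizes, and launch parameters are illustrative assumptions, not Azure-specific details from the announcement.

```python
# Minimal sketch: multi-node data-parallel training over NCCL (illustrative only).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A toy layer standing in for a large transformer (assumption).
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).square().mean()
        loss.backward()      # gradients are all-reduced across all GPUs and nodes
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched on each node with something like `torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 train.py` (node counts and hostnames are placeholders), the inter-node gradient all-reduce traffic travels over the cluster’s InfiniBand fabric.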
Speeding up AI Development and Application
The platform will also support a broad range of AI services and applications, including Microsoft DeepSpeed and the NVIDIA AI Enterprise software suite.
Microsoft DeepSpeed will leverage the NVIDIA H100 Transformer Engine to accelerate transformer-based models used for:
- Large language models,
- Generative AI, and
- Code generation, among other applications.
This technology applies 8-bit floating point (FP8) precision to DeepSpeed to significantly accelerate AI calculations for transformers.
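As an illustration of the FP8 idea (separate from the DeepSpeed integration itself, whose internals the announcement does not describe), NVIDIA’s Transformer Engine library exposes FP8 execution on Hopper GPUs through an autocast context. The sketch below is a minimal assumed example; the layer sizes and scaling recipe are illustrative choices.

```python
# Minimal sketch: FP8 matrix math via NVIDIA Transformer Engine on an H100-class GPU.
# Illustrative only; the DeepSpeed/H100 integration described above is separate.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Transformer Engine's Linear layer can execute its GEMMs in FP8.
layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda")

# A delayed-scaling recipe tracks per-tensor scale factors for FP8 tensors.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)  # the forward GEMM runs in 8-bit floating point
```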
NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, is certified and supported on Microsoft Azure instances with NVIDIA A100 GPUs. Support for Azure instances with NVIDIA H100 GPUs will be added in a future software release.
“AI is fueling the next wave of automation across enterprises and industrial computing, enabling organizations to do more with less as they navigate economic uncertainties. Our collaboration with NVIDIA unlocks the world’s most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure,” said Scott Guthrie, executive vice president of the Cloud + AI Group at Microsoft.