NVIDIA’s latest GPU platform is Blackwell (Figure A), which companies including AWS, Microsoft and Google plan to use for generative AI and other modern computing tasks, NVIDIA CEO Jensen Huang announced during his keynote at the NVIDIA GTC conference in San Jose, California, on March 18.
Figure A

Blackwell-based products will enter the market by the end of 2024 from NVIDIA partners around the world. Huang announced a range of additional technologies and services from NVIDIA and its partners, saying that generative AI is just one aspect of accelerated computing.
“When you go accelerated, your infrastructure is the CUDA GPU,” Huang said, referring to CUDA, NVIDIA’s parallel computing platform and programming model. “When that happens, it’s the same infrastructure that generates artificial intelligence.”
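For readers less familiar with the programming model Huang is referring to, the sketch below shows the CUDA style of parallelism using Numba’s CUDA bindings for Python. It is an illustrative example of my own, not anything from the keynote, and it requires the numba package and an NVIDIA GPU to run.

```python
# Illustrative only: a tiny element-wise kernel in the CUDA programming model,
# written with Numba's CUDA bindings (requires numba and an NVIDIA GPU).
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # global thread index across the whole grid
    if i < out.shape[0]:      # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # launch on the GPU

assert np.allclose(out, a + b)
```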
Blackwell supports large language model training and inference
Huang said the Blackwell GPU platform consists of two dies connected via a 10 TB/s inter-die interconnect, so that the “two dies think it’s one die,” as he put it. Blackwell packs 208 billion transistors and is manufactured on TSMC’s 4NP process. It offers 8 TB/s of memory bandwidth and 20 petaFLOPS of AI performance.
For enterprises, this means Blackwell can handle training and inference for AI models with up to 10 trillion parameters, NVIDIA said.
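To put those figures in perspective, here is a rough back-of-envelope sketch in Python. It is my own arithmetic, not NVIDIA’s: it assumes 4-bit weights, that every weight is read once per generated token, and it ignores capacity limits, KV caches and parallelism overheads.

```python
# Back-of-envelope sizing for a 10-trillion-parameter model against
# Blackwell's quoted 8 TB/s of memory bandwidth.
# Assumptions (mine, not NVIDIA's): 4-bit (0.5-byte) weights, one full
# read of the weights per generated token, no caching or sharding.

PARAMS = 10e12            # 10 trillion parameters
BYTES_PER_PARAM = 0.5     # FP4 weights -- an assumption
MEM_BANDWIDTH = 8e12      # 8 TB/s of memory bandwidth per GPU

weight_bytes = PARAMS * BYTES_PER_PARAM            # ~5 TB of weights
seconds_per_token = weight_bytes / MEM_BANDWIDTH   # time to stream them once

print(f"Weight footprint: {weight_bytes / 1e12:.1f} TB")
print(f"Per-token lower bound on one GPU: {seconds_per_token * 1e3:.0f} ms")
# Roughly 5 TB and ~625 ms per token, which is why models at this scale
# are sharded across many GPUs in systems such as the GB200 NVL72.
```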
Blackwell is enhanced with the following technologies:
- The second generation of TensorRT-LLM and NeMo Megatron, both from NVIDIA.
- Frameworks that double the compute and model size compared with the first-generation Transformer Engine.
- Confidential computing with native interface encryption protocols for privacy and security.
- A dedicated decompression engine for accelerating database queries in data analytics and data science.
Regarding security, Huang said the reliability engine “performs a self-test, an in-system test, on every bit of memory on the Blackwell chip and all the memory connected to it. It’s like we ship the Blackwell chip with its own tester.”
Blackwell-based products will be offered by partner cloud service providers, NVIDIA Cloud Partner program companies and select sovereign clouds.
The Blackwell series of GPUs follows the Grace Hopper series of GPUs, which debuted in 2022 (Figure B). NVIDIA said Blackwell will run real-time generative AI on trillion-parameter LLMs at up to 25 times lower cost and energy consumption than the Hopper series.
Figure B

NVIDIA GB200 Grace Blackwell Superchip connects multiple Blackwell GPUs
In addition to Blackwell GPUs, the company also released the NVIDIA GB200 Grace Blackwell Superchip, which connects two NVIDIA B200 Tensor Core GPUs to an NVIDIA Grace CPU, providing a new combined platform for LLM inference. The NVIDIA GB200 Grace Blackwell Superchip can be linked with the company’s new NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms at speeds up to 800 Gb/s.
GB200 will be available later this year on NVIDIA DGX Cloud and through instances on AWS, Google Cloud and Oracle Cloud Infrastructure.
New server design envisions trillion-parameter artificial intelligence models
GB200 is a component of the newly released GB200 NVL72, a rack-scale server design that packages 36 Grace CPUs and 72 Blackwell GPUs to achieve 1.8 exaFLOPS of AI performance. NVIDIA is envisioning possible use cases for massive, trillion-parameter LLMs, including persistent memory of conversations, complex scientific applications and multimodal models.
The GB200 NVL72 combines fifth-generation NVLink connectivity (5,000 NVLink cables) with the GB200 Grace Blackwell Superchips to provide enormous compute power, which Huang called “an exaFLOPS artificial intelligence system in a single rack.”
“This is more than the average bandwidth of the Internet…we can basically send everything to everyone,” Huang said.
“Our goal is to continuously reduce the cost and energy of computing, which are directly related,” Huang said.
Cooling the GB200 NVL72 requires two liters of water per second.
Next-generation NVLink brings accelerated data center architecture
Fifth-generation NVLink provides 1.8 TB/s of bidirectional throughput per GPU and enables communication among up to 576 GPUs. This iteration of NVLink is intended for use with today’s most powerful, complex LLMs.
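As a rough sense of scale (my own multiplication based only on the figures above, not a number NVIDIA quoted), the aggregate bidirectional bandwidth of a maximal 576-GPU NVLink domain works out as follows:

```python
# Aggregate NVLink bandwidth across a maximal fifth-generation NVLink domain,
# derived only from the per-GPU figure quoted above (an illustrative estimate).

PER_GPU_BIDIRECTIONAL = 1.8e12   # 1.8 TB/s of bidirectional throughput per GPU
MAX_GPUS = 576                   # maximum GPUs in one NVLink domain

aggregate = PER_GPU_BIDIRECTIONAL * MAX_GPUS
print(f"Aggregate bidirectional bandwidth: {aggregate / 1e15:.2f} PB/s")
# About 1.04 PB/s -- the kind of figure behind Huang's comparison of the
# system's bandwidth to that of the internet.
```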
“In the future, data centers will be viewed as artificial intelligence factories,” Huang said.
Introduction to NVIDIA Inference Microservices
Another element of a possible “artificial intelligence factory” is NVIDIA Inference Microservices (NIM), which Huang describes as “a new way of receiving and packaging software.”
NIMs, which NVIDIA uses internally, are containers for training and deploying generative AI. They let developers use APIs, NVIDIA CUDA and Kubernetes in one package.
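As an illustration of what working with such a container might look like, here is a minimal sketch of calling a locally deployed NIM from Python. The host, port and model name are placeholders of my own; LLM-oriented NIMs expose an OpenAI-style chat-completions endpoint in NVIDIA’s documentation, but the details vary by microservice.

```python
# Minimal sketch: querying a locally running NIM LLM microservice.
# Assumptions (mine): the container is already running on localhost:8000 and
# exposes an OpenAI-style /v1/chat/completions endpoint; the model name is a
# placeholder -- consult the specific microservice's documentation.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "example-llm",  # placeholder model identifier
        "messages": [
            {"role": "user", "content": "Summarize the Blackwell announcement."}
        ],
        "max_tokens": 200,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```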
SEE: Python remains the most popular programming language, according to the TIOBE index. (TechRepublic)
Instead of writing code to program the AI, developers can “build an AI team” to handle processes within NIM, Huang said.
“We want to build chatbots—artificial intelligence co-pilots—to work alongside our designers,” Huang said.
NIM is available starting March 18. Developers can try NIMs for free and run them with an NVIDIA AI Enterprise 5.0 subscription.
Other important announcements from NVIDIA at GTC 2024
Jensen Huang announced a range of new products and services involving accelerated computing and generative AI during his NVIDIA GTC 2024 keynote.
NVIDIA announced cuPQC, a library for accelerated post-quantum cryptography. Developers working on post-quantum cryptography can contact NVIDIA for updates on availability.
NVIDIA’s X800 series of network switches accelerates AI infrastructure. Specifically, the X800 series includes the NVIDIA Quantum-X800 InfiniBand and NVIDIA Spectrum-X800 Ethernet switches, the NVIDIA Quantum Q3400 switch and the NVIDIA ConnectX-8 SuperNIC. X800 switches will be available in 2025.
Key partners detailed during NVIDIA’s keynote include:
- NVIDIA’s full-stack AI platform will be available on Oracle Enterprise AI starting March 18.
- AWS will provide access to Amazon EC2 instances based on NVIDIA Grace Blackwell GPUs and to NVIDIA DGX Cloud with Blackwell security.
- Google Cloud will gain access to the NVIDIA Grace Blackwell AI computing platform and to NVIDIA DGX Cloud services coming to Google Cloud. Google has not confirmed an availability date, but it is likely to be late 2024. In addition, the DGX Cloud platform powered by NVIDIA H100 GPUs is generally available on Google Cloud as of March 18.
- Oracle will use NVIDIA Grace Blackwell in its OCI Supercluster and OCI Compute instances, as well as in NVIDIA DGX Cloud on Oracle Cloud Infrastructure. Some joint Oracle-NVIDIA sovereign AI services are available as of March 18.
- Microsoft will use the NVIDIA Grace Blackwell Superchip to accelerate Azure, with availability expected later in 2024.
- Dell will use NVIDIA’s AI infrastructure and software suite to create the Dell AI Factory, an end-to-end enterprise AI solution available through traditional channels and Dell APEX starting March 18. At an undisclosed time in the future, Dell will use the NVIDIA Grace Blackwell Superchip as the basis of a rack-scale, high-density, liquid-cooled architecture. The Superchip will be compatible with Dell’s PowerEdge servers.
- SAP will add NVIDIA retrieval-augmented generation capabilities to its Joule copilot. In addition, SAP will use NVIDIA NIMs and other joint services.
“The entire industry is preparing for Blackwell,” Huang said.
NVIDIA AI chip competitors
NVIDIA competes primarily with AMD and Intel in providing enterprise artificial intelligence. Qualcomm, SambaNova, Groq and various cloud service providers are in the same field when it comes to generative AI inference and training.
AWS has its proprietary inference and training platforms: Inferentia and Trainium. In addition to working with NVIDIA to develop multiple products, Microsoft also has its own AI training and inference chip: the Maia 100 AI accelerator in Azure.
Disclaimer: NVIDIA paid for my airfare, accommodation, and some meals to attend the NVIDIA GTC event in San Jose, California, March 18-21.