During the GPU Technology Conference (GTC), NVIDIA unveiled the Blackwell B200 GPU, a chip the company claims is the “world’s most powerful” for artificial intelligence (AI) applications. The announcement comes hot on the heels of NVIDIA’s success with the H100 AI chip, which propelled the company to unprecedented heights and turned it into a multitrillion-dollar giant whose valuation has rivaled Alphabet’s and Amazon’s. With competitors scrambling to catch up, NVIDIA appears poised to extend its lead even further with the new Blackwell B200 GPU and its companion GB200 “Superchip.”
The Blackwell B200 GPU packs an astonishing 208 billion transistors and can deliver up to 20 petaflops of FP4 horsepower. The real game-changer, however, is the GB200 “Superchip,” which pairs two of these GPUs with a single Grace CPU. According to NVIDIA, that combination offers up to 30 times the performance on large language model (LLM) inference workloads while cutting cost and energy consumption by as much as 25 times compared to the H100.
To put this into perspective, training a massive 1.8 trillion parameter model would previously have required 8,000 Hopper GPUs drawing 15 megawatts of power. With Blackwell, NVIDIA claims, just 2,000 of the new GPUs can accomplish the same feat while consuming only four megawatts.
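As a back-of-envelope check of those claims (the GPU counts and wattages below are NVIDIA’s figures as quoted above, not independent measurements), the savings come less from per-GPU power draw than from needing far fewer GPUs:

```python
# Back-of-envelope check of NVIDIA's claimed training-power figures
# for a 1.8T-parameter model. All inputs are NVIDIA's claims.

hopper_gpus, hopper_mw = 8_000, 15.0       # H100 cluster: GPU count, megawatts
blackwell_gpus, blackwell_mw = 2_000, 4.0  # Blackwell cluster: GPU count, megawatts

print(f"GPU count reduction:   {hopper_gpus / blackwell_gpus:.1f}x")  # 4.0x
print(f"Cluster power savings: {hopper_mw / blackwell_mw:.2f}x")      # 3.75x

# Dividing cluster power evenly across GPUs (ignoring cooling and
# networking overhead), per-GPU draw is actually similar:
print(f"Per-GPU (Hopper):    {hopper_mw * 1e6 / hopper_gpus:,.0f} W")        # 1,875 W
print(f"Per-GPU (Blackwell): {blackwell_mw * 1e6 / blackwell_gpus:,.0f} W")  # 2,000 W
```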
On a GPT-3 LLM benchmark with 175 billion parameters, NVIDIA says the GB200 delivers a remarkable seven times the performance of an H100, while also offering four times the training speed.

The secret behind the Blackwell B200’s prowess lies in two key innovations. The first is a second-generation transformer engine that doubles the compute, bandwidth, and model size by representing each parameter with four bits instead of eight, which is what enables the 20 petaflops of FP4 performance. The second is a next-gen NVLink switch that lets up to 576 GPUs talk to one another with 1.8 terabytes per second of bidirectional bandwidth per GPU. Building it required NVIDIA to develop an entirely new network switch chip with 50 billion transistors and its own onboard compute: 3.6 teraflops of FP8 performance.
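To see why halving the precision doubles effective capacity and bandwidth, consider the raw storage arithmetic (a minimal sketch; the 1.8 trillion parameter count is borrowed from NVIDIA’s training example above):

```python
# Each parameter at 4 bits takes half the bytes of an 8-bit one, so the
# same memory holds twice the model and every transfer moves twice as
# many parameters. Illustrative arithmetic only.

def model_bytes(params: int, bits_per_param: int) -> int:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_param // 8

params = 1_800_000_000_000  # 1.8T parameters, per NVIDIA's training example

fp8_tb = model_bytes(params, 8) / 1e12
fp4_tb = model_bytes(params, 4) / 1e12
print(f"FP8 weights: {fp8_tb:.1f} TB")  # 1.8 TB
print(f"FP4 weights: {fp4_tb:.1f} TB")  # 0.9 TB
```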

NVIDIA’s engineers say that previously, a cluster of just 16 GPUs would spend 60 percent of its time communicating and only 40 percent actually computing. The new architecture aims to eliminate that bottleneck, enabling faster and more efficient AI computations.
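In other words, most of a cluster’s wall-clock time went to moving data rather than doing math. A toy utilization model makes the stakes clear (the 60/40 split is NVIDIA’s figure; the halved-communication scenario is purely hypothetical, for illustration):

```python
# Effective throughput = peak compute * fraction of time spent computing.

compute, comm = 0.40, 0.60  # NVIDIA's 40/60 split for a 16-GPU cluster
print(f"Utilization: {compute / (compute + comm):.0%}")  # 40%

# Hypothetical: if a faster interconnect halved communication time,
# the same hardware would spend a much larger share of its time computing.
print(f"With comm halved: {compute / (compute + comm / 2):.0%}")  # 57%
```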
NVIDIA is betting big on companies adopting these new GPUs en masse, and to facilitate this, it is packaging them into larger designs like the GB200 NVL72. This liquid-cooled rack packs in 36 Grace CPUs and 72 GPUs, delivering 720 petaflops of AI training performance or 1.4 exaflops of inference performance. With nearly two miles of cabling inside (5,000 individual cables), the NVL72 is a testament to NVIDIA’s engineering prowess.
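Those rack-level totals follow directly from the per-GPU figures, assuming training runs at FP8, i.e. half the FP4 rate (an assumption, but one that matches NVIDIA’s quoted numbers exactly):

```python
# Rack-level arithmetic for the GB200 NVL72, using NVIDIA's per-GPU peaks.

gpus_per_rack = 72
fp4_pflops = 20              # B200 peak FP4, per NVIDIA
fp8_pflops = fp4_pflops / 2  # assumed: FP8 runs at half the FP4 rate

print(f"Inference (FP4): {gpus_per_rack * fp4_pflops / 1000:.2f} exaflops")  # 1.44
print(f"Training  (FP8): {gpus_per_rack * fp8_pflops:.0f} petaflops")        # 720
```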

Each tray in the rack contains either two GB200 chips or two NVLink switches, with 18 of the former and nine of the latter per rack. According to NVIDIA, a single NVL72 rack can support models with up to 27 trillion parameters, dwarfing the rumored 1.7-trillion parameter count of GPT-4.
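Those tray counts line up with the rack totals, as a quick consistency check shows (all inputs are from the figures above):

```python
# Trays -> Superchips -> CPUs/GPUs for a single NVL72 rack.

compute_trays, switch_trays = 18, 9
superchips = compute_trays * 2  # two GB200 chips per compute tray
cpus = superchips * 1           # one Grace CPU per Superchip
gpus = superchips * 2           # two B200 GPUs per Superchip
switches = switch_trays * 2     # two NVLink switches per switch tray

print(cpus, gpus, switches)  # 36 72 18
```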
NVIDIA’s ambitions don’t stop there. The company is offering a complete solution for enterprises, including the DGX SuperPOD for DGX GB200, which combines eight systems into one: 288 CPUs, 576 GPUs, 240TB of memory, and an eye-watering 11.5 exaflops of FP4 computing power.
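The SuperPOD totals are simply the NVL72 figures multiplied by eight (a quick check; the per-system memory figure is just the stated total divided back out):

```python
# DGX SuperPOD totals as eight GB200 systems combined.

systems = 8
print(systems * 36)                    # 288 CPUs
print(systems * 72)                    # 576 GPUs
print(round(systems * 1.44, 1))        # ~11.5 exaflops FP4
print(240 / systems, "TB per system")  # 30.0 TB per system, implied
```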

NVIDIA claims its systems can scale to tens of thousands of GB200 Superchips, connected with 800Gbps networking via its new Quantum-X800 InfiniBand (for up to 144 connections) or Spectrum-X800 Ethernet (for up to 64 connections).
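At 800Gbps per connection, the raw bandwidth works out as follows (a back-of-envelope conversion; the aggregate figures assume every connection runs at full rate simultaneously, an upper bound rather than an NVIDIA spec):

```python
# Per-link and theoretical aggregate bandwidth for the quoted fabrics.

gbps = 800
print(f"Per link: {gbps / 8:.0f} GB/s")                           # 100 GB/s
print(f"Quantum-X800 (144 links): {gbps * 144 / 1000:.1f} Tbps")  # 115.2
print(f"Spectrum-X800 (64 links): {gbps * 64 / 1000:.1f} Tbps")   # 51.2
```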
While no new gaming GPUs were announced at this event, which is typically focused on GPU computing and AI, the Blackwell GPU architecture is expected to power a future RTX 50-series lineup of desktop graphics cards.
