Imagine you’re Meta, a tech giant juggling billions of users, endless streams of data, and a relentless push into artificial intelligence. You’ve got a problem: training those massive AI models that power everything from Instagram recommendations to chatbot banter is expensive. And right now, NVIDIA’s got the keys to that kingdom with its high-end GPUs. But what if you could build your own chip—something tailored just for you, cheaper, and free from someone else’s supply chain? That’s exactly what Meta’s been cooking up, and according to a fresh scoop from Reuters, they’ve just taken a big step forward with their first-ever RISC-V-based AI training chip.
Let’s rewind a bit. Meta’s no stranger to custom silicon. A few years back, they rolled out their first RISC-V chips designed for AI inference—basically, the part where an AI model takes what it’s learned and applies it to real-world tasks, like figuring out which ad you’re most likely to click. Those chips, part of the Meta Training and Inference Accelerator (MTIA) program, were all about cutting costs and loosening NVIDIA’s grip on their data centers. It worked well enough that Meta’s kept at it, and now they’re aiming higher: a chip built from scratch not just for inference, but for training those beastly large language models (think ChatGPT-sized stuff) that need serious horsepower.
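To make that inference-versus-training split concrete, here’s a minimal Python sketch (a toy linear model with hypothetical helper names, nothing resembling Meta’s actual stack): inference is a single forward pass through frozen weights, while training repeats that pass over and over and piles gradient math and weight updates on top, which is why it demands so much more hardware.

```python
import numpy as np

# Toy model: one linear layer trained by gradient descent on synthetic data.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 1))                       # the learned weights

def infer(x):
    """Inference: a single forward pass with frozen weights."""
    return x @ W

def train_step(x, y, lr=0.05):
    """Training: forward pass, gradient of the loss, weight update."""
    global W
    pred = x @ W                                  # forward (same as inference)
    grad = 2 * x.T @ (pred - y) / len(x)          # backward: MSE gradient
    W -= lr * grad                                # optimizer step

x = rng.normal(size=(32, 4))
y = x @ np.array([[1.0], [-2.0], [0.5], [3.0]])  # targets from known weights
for _ in range(500):                              # the expensive part
    train_step(x, y)
print(infer(x[:2]))                               # deployment: forward only
```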
Here’s the juicy bit: Reuters says Meta, likely with some help from chip design heavyweight Broadcom, has “taped out” this new AI training accelerator with TSMC, the Taiwanese foundry that makes chips for everyone from Apple to AMD. In chip-speak, “tape-out” means the design is finalized and handed off to the foundry; the first physical samples come back after that. Even better? Those samples are back, Meta has powered them up, and the chip is humming along. Right now, Meta’s playing it cautious, testing the waters with a limited deployment to see if this thing can deliver the goods. Are they running benchmarks? Training real models? We don’t know yet, but the fact that it’s in the wild is a big deal.
So, what’s under the hood? Details are scarce—Meta’s keeping the spec sheet close to the chest—but we can make some educated guesses. AI training chips are all about crunching massive datasets, so this thing’s likely built around a systolic array, a grid of tiny processing units that chew through matrix math like it’s candy. Think of it as a conveyor belt of number-crunching robots, passing data along in perfect sync. And since memory is king for training, it’s a safe bet this chip’s packing HBM3 or maybe even HBM3E—super-fast, high-bandwidth memory that’s become the gold standard for AI workloads. Performance-wise, it’s got to at least hold a candle to NVIDIA’s latest and greatest, like the H200 or B200 GPUs, while sipping less power. Otherwise, why bother?
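Again, none of this is confirmed, so treat the following as a deliberately toy illustration of the systolic-array idea rather than anything resembling Meta’s design: a cycle-by-cycle NumPy simulation of an output-stationary array, where A operands march rightward, B operands march downward, and each processing element multiply-accumulates whatever flows past it.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle toy simulation of an output-stationary systolic
    array computing C = A @ B. PE (i, j) owns accumulator C[i, j];
    A streams in from the left edge and moves right, B streams in from
    the top edge and moves down. Row i / column j inputs are skewed by
    i / j cycles so matching operands meet at the right PE on time."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"

    acc = np.zeros((n, m))        # per-PE accumulators (the final outputs)
    a_reg = np.zeros((n, m))      # A operand currently held by each PE
    b_reg = np.zeros((n, m))      # B operand currently held by each PE

    for t in range(n + m + k - 2):            # cycles until the array drains
        a_reg = np.roll(a_reg, 1, axis=1)     # A wavefront shifts right
        b_reg = np.roll(b_reg, 1, axis=0)     # B wavefront shifts down
        for i in range(n):                    # inject skewed A at left edge
            a_reg[i, 0] = A[i, t - i] if 0 <= t - i < k else 0.0
        for j in range(m):                    # inject skewed B at top edge
            b_reg[0, j] = B[t - j, j] if 0 <= t - j < k else 0.0
        acc += a_reg * b_reg                  # every PE multiply-accumulates
    return acc

A, B = np.random.rand(4, 6), np.random.rand(6, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)  # matches plain matmul
```

The appeal for training hardware is that operands hop only between neighboring PEs, so data movement (the real power hog) stays local while every unit stays busy.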
Meta’s custom-chip efforts haven’t all gone smoothly, though. The MTIA program’s had its share of bumps. Back in 2022, they hit a wall with an earlier inference chip that couldn’t keep up—too slow, too power-hungry. Instead of doubling down, Meta pivoted, snapping up tens of thousands of NVIDIA GPUs to keep the AI train rolling. Those GPUs have been workhorses, training models for everything from Facebook’s ad engine to the open-source Llama models that researchers love. They’ve also powered inference for Meta’s 3 billion daily users, making sure your feed stays fresh. But even as NVIDIA’s chips kept the lights on, Meta never gave up on its DIY dreams.
Fast forward to last year, and Meta finally got an MTIA inference chip into production. It’s been quietly doing its job, and now, with this new training chip in testing, the company’s eyeing a bigger prize. The plan, according to Meta’s brass, is to start leaning on these custom chips for AI training by 2026—assuming they pass muster. If all goes well, they’ll scale up, weaving this homegrown tech deeper into their data centers. It’s a slow burn, but the payoff could be huge: lower costs, more control, and a tighter grip on their AI destiny.
Here’s where it gets nerdy-cool. Meta’s inference chips run on RISC-V, an open-source instruction set architecture that’s like the Linux of chip design: open, flexible, and royalty-free. No paying Arm or Intel for permission to tinker. If this new training chip follows suit (and Reuters doesn’t say either way), Meta might’ve just built one of the beefiest RISC-V chips ever. That’s a flex—not just for Meta, but for the whole RISC-V movement, which has been picking up steam as companies look for alternatives to the old guard.
Why does this matter? Well, Meta’s not alone in this game. Google’s got its TPUs, Amazon’s got Trainium, and everyone’s trying to break free from NVIDIA’s dominance. NVIDIA’s GPUs are incredible—don’t get me wrong—but they’re pricey, and supply can be tight. Meta’s move could shake things up, especially if they nail the performance-per-watt metric that’s so critical for data center economics. Plus, with AI eating up more power than ever (training a single model can draw as much electricity as a small city), efficiency isn’t just a buzzword—it’s a necessity.
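To see why, run the numbers. Everything below is a made-up placeholder (neither Meta nor NVIDIA has published figures you could plug in here), but it captures the core relationship: for a fixed training budget, the energy bill depends only on FLOPS-per-watt, because adding chips shortens the run and raises power draw in exact proportion.

```python
J_PER_MWH = 3.6e9  # joules in one megawatt-hour

def training_energy_mwh(budget_flops: float, chip_flops: float,
                        chip_watts: float, n_chips: int) -> float:
    """Energy for a fixed training budget: runtime x fleet power.
    Note that n_chips cancels out, leaving budget * watts / flops."""
    seconds = budget_flops / (chip_flops * n_chips)
    return seconds * (chip_watts * n_chips) / J_PER_MWH

BUDGET = 1e24  # hypothetical total training compute, in FLOPs

# Hypothetical accelerator A: faster per chip, but hungrier.
a = training_energy_mwh(BUDGET, chip_flops=1e15, chip_watts=700, n_chips=16_000)
# Hypothetical accelerator B: 20% slower, yet better FLOPS-per-watt.
b = training_energy_mwh(BUDGET, chip_flops=8e14, chip_watts=400, n_chips=16_000)

print(f"A: {a:,.0f} MWh  B: {b:,.0f} MWh  savings: {1 - b/a:.0%}")
# -> A: 194 MWh  B: 139 MWh  savings: 29%
```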
Still, there’s a long road ahead. Testing a chip is one thing; scaling it to handle Meta’s insane workload is another. NVIDIA’s not sitting still either—its next-gen B300 GPUs are looming on the horizon, ready to raise the bar again. And Meta’s past stumbles show this isn’t a sure thing. But if they pull it off, this RISC-V experiment could mark a turning point—not just for Meta, but for how Big Tech builds the brains behind tomorrow’s AI.