DeepMind is rolling out a new kind of robot brain today, and it’s aimed squarely at the messiness of the real world rather than a clean lab demo. Gemini Robotics-ER 1.6 is the latest “embodied reasoning” model from Google DeepMind, designed to help robots not just see their surroundings, but actually understand what they’re looking at and decide what to do next.
In practical terms, this is the model that sits on top of a robot’s cameras and sensors and acts like a high-level planner. It can parse multiple video feeds, figure out where things are, plan a sequence of actions, call tools like Google Search, delegate to vision-language-action systems, and then tell the robot what to try next. DeepMind describes it as a “reasoning-first” model for the physical world, with a focus on three pillars: visual and spatial understanding, task planning, and knowing when a task has actually succeeded.
The headline upgrades over the previous Robotics-ER 1.5 and the general-purpose Gemini 3.0 Flash models are all about that spatial and physical reasoning layer. According to DeepMind’s internal benchmarks, 1.6 is noticeably better at things like pointing precisely to objects, counting items, and determining whether a job is finished based on what the cameras see. It’s also unlocking a new capability that sounds niche but is surprisingly important in industry: reading instruments like analog gauges and sight glasses with high accuracy, even when the camera view is imperfect.

Pointing might sound trivial, but for robots it’s the foundation for almost everything else. If a model can’t reliably say “this is the blue cup” or “these are all the pliers,” any downstream motion planning will be shaky. Robotics-ER 1.6 uses points as intermediate reasoning steps: it can mark the location of objects, use those points to count, or identify “salient points” on an image to help with metric estimates like distances or proportions. DeepMind shows a simple, very human example: a cluttered tool bench with hammers, scissors, paintbrushes, pliers and garden tools. 1.6 manages to correctly count and point to each requested category, and—crucially—does not invent items that aren’t there, like a wheelbarrow or a specific drill brand that was mentioned in the prompt. Earlier models either miscounted or hallucinated objects.
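The “points as intermediate reasoning steps” idea is easy to sketch in code. The example below is a toy consumer for that kind of output, not DeepMind’s implementation: it assumes the model returns a JSON list of entries like `{"point": [y, x], "label": "pliers"}` with coordinates on a normalized 0–1000 grid, the convention documented for earlier ER releases. The key design point is that absent categories show up as explicit zeros rather than invented detections.

```python
import json
from collections import Counter

def count_objects(model_response: str, requested: list[str]) -> dict[str, int]:
    """Count pointed objects per requested category; absent categories stay 0.

    Assumes the (hypothetical, mocked here) model response is a JSON list of
    {"point": [y, x], "label": ...} entries with 0-1000 normalized coordinates.
    """
    points = json.loads(model_response)
    found = Counter(p["label"] for p in points)
    # Report every requested category explicitly, so a missing class is
    # visible as a zero rather than silently dropped or hallucinated.
    return {label: found.get(label, 0) for label in requested}

# Example: a mocked response for a cluttered tool bench.
response = json.dumps([
    {"point": [412, 130], "label": "hammer"},
    {"point": [455, 390], "label": "hammer"},
    {"point": [610, 720], "label": "pliers"},
])
print(count_objects(response, ["hammer", "pliers", "wheelbarrow"]))
# {'hammer': 2, 'pliers': 1, 'wheelbarrow': 0}
```

The wheelbarrow from DeepMind’s example simply comes back as zero: no point, no object.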

That ability not to hallucinate visually is a quiet but big deal. A lot of modern vision-language models will confidently label or count things that simply don’t exist in the image if you nudge them in that direction. For a chatbot, that’s annoying. For a robot working around humans or heavy equipment, that’s a safety risk. Gemini Robotics-ER 1.6 appears much stricter here: if the requested object is absent, it just doesn’t point.
The second major piece is “success detection” — essentially teaching a robot to know when it can stop. In real environments, tasks rarely play out exactly like a textbook example. Objects move, lighting changes, camera views are partially blocked, and the robot itself may be juggling multiple camera angles, like an overhead view plus a wrist‑mounted camera on its arm. With 1.6, DeepMind has pushed multi-view reasoning forward, so the model can fuse multiple camera streams over time and decide whether a task like “put the blue pen into the black pen holder” has actually been completed. That’s the difference between a robot endlessly fiddling with a pen it already placed correctly, and one that can confidently move on to the next step in a multi-stage plan.
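To make the multi-view idea concrete, here is a deliberately simple sketch of success detection as a fusion rule: declare a step done only when every camera view has agreed for several consecutive frames. The real model reasons over raw video streams rather than boolean verdicts; the class and window size below are illustrative assumptions, not part of DeepMind’s API.

```python
from collections import deque

class SuccessDetector:
    """Toy multi-view success check: a step counts as done only when every
    camera view has reported 'complete' for the last `window` frames."""

    def __init__(self, views: list[str], window: int = 3):
        self.window = window
        self.history = {v: deque(maxlen=window) for v in views}

    def update(self, view: str, complete: bool) -> None:
        self.history[view].append(complete)

    def task_done(self) -> bool:
        # Require a full window of agreement from every view, which
        # guards against one noisy frame ending the task early.
        return all(
            len(h) == self.window and all(h) for h in self.history.values()
        )

det = SuccessDetector(["overhead", "wrist"], window=2)
det.update("overhead", True); det.update("wrist", False)
det.update("overhead", True); det.update("wrist", True)
print(det.task_done())  # False: the wrist view disagreed one frame ago
det.update("overhead", True); det.update("wrist", True)
print(det.task_done())  # True: both views agree across the window
```

Requiring agreement across views and across time is exactly what keeps a robot from fiddling with a pen it has already placed, while still being robust to a single occluded frame.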
Where things really start to look like a bridge to industrial deployments is the new instrument-reading capability. DeepMind developed this in close collaboration with Boston Dynamics, which has been using its Spot robot dog for industrial inspections—think factory floors, power plants, and construction sites where a human would otherwise walk around with a handheld camera, clipboard, or tablet. Spot can already do autonomous inspection runs and capture photos and data from all over a facility; the missing piece has been turning those images into reliable measurements without a human looking at every frame.
Gemini Robotics-ER 1.6 is meant to sit on top of that pipeline and interpret everything from circular pressure gauges to vertical level indicators and digital displays. Reading an analog gauge sounds simple—until you consider lens distortion, odd angles, small tick marks, labels with different units, and occasionally multiple needles that map to different decimal places. Sight glasses add another headache: you have to estimate fill levels from a camera perspective that might distort the perceived liquid boundary. DeepMind says 1.6 uses a combination of zooming, precise pointing and code execution to handle this, a technique they call “agentic vision” that first appeared with Gemini 3.
With agentic vision enabled, the model can autonomously crop into a gauge, zoom in to read fine details, then use simple code to estimate proportions and intervals, essentially turning the image into a more structured measurement problem. DeepMind’s internal numbers show a jump in instrument-reading success from 23% for Robotics-ER 1.5 to 86% for 1.6, and up to 93% when agentic vision is switched on. That’s the kind of accuracy that starts to be genuinely useful for routine industrial inspection, especially when combined with Spot’s growing role as a standard inspection platform in sectors like energy, manufacturing, and mining.
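The “code execution” step is worth unpacking, because it is the least magical part of the pipeline. Once pointing has located the needle pivot and tip in the image, reading the gauge reduces to trigonometry: compute the needle angle, then linearly interpolate between the angles of the lowest and highest tick marks. The sketch below assumes exactly that setup; the coordinates, sweep angles, and scale are invented for illustration, not taken from DeepMind’s system.

```python
import math

def needle_angle(pivot, tip):
    """Needle angle in degrees from two pointed pixel coordinates (x, y).
    Image y grows downward, so dy is negated to get a conventional angle
    measured counter-clockwise from the +x axis."""
    dx, dy = tip[0] - pivot[0], tip[1] - pivot[1]
    return math.degrees(math.atan2(-dy, dx))

def gauge_value(angle_deg, min_deg, max_deg, min_val, max_val):
    """Linearly interpolate a reading from the needle angle, where
    min_deg/max_deg are the angles of the lowest and highest ticks."""
    frac = (angle_deg - min_deg) / (max_deg - min_deg)
    # Clamp to the printed scale so a slightly off pointing estimate
    # never reads beyond the gauge's physical range.
    return min_val + max(0.0, min(1.0, frac)) * (max_val - min_val)

# Needle pointing straight up on a gauge whose scale sweeps from
# 225° (lower-left) to -45° (lower-right), labeled 0-10 bar:
angle = needle_angle(pivot=(500, 500), tip=(500, 300))
print(gauge_value(angle, 225, -45, 0.0, 10.0))  # 5.0, i.e. mid-scale
```

Cropping and zooming improve the pointing estimates that feed this calculation; the arithmetic itself is the easy part, which is presumably why offloading it to code rather than to the vision model pays off.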

Boston Dynamics is clearly leaning into this. The company has spent the last few years positioning Spot as an “agile mobile sensor platform” that can roam high-risk or hard-to-reach areas, capture data, and feed it into monitoring systems like Orbit, its management layer for robot fleets and inspection routes. With something like Gemini Robotics-ER 1.6 reading gauges and spotting anomalies, you can imagine a near-future workflow where a human engineer spends far less time walking the plant and far more time responding to data-driven alerts and trends.
All of this power comes with an obvious question: how safe is it to hand more autonomy to AI-driven robots? DeepMind’s answer is that 1.6 is its “safest robotics model yet,” and it backs that up with tests against the ASIMOV safety benchmark, which was designed specifically to probe how foundation models behave as robot brains in risky situations. Earlier work on Robotics-ER 1.5 already focused heavily on two things: refusing harmful plans (semantic safety), and respecting physical constraints like payload limits or “don’t handle liquids” instructions. With 1.6, those safety behaviors improve further, especially when it comes to physical constraint awareness via spatial outputs like pointing.

In practice, this means the model is better at responses like “don’t pick up that object, it looks too heavy for this gripper” or “avoid interacting with that container, it appears to hold liquid,” rather than blindly following a user instruction. DeepMind also evaluated 1.6 on text and video scenarios derived from real-world injury reports, and reports improvements of roughly 6% in text-based risk perception and 10% in video over a baseline Gemini 3.0 Flash setup. It’s not a formal guarantee of safety, but the direction is clear: the models that power physical agents are being tuned specifically to spot trouble before it happens.
As with most modern AI launches, developers don’t have to wait long to play with this. Gemini Robotics-ER 1.6 is available starting today through the Gemini API and Google AI Studio, with a dedicated robotics overview and a Colab notebook that walks through configuration and prompting for embodied reasoning tasks. That makes it accessible not just to big robotics labs, but to smaller teams experimenting with robot arms, mobile bases, and custom hardware that need a smarter perception and planning layer on top.
DeepMind also seems keen to make this a two-way street with the robotics community. If the model falls short on a particular specialized use case, the company is inviting partners to submit 10–50 labeled images that highlight specific failure modes, which can then be used to harden the model’s reasoning for future releases. It’s a fairly lightweight feedback loop, but in a space where edge cases are endless—every facility or warehouse looks different—that kind of targeted data could matter.
Zooming out, Gemini Robotics-ER 1.6 fits into a broader trend: turning large multimodal models into “generalist” robot brains that can transfer knowledge across embodiments, tools, and environments. The previous Robotics-ER 1.5 already demonstrated state-of-the-art performance on a wide range of embodied reasoning benchmarks and agentic capabilities like breaking down long-horizon tasks and orchestrating tool use. The 1.6 upgrade isn’t about splashy new tricks so much as tightening the screws on the pieces that matter in the field: precise spatial reasoning, multi‑view understanding, instrument reading, and safety.
If you’re in robotics or industrial automation, the significance is straightforward: we’re inching closer to robots that can not only fetch and carry, but also patrol, inspect, and make first-line judgments about the health and safety of complex facilities without constant human supervision. For everyone else, you might not notice Gemini Robotics-ER 1.6 directly—but the next time a robot dog is quietly walking a refinery at night, reading gauges and listening for anomalies so a human doesn’t have to, there’s a good chance something like this model is doing the thinking behind the scenes.
