Google’s Gemini Live — the real-time, talking version of Gemini that you can have conversations with while sharing pictures, video, or your screen — is getting a practical, slightly sci-fi upgrade: it will now point to things for you. Not with words alone, but by literally highlighting items on your phone’s display while you share your camera or screen, so the assistant can show as well as tell.
Think of it as a GPS arrow for the physical world. If you hold your phone up to a messy toolbox and ask which screwdriver fits a particular screw, Gemini Live can now draw attention to the correct tool on your live camera feed. If you’re comparing two coats you’re holding up to the camera, it can mark the one that best matches your “warm, water-resistant” criteria. Google says the new “visual guidance” is meant to make help faster and less ambiguous, especially for tasks that straddle the digital and physical worlds.
The visual-guidance feature will debut on Google’s new Pixel 10 phones at launch (Google’s hardware event confirmed the Pixel 10 release for August 28, 2025), and Google says it will begin rolling the capability out to other Android devices “at the same time,” with iPhone users getting access in the “coming weeks.” That staged rollout is typical for features that tie closely to device sensors and on-device AI.
The changes aren’t only visual. Google is expanding Gemini Live’s ability to interact with other apps on your phone. The assistant will be able to take actions like drafting texts in Messages, placing calls via Phone, or scheduling things in Clock and Calendar during an ongoing live conversation. Importantly, Gemini Live keeps the conversation “live” in the literal sense: you can interrupt it mid-task with a new request. While it’s describing a route, for example, you can say, “This route looks good. Now, send a message to Alex that I’m running about 10 minutes late,” and Gemini will draft or send that message for you. That smoother handoff between conversation and action is the kind of UX Google hopes will make conversational AI feel less like a Q&A bot and more like a helpful companion.
Google is also rolling out an updated audio model for Gemini Live that it says better captures the “key elements of human speech” — intonation, rhythm, pitch — so the assistant’s voice sounds more natural and expressive. You’ll be able to tweak speaking speed, and the assistant may adopt different tones or even accents for storytelling or role-based narration. The goal is less robotic recitation and more expressive speech that matches the context of what you’re asking about.
This feels like a pragmatic move. People already use their phones to get help with hands-on tasks: fixing things, cooking, shopping, or comparing objects. Adding a simple visual overlay turns an otherwise messy descriptive exchange (“Which one is the bigger bolt?”) into a single, glanceable interaction. For field work, remote troubleshooting, or accessibility scenarios where someone would otherwise need a sighted helper to point things out, the feature could be genuinely useful.
As useful as it sounds, visual guidance raises the expected set of concerns. First, it requires you to actively share your camera or screen during a live conversation, which is an important privacy control: you must opt in. Second, computer vision isn’t perfect: highlights could be wrong, distract you, or lend false confidence in safety-critical situations (imagine a misidentified wire in a DIY electrical job). Finally, device compatibility and latency will matter: a smooth experience depends on sensor quality, network or on-device processing, and how quickly Gemini can analyze the frame and draw an overlay. Google’s posts and demos emphasize opt-in controls and a staged rollout, but real-world testing will show how well those promises hold up.