Twenty years ago, Google started one of its first machine learning experiments with a pretty simple goal: turn the science of language into the magic of human connection. That experiment became Google Translate, and today the company translates over a trillion words every month for billions of users across its products. On June 9, 2026, Google announced it’s taking that experiment to the next level with Gemini 3.5 Live Translate, its latest audio model, built for real-time speech-to-speech translation.
What makes this different from everything we’ve seen before? For starters, it works at the speed of human conversation. Unlike turn-based translation systems that wait for you to finish speaking before responding, Gemini 3.5 Live Translate generates speech continuously. It stays just a few seconds behind the speaker for the entire session, delivering fluid audio without awkward pauses. Google says it’s fast enough to keep up with a normal conversation, which is a bold claim given how many translation tools still feel like they’re running over a dial-up connection.
The model automatically detects more than 70 languages and generates smooth, natural-sounding translated speech that preserves the speaker’s intonation, pacing, and pitch. This is a big deal because so many AI voice models sound robotic or flat, stripping away the personality and emotion from what someone’s actually saying. But Gemini 3.5 Live Translate keeps those human qualities intact, which makes the translation feel less like you’re talking to a machine and more like you’re having a real conversation with someone who just happens to speak a different language.
If you’re wondering how this works under the hood, the model processes speech incrementally as it streams in, rather than waiting for a complete recording. It handles multilingual input without requiring you to manually configure any settings, and Google says it is robust to noise, so it can cope with loud, unpredictable environments. If you’re trying to translate a conversation at a busy airport or a crowded street market, background noise shouldn’t be enough to break it.
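Streaming speech usually means the client sends short slices of raw audio as they are captured, instead of uploading a finished recording. The sketch below shows that chunking step. The audio format (16 kHz, 16-bit mono PCM), the 100 ms chunk length, and the `chunk_pcm` helper are all illustrative assumptions, not documented settings for Gemini 3.5 Live Translate.

```python
from typing import Iterator

# Assumed audio format for illustration: 16 kHz, 16-bit (2-byte) mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100  # hypothetical chunk duration; tune for latency vs overhead


def chunk_pcm(stream: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive chunk_ms-long slices of raw PCM audio.

    A streaming client would send each slice as soon as it is captured,
    so the model can start working before the speaker finishes.
    The final slice may be shorter than the rest.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(stream), chunk_bytes):
        yield stream[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(chunk_pcm(one_second_of_silence))
    print(len(chunks), len(chunks[0]))  # → 10 3200
```

Smaller chunks let the model begin translating sooner but add per-message overhead, so the chunk length is a trade-off to tune rather than a fixed rule.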
Google is rolling out Gemini 3.5 Live Translate across three different surfaces starting today. Developers get it in public preview through the Gemini Live API and Google AI Studio, which means they can start building voice translation apps into their own platforms right now. Enterprises are getting a private preview in Google Meet starting this month, and everyone else can use it through the Google Translate app on both Android and iOS.
For developers, the integration is fairly straightforward. The Gemini Live API supports low-latency, real-time speech-to-speech translation between 70+ languages using the gemini-3.5-live-translate-preview model. Once the API is configured with translation settings, you can stream audio in one language and receive translated audio in another. Developer platforms such as Agora, Fishjam, LiveKit, Pipecat, and Vision Agents are already integrating the model into voice translation applications, which means they handle the complex real-time media streaming infrastructure so developers can focus on the user experience.
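The announcement doesn’t spell out the Live API’s translation configuration fields, so the sketch below does not call the real Gemini API. `FakeTranslateSession` and its `send_audio`, `close_input`, and `receive` methods are hypothetical, invented for illustration; the fake "translates" by tagging each chunk. What the sketch does show is the client-side pattern a streaming translator needs: sending audio and receiving translated audio concurrently, so playback can begin before the speaker stops.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple

# Model name from the announcement; the fake session below only records it.
MODEL = "gemini-3.5-live-translate-preview"


class FakeTranslateSession:
    """Hypothetical in-memory stand-in for a live translation session.

    This is NOT the Gemini Live API. It "translates" each audio chunk by
    prefixing it with b"translated:" after a short simulated delay.
    """

    def __init__(self, model: str) -> None:
        self.model = model
        # Translated chunks wait here until the receiver picks them up.
        self._queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()

    async def send_audio(self, chunk: bytes) -> None:
        await asyncio.sleep(0.01)  # simulated network + model latency
        await self._queue.put(b"translated:" + chunk)

    async def close_input(self) -> None:
        await self._queue.put(None)  # sentinel: the speaker has stopped

    async def receive(self) -> AsyncIterator[bytes]:
        # Yield translated chunks until the sentinel arrives.
        while (item := await self._queue.get()) is not None:
            yield item


async def stream_translation(chunks: List[bytes]) -> Tuple[List[bytes], List[str]]:
    session = FakeTranslateSession(model=MODEL)
    received: List[bytes] = []
    events: List[str] = []  # order in which chunks were sent and received

    async def sender() -> None:
        for chunk in chunks:
            await session.send_audio(chunk)
            events.append(f"sent:{chunk.decode()}")
        await session.close_input()

    async def receiver() -> None:
        async for out in session.receive():
            received.append(out)  # a real app would play this audio immediately
            events.append(f"recv:{out.decode()}")

    # Run both directions concurrently so output can start before input ends.
    await asyncio.gather(sender(), receiver())
    return received, events


if __name__ == "__main__":
    received, events = asyncio.run(stream_translation([b"a", b"b", b"c"]))
    print(events)
```

In a real integration you would swap the fake for an actual Live API session, or let one of the platforms named above handle the media plumbing, and play each received chunk as soon as it arrives.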
One of the companies already testing this is Grab, the Southeast Asian tech giant. It’s using Gemini 3.5 Live Translate to enable near-real-time multilingual communication between drivers and travelers at pickups. Those users make over 10 million voice calls per month through Grab, so a translation tool that keeps up in real time could be a big improvement to the service. Philipp Kandal, Grab’s Chief Product Officer, said the company has valued the model’s ability to auto-detect multiple languages and translate speech accurately with low latency.
In Google Meet, speech translation is getting a major upgrade. The previous limit was just five languages; with Gemini 3.5 Live Translate, that expands to over 70. That allows more than 2,000 language combinations in a single meeting, a big jump from the previous setup, which only translated to and from English. The interface is also being updated to give instant access to speech translation, so you won’t have to dig through settings menus to find it. Google is launching this in private preview for select business Google Workspace customers starting this month, with a broader rollout later in the year.
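Google doesn’t say exactly how it counts "more than 2,000 language combinations", but a quick calculation shows one way the figure could arise. Treating "over 70 languages" as exactly 70 (a lower bound), counting each pair of languages once gives 2,415 pairs, while counting each direction separately gives 4,830.

```python
from math import comb, perm

LANGUAGES = 70  # lower bound taken from "over 70 languages"

# Each pair counted once (English and Japanese, in either direction).
unordered_pairs = comb(LANGUAGES, 2)
# Each direction counted separately (English to Japanese, Japanese to English).
directed_pairs = perm(LANGUAGES, 2)

print(f"unordered pairs: {unordered_pairs}")  # → 2415 (70 * 69 / 2)
print(f"directed pairs:  {directed_pairs}")   # → 4830 (70 * 69)
```

Either way of counting clears the 2,000 mark, which is consistent with the figure Google quotes.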
For regular users, the experience in the Google Translate app is pretty slick. With the Live translate feature, you connect any pair of headphones and hear translation that mirrors the speaker’s tone across 70+ languages. Android users are also getting a new “listening mode” that plays translations through the phone’s earpiece: you hold the phone to your ear like a regular call, and the translated audio streams straight to you. It’s useful when you want to hear a translation privately and don’t have headphones handy.
There’s also a safety consideration that Google is upfront about. All audio generated by Gemini 3.5 Live Translate is watermarked with SynthID, an imperceptible watermark woven directly into the audio output. The watermark is meant to keep AI-generated audio detectable and help prevent misinformation, which matters more as AI voice technology becomes more convincing.
What’s really interesting about this release is how it shifts what we expect from translation technology. For years, we’ve been stuck with tools that work but don’t feel natural: there’s always a delay, a robotic quality, or a sense that you’re exchanging pre-programmed messages rather than having a conversation. Gemini 3.5 Live Translate is trying to close that gap, and early feedback from companies like Grab, CJ ENM, and LiveKit suggests it’s succeeding.
Tech journalists and developers with early access have shared positive feedback, highlighting translation quality, accuracy, and low latency. The model’s ability to auto-detect languages without manual configuration stands out, and because it translates the stream continuously, it doesn’t have to wait for one person to finish speaking before it starts generating a response.
Looking at the bigger picture, this is part of Google’s broader push into AI audio models. Earlier this year, the company introduced Gemini Omni and other 3.5 models that showed off computer use capabilities and advanced audio processing. Gemini 3.5 Live Translate fits into that ecosystem as a specialized tool for one of the most practical applications of AI: making it possible for people who speak different languages to communicate naturally.
The timing is also significant. As global communication becomes more important in business, travel, and everyday life, the ability to translate conversations in real time is becoming less of a luxury and more of a necessity. Whether you’re a developer building a communication app, a business running international meetings, or just someone trying to navigate a foreign country, this kind of technology has real-world value.
Discover more from GadgetBond
