Google spent much of the past year stuffing new powers into Gemini — its multimodal AI stack — and on Monday it added one of the most practical: you can now drop audio files into the Gemini app and ask the model to listen, transcribe, summarize, and pull out the bits that matter. It’s the kind of quality-of-life change that looks small on a keynote slide but can transform how people turn conversations into work.
What’s new
- The Gemini app now accepts uploaded audio files across web and mobile.
- Free users can upload up to 10 minutes of audio and get five prompts per day; paid tiers (AI Pro / AI Ultra) increase that allowance — Google lists uploads of up to three hours for paid subscribers. All users can attach up to 10 files per prompt, including ZIP archives.
- At the same time, Google expanded Search (AI Mode) to five additional languages — Hindi, Indonesian, Japanese, Korean and Brazilian Portuguese — thanks to the integration of Gemini 2.5.
- NotebookLM, Google’s document-driven research tool, got new report styles — study guides, blog-post formats, briefing docs, and even flashcards and quizzes — in over 80 languages. The feature lets you shape tone and structure, and generate study or teaching materials from uploaded files.
Why this matters
If you’ve ever ended a meeting with a tangle of voice notes, a stack of lecture recordings, or a folder of interview files and promised yourself you’ll “go back and summarize,” this removes a friction point. Instead of exporting audio, firing up a separate transcription service, and pasting the transcript into a prompt, you can hand Gemini the file and ask it to do everything in one place: identify speakers, summarize action items, produce a short excerpt for social, or generate study flashcards. That’s useful for students, podcasters, reporters, researchers, and busy product teams.
The details (limits, formats, and the fine print)
Google’s official help pages and coverage of the rollout confirm the tiered limits: free accounts have modest caps (audio uploads of up to 10 minutes, five daily prompts), while AI Pro and AI Ultra subscribers get significantly more headroom — up to three hours of audio uploads and priority access to features. The app supports multiple audio file formats and accepts up to 10 files in a single prompt (including ZIPs), which makes batch uploads possible.
Android Central and other hands-on pieces add practical color: the app transcribes and highlights key moments, and surfaces speaker separation and actionable insights — the things people actually want from audio analysis. That’s also why Google’s Josh Woodward (VP of Google Labs and Gemini) framed audio support as the “#1 request.”
What Google shipped alongside audio
This isn’t an isolated tweak. The audio update landed in a larger bundle:
- Search (AI Mode) — Now speaks five more languages (Hindi, Indonesian, Japanese, Korean, Brazilian Portuguese), widening access to Gemini-powered, conversational search in non-English markets. That’s a notable step for people who prefer to ask complex, follow-up questions in their native tongues.
- NotebookLM — Think of it as a “research assistant that writes” — you can now ask NotebookLM to output finished artifacts (blog posts, study guides, briefings, quizzes, flashcards) from your uploads and set tone/structure. Google says the expanded report tools will appear in over 80 languages. That makes NotebookLM less of a note-taking toy and more of a content production/workflow tool.
A quick note on privacy and memory
This summer, Google also pushed Gemini toward more persistent memory — letting the assistant remember preferences and details from prior conversations by default in some versions — and introduced controls (opt-outs, temporary chats) so users could limit what’s stored. If you plan to upload sensitive interviews or private meetings, be sure to check your Gemini app’s Personal Context / Keep Activity settings before you hand over recordings. Google has positioned those controls as the trade-off for a more personal assistant; the defaults and opt-outs matter.
What this means for competing workflows
Tools that previously sat between audio and action — standalone transcribers, separate summarizers, or manual workflows — now compete with an integrated path inside Google’s ecosystem. That’s convenient, but it also nudges more content into Google’s services (where it can be used to improve models, depending on your settings). For organizations that need strict chain-of-custody or certified transcripts, dedicated providers will still be the safer bet; for fast summaries, ideation, and classroom use, this is a major timesaver.
Takeaway
This update is less about headline-grabbing new model benchmarks and more about smoothing a real workflow: audio → insight. By folding audio into the same Gemini canvas you use for text and images, Google is nudging the assistant toward being an honest-to-God everyday tool — not just a research demo. The trade-offs are familiar: convenience versus control. If you’re curious, try a short audio clip first (free users are intentionally capped at 10 minutes), poke around in the app’s privacy settings, and see whether Gemini’s summaries match what you need.