Google launches Gemini Omni across Flow creative tools

Google just pulled back the curtain on something that could change how creators work. At its I/O developer conference on May 19, the company introduced Gemini Omni, a new AI model that now powers its Flow creative tools, and it’s more than an incremental update. This is Google’s attempt to collapse the messy stack of creative AI tools into a single, conversational experience that feels less like software and more like working with a creative partner who actually gets what you’re trying to do.

For anyone who’s been watching Google’s creative AI journey, this moment has been building for a while. Flow started life as a filmmaking tool aimed at professional directors and cinematographers when it debuted at last year’s I/O. But in the year since, it’s morphed into something much bigger – a full-blown AI creative studio that handles video, images, and now music through Flow Music, which launched earlier this year. The company says millions of creators across different disciplines have used these tools to bring projects to life, and now Google is betting that Gemini Omni will push that number even higher.

So what exactly is Gemini Omni? Think of it as Google’s answer to the fundamental problem with most creative AI tools today – they’re compartmentalized. You’ve got one model for text-to-image, another for image-to-video, something else for audio generation, and they all operate in their own silos. Omni throws that whole paradigm out the window. It’s what Google calls an “any-to-any” model, meaning it can take any combination of text, images, audio, and video as input and produce high-quality output across those same formats – all from a single unified model. The company describes it as “natively multimodal from the ground up,” which isn’t just marketing speak. It means the model can reason across different types of media in the same forward pass, resulting in more coherent edits and fewer weird artifacts that happen when you’re stitching together outputs from multiple specialized systems.

The first version rolling out is called Gemini Omni Flash, and it’s available now inside Flow for Google AI subscribers globally. Google positions it as the video equivalent of Nano Banana, the image generation and editing model the company shipped about a year ago. But the really interesting part isn’t just what it generates – it’s how you work with it. Omni Flash enables conversational video editing, where each instruction you give builds on the last one and past directions persist across your conversation. So you can say something like “make the lights dim,” then follow up with “add rain to the scene,” and the model remembers the context from your earlier edits. It’s iterative creation through dialogue rather than wrestling with timelines and effects panels.
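To make the idea of persistent, build-on-the-last instructions concrete, here is a minimal sketch of the pattern in Python. It is purely illustrative: `EditSession`, `instruct`, and `describe` are hypothetical names invented for this example, not part of any Google API, and the sketch only tracks instructions rather than editing any video.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical stand-in for a conversational editing session (not a Google API)."""

    clip: str
    history: list[str] = field(default_factory=list)

    def instruct(self, instruction: str) -> str:
        # Append the new instruction; earlier ones are kept, not replaced.
        self.history.append(instruction)
        return self.describe()

    def describe(self) -> str:
        # Summarize the clip with every instruction applied so far, in order.
        if not self.history:
            return self.clip
        return f"{self.clip} [" + "; ".join(self.history) + "]"


session = EditSession(clip="street scene")
session.instruct("make the lights dim")
result = session.instruct("add rain to the scene")
print(result)
```

The second call does not need to restate the dimmed lights; the session carries that earlier direction forward, which is the behavior the article describes.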

This conversational approach extends to some genuinely useful capabilities. Omni Flash improves character consistency, so identity and voice stay intact across different scenes – something that’s been a persistent headache with AI-generated video. You can blend real-world footage you shot on your phone with AI-generated content, swap out characters using reference images, change camera angles, and even sync text and sound to the action in your video. The model also has what Google calls “world understanding,” meaning it can ground generated videos in real-world knowledge thanks to Gemini’s underlying intelligence.

But Gemini Omni is just one piece of what Google announced for Flow. The other big addition is Flow Agent, which the company describes as a creative partner that can plan and reason through complex tasks while keeping you in control. This isn’t just a chatbot that answers questions – it’s built with Gemini models and brings what Google calls “expertise and a deep understanding of your project” to help with everything from early brainstorming to creating and editing.

Here’s where it gets practical. Say you’re working on a film project and need help with dialogue between characters in a specific scene. The agent can act as a sounding board, offering suggestions and even making plot recommendations. Once you’re deeper into production, it can create multiple variations of a scene simultaneously to give you more options, or handle batch editing so changes you make get reflected across all your assets. It can even organize your assets into collections and intuitively rename them – those mundane but time-consuming tasks that eat up hours in any creative workflow. Flow Agent is now available to all Flow users globally, not just paid subscribers.

Then there’s Flow Tools, which might be the most intriguing addition for power users. This feature lets you use natural language to create custom tools and workflows inside Flow – no coding experience required. Need a specific image editor, video resizer, or custom shader? You can build it yourself by describing what you want. And if you create something useful, you can share it with other Flow users who can then remix it into their own workflows. One early access partner, filmmaker László Gaal, created a tool called “pixelBento” that applies post-processing effects like lo-fi and glitch aesthetics. All Flow users can access and use existing Tools, while Google AI subscribers get the ability to create and remix them.

On the music side, Flow Music is getting its own set of upgrades powered by Gemini Omni and the company’s Lyria 3 Pro model. Lyria 3 Pro, which Google introduced earlier this year, is the company’s advanced music generation model that can create complete compositions up to three minutes long. Unlike shorter-form models, Lyria 3 Pro understands musical architecture – it knows what intros, verses, choruses, and bridges are supposed to do and how they fit together.

Flow Music now offers section-by-section editing, giving artists and producers granular control over their tracks. You can highlight any part of a song and make changes – instantly rewrite or translate lyrics, restyle the beat drop, or sample a specific section and extend it in a completely different direction – without touching the rest of your track. There’s also a new covers feature that lets you transform the style of full songs while keeping the original melody and structure. Want to hear your upbeat pop track as a lo-fi study version? You can do that now.

But perhaps the most interesting marriage of these tools is the ability to create music videos using Gemini Omni in Flow Music. You can work conversationally with the agent to direct shareable music videos, guiding the styles, subjects, and scenes to match the narrative and pacing of your track. It’s the kind of end-to-end creative workflow that would have required multiple apps, plugins, and rendering passes just a couple of years ago, now happening in a single interface through conversation.

Both Flow and Flow Music are also getting mobile apps, though with some important caveats. The web versions remain the go-to platforms for access to all capabilities and features, but the mobile apps provide flexibility for on-the-go creation. The Flow app is available on Android in beta (with iOS coming soon), while the Flow Music app launched on iOS (with Android on the way). You need to be 18 or older to use them.

Much of this sits behind Google’s AI subscription tiers. Omni Flash and the full creative features require a Google AI subscription, with access varying by plan. In the US, that means AI Plus at $7.99 per month, AI Pro at $19.99 per month, or the newly announced AI Ultra at $249.99 per month. Google says Omni is currently available through the Gemini app and website, Flow, and YouTube Shorts, with API access through Vertex AI expected “in the coming weeks.” That gap matters – until the API is generally available, Omni remains primarily a consumer and prosumer tool rather than something enterprises can build into their own workflows.

Google is also addressing the provenance concerns that come with any generative AI tool. Every video generated by Omni carries Google’s SynthID digital watermark, and the company is expanding C2PA Content Credentials across its generative tools. Google is also launching an AI Content Detection API on its Agent Platform that lets businesses identify AI-generated content from both Google and other popular models. It’s an acknowledgment that as these tools get more powerful and the outputs get more convincing, provenance and transparency become increasingly important.

What makes this whole package interesting isn’t just the individual features – it’s the philosophy behind them. Google is betting that the future of creative AI isn’t about giving people fifty different specialized tools to juggle. It’s about creating a unified environment where you can move fluidly between ideation, creation, and refinement through conversation and iteration. Whether you’re a filmmaker blocking out scenes, a musician experimenting with different arrangements, or someone just trying to make a decent video for social media, the barrier isn’t supposed to be technical knowledge – it’s imagination.

The company says Flow has been used to create over 1.5 billion images and videos for projects ranging from films to music videos since it launched. With Gemini Omni and these new agentic capabilities, Google is clearly aiming to grow that number substantially. Whether creators embrace this conversational, AI-driven approach or find it too limiting compared to traditional tools remains to be seen. But for now, Google is pushing hard on the idea that the best creative tool is one that feels less like software and more like a collaborator who speaks your language.