Gemini 3.5 Flash adds native computer use capability

Computer use is now a built-in tool in Gemini 3.5 Flash, marking a major shift for developers building AI agents that can actually interact with software the way humans do.

For years, the dream of AI agents that could click buttons, fill forms, and navigate applications independently felt just out of reach. You’d explain what you wanted in plain language, and the AI would understand, but then it would hit a wall. It couldn’t do anything on your screen. It couldn’t open a browser, log into a website, or type data into a spreadsheet. That limitation is finally gone. Google has integrated computer use directly into Gemini 3.5 Flash, which it describes as its fastest and most capable agent-focused model, and says the result is its best performance to date on agentic computer use tasks.

This isn’t just an incremental update. It changes how developers build agents. Before this announcement, computer use was available only through the standalone Gemini 2.5 Computer Use model, a specialized version built on Gemini 2.5 Pro that developers had to call separately from the main Flash model. Now it’s integrated natively into Gemini 3.5 Flash itself. Developers don’t need to switch between models or manage separate APIs. One model handles function calling, Search and Maps grounding, and now direct computer interaction.

The power of computer use in Gemini 3.5 Flash lies in the range of things the model can do. It can see screens, understand UI layouts, read on-screen content, and then act: click buttons, enter text, navigate between applications, and execute multi-step workflows autonomously. None of this requires direct backend integrations or API access to the software itself. The agent interacts with digital interfaces the way a human would, by observing the screen and issuing clicks and keystrokes.
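The observe-then-act cycle described above can be sketched as a simple loop. This is a minimal illustration, not Google’s implementation and not the Gemini API: the Action schema, the FakeScreen class, and scripted_policy (a stand-in for the model call) are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # label of the UI element to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Stand-in for a real browser or desktop: one text field and a submit button."""
    fields: dict = field(default_factory=lambda: {"email": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would capture a screenshot here; this returns plain state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: dict) -> Action:
    """Placeholder for the model: chooses the next action from the screen state."""
    if observation["submitted"]:
        return Action("done")
    if not observation["fields"]["email"]:
        return Action("type", target="email", text="user@example.com")
    return Action("click", target="submit")


def run_agent(screen: FakeScreen,
              policy: Callable[[dict], Action],
              max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until the policy reports it is done."""
    history = []
    for _ in range(max_steps):
        action = policy(screen.observe())
        if action.kind == "done":
            break
        screen.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    for step in run_agent(screen, scripted_policy):
        print(step.kind, step.target, step.text)
```

In a real agent the policy would send a screenshot to the model and receive a proposed action back, and the screen would be a live browser or desktop session. The max_steps cap is a design choice in this sketch so that a confused agent cannot loop forever.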

Think about what this unlocks for enterprise automation. Continuous software testing becomes dramatically easier. Instead of writing hundreds of lines of code to test every button and form in an application, you can build an agent that simply watches the interface and verifies everything works. Knowledge work across professional applications—like extracting data from one system and entering it into another—becomes something an AI agent can handle end-to-end. Long-horizon tasks that require persistence and adaptability, the kind that previously would trip up most AI systems, now have a model designed to handle them reliably.

The performance numbers are striking. On OSWorld-Verified, a benchmark of agentic computer use tasks, Gemini 3.5 Flash with computer use scores 78.7%, significantly outperforming earlier versions. The company says this is its best performance yet for agentic computer use, and given how quickly this field has moved, that’s a meaningful claim.

Safety has been a major concern as AI agents have gotten more capable. When an agent can click buttons and enter data in live environments, the risks of prompt injection attacks or unintended actions become real. Google is taking a “defense-in-depth” approach here. They’ve used targeted adversarial training specifically for computer use in Gemini 3.5 Flash to mitigate prompt injection risks. They’re also releasing two optional enterprise safeguard systems: one that requires explicit user confirmation for sensitive or irreversible actions, and another that automatically stops tasks if indirect prompt injection is identified.

The safeguards are optional, but when enabled they adapt to the stakes. The safety service in Gemini 3.5 Flash automatically determines whether user confirmation is required based on the action’s risk level. For high-stakes tasks, such as making purchases, logging into accounts, or changing critical settings, the agent pauses and asks before proceeding. For routine tasks, it moves forward without unnecessary friction. Developers have been asking for this balance between safety and usability, and Google’s implementation appears to address it thoughtfully.
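To make the two safeguards concrete, here is a rough sketch of how an agent harness might gate actions on the client side. It assumes a fixed set of high-risk action kinds and a caller-supplied confirmation callback; the names HIGH_RISK_KINDS, ProposedAction, InjectionDetected, and execute_with_safeguards are hypothetical and do not come from the Gemini API, where the risk decision is made by Google’s safety service rather than by a hard-coded list.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical risk categories, loosely based on the examples in the text.
HIGH_RISK_KINDS = {"purchase", "login", "change_settings"}


@dataclass
class ProposedAction:
    kind: str          # e.g. "click", "purchase", "login"
    description: str   # human-readable summary shown when asking for confirmation


class InjectionDetected(Exception):
    """Raised to stop the task when indirect prompt injection is suspected."""


def requires_confirmation(action: ProposedAction) -> bool:
    """Return True for actions that are sensitive or hard to undo."""
    return action.kind in HIGH_RISK_KINDS


def execute_with_safeguards(action: ProposedAction,
                            confirm: Callable[[ProposedAction], bool],
                            injection_suspected: bool = False) -> str:
    """Apply both safeguards before letting an action through."""
    # Safeguard 1: halt the whole task if injection is suspected.
    if injection_suspected:
        raise InjectionDetected(f"Task stopped before: {action.description}")
    # Safeguard 2: pause for explicit user approval on high-risk actions.
    if requires_confirmation(action) and not confirm(action):
        return "skipped"
    # Routine actions, and approved high-risk ones, proceed.
    return "executed"


if __name__ == "__main__":
    def ask(action: ProposedAction) -> bool:
        return input(f"Allow '{action.description}'? [y/N] ").strip().lower() == "y"

    print(execute_with_safeguards(ProposedAction("click", "Open the reports tab"), ask))
    print(execute_with_safeguards(ProposedAction("purchase", "Buy 3 licenses"), ask))
```

Routine actions pass straight through, high-risk ones wait on the callback, and a suspected injection raises an exception so the surrounding loop stops instead of continuing with the next step.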

Developers can start using computer use in Gemini 3.5 Flash right now through the Gemini API and the Gemini Enterprise Agent Platform. Google is hosting a demo environment through Browserbase where you can test the capabilities before building anything real. There’s also a reference implementation on GitHub with documentation to help you get started. The integration is in public preview, so you’re not waiting on a private beta or an enterprise deal. If you have API access to Gemini 3.5 Flash, you have computer use.

What’s interesting about this rollout is how it fits into the broader picture of what Gemini has become. At Google I/O 2026, Google introduced Gemini 3.5 as their most capable family yet, with Flash positioned as the speed-focused model for real-time applications. Adding computer use to Flash means that speed doesn’t come at the cost of capability anymore. You get the fast response times Flash is known for, plus the ability to actually interact with software. That combination is rare in the AI agent space.

The technology behind computer use isn’t new in isolation. Google’s been working on this for over a year. The Gemini 2.5 Computer Use model launched in October 2025 as a preview, and early users reported it operated up to 50% faster than competing solutions while outperforming others on web and mobile control benchmarks. What’s new here is the integration. Instead of a specialized model that required separate access, computer use is now a native capability of the main Flash model. Developers get the same visual understanding and reasoning capabilities that powered the 2.5 Computer Use model, but within the faster, more cost-effective Flash architecture.

This also matters because of what it means for the agent ecosystem. AI agents have been getting smarter at reasoning and planning, but they’ve been stuck at the execution phase. They could figure out what needed to happen, but they couldn’t make it happen without human intervention or pre-built API integrations. Computer use closes that gap. Agents can now move from planning to action across browser, mobile, and desktop environments without needing every software provider to build agent-specific APIs first.

The timing is also notable. We’re in mid-2026, and the AI agent market is finally moving past the hype phase into actual deployment. Companies are building agents for real work: customer support automation, data extraction, software testing, workflow automation. Many of these use cases have been limited by what the agents could actually do. Computer use in Gemini 3.5 Flash removes one of the biggest constraints. It’s not solving every problem—agents still struggle with ambiguous tasks, complex error handling, and situations requiring human judgment—but it solves the basic problem of interface interaction.

For developers, the practical impact is straightforward. You can build agents that do things without needing to know every API endpoint. If a human can use a piece of software, an agent built with Gemini 3.5 Flash’s computer use can probably use it too. That’s a massive expansion of what’s possible. You’re not limited to the software that has agent support. You’re limited only by whether the interface is visible and clickable.

The demo environment Google is hosting is a good place to start if you’re curious. Browserbase’s setup lets you test computer use in a controlled environment before you commit to building anything production-ready. You can see how the agent handles navigation, form filling, and multi-step tasks. You can watch it reason through interface changes and adapt when things don’t work as expected. That kind of hands-on experience makes the technology real in a way that reading about it does not.

What’s next for computer use is probably more integration. If history tells us anything about AI capabilities, it’s that once a feature works well in one model, it spreads. Computer use is already in Gemini 3.5 Flash. It might show up in other Flash variants. It could expand to Pro models. The underlying technology—the visual understanding, the reasoning about UI states, the action selection—will keep improving as the models get better. And as it improves, the range of tasks agents can handle will expand too.

There’s also the question of what this means for software design itself. If agents are going to interact with interfaces the way humans do, does that change how we build software? Do we need to think more about agent accessibility? Do we need interfaces that are easier for agents to understand? These are questions that haven’t been answered yet, but they’re worth thinking about. The technology is moving forward, and the ecosystem will have to adapt.

For now, the headline is clear: computer use is built into Gemini 3.5 Flash, and it works. Developers can start building agents that see, reason, and take action across real software environments. The safety features are there if you need them. The performance is better than anything Google has shown before. And the integration means you don’t have to manage multiple models or APIs to get it working.

The future of AI agents has been waiting for this moment. Now it’s here.