Gemma 4 is officially landing on Android through the AICore Developer Preview, and it’s a big deal for anyone building AI-first apps that actually run on real phones, not just in the cloud. In simple terms, Google is giving developers early access to its newest open AI model family directly on-device, months before Gemini Nano 4 rolls out to users later this year.
With this preview, Google is treating Gemma 4 as the foundation for the next generation of Gemini Nano, so anything you prototype now is designed to carry forward to Gemini Nano 4–enabled devices without major rewrites. The idea is to build once against Gemma 4 in AICore, then pick up extra optimizations and better performance when those same experiences ship to production across the wider Android ecosystem.
In AICore, Gemma 4 comes in two main “Edge” sizes: E2B and E4B, both tuned for phones and other lightweight hardware. E4B focuses on higher reasoning power and more complex tasks, while E2B is all about speed and low latency, running roughly three times faster than E4B with lower resource use. Under the hood, they’re part of a bigger Gemma 4 family (including larger 26B and 31B variants) that can scale from phones and Raspberry Pi boards all the way up to workstations and servers.
On Android, the pitch is straightforward: Gemma 4 is multimodal, so it can understand text, images, and (on smaller variants) audio, while staying efficient enough to run locally. That unlocks use cases like smarter on-device assistants, OCR-heavy workflows, chart and document understanding, handwriting recognition, and contextual in-app help that never needs to round-trip to the cloud. Google says the new model is up to four times faster than previous on-device versions and can cut battery usage by as much as 60 percent, which matters if you want users to actually keep these features turned on.
Beyond raw speed, Gemma 4 is built to be more capable at reasoning, math, time-based logic, and structured outputs. That means you can lean on it for things like validating user-generated content against community guidelines, calculating savings or repayment plans, or scheduling reminders with fuzzy time instructions, without doing a ton of extra server-side logic. Google is also rolling out support for features like tool calling, system prompts, structured outputs, and a “thinking” mode in the Prompt API, so the model can plan before responding instead of just predicting the next token.
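To make the structured-outputs idea concrete, here is a minimal Kotlin sketch of the fuzzy-reminder case. The `OnDevicePromptClient` interface and its `generate` method are hypothetical stand-ins rather than the actual ML Kit Prompt API surface; the point is the shape of the workflow: a system prompt, a fuzzy time instruction, and a JSON structure the model is asked to fill so your app can parse it directly.

```kotlin
import org.json.JSONObject

// Hypothetical client standing in for an on-device prompt API; the real
// ML Kit GenAI Prompt API surface may differ. This only shows the workflow.
interface OnDevicePromptClient {
    suspend fun generate(systemPrompt: String, userPrompt: String): String
}

// Structured output we want the model to produce for a fuzzy reminder request.
data class Reminder(val title: String, val isoDateTime: String, val recurring: Boolean)

suspend fun scheduleReminder(client: OnDevicePromptClient, userText: String): Reminder {
    val system = """
        You are a scheduling assistant. Resolve fuzzy time expressions such as
        "next Friday evening" against the current date, and reply with JSON only:
        {"title": string, "isoDateTime": string, "recurring": boolean}
    """.trimIndent()

    // Everything happens on-device; no server-side logic or round trip required.
    val raw = client.generate(systemPrompt = system, userPrompt = userText)
    val json = JSONObject(raw)
    return Reminder(
        title = json.getString("title"),
        isoDateTime = json.getString("isoDateTime"),
        recurring = json.optBoolean("recurring", false)
    )
}
```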
A big angle here is language coverage and reach: Gemma 4 supports over 140 languages, making it a strong fit for localized, multilingual apps that need to work well worldwide, even when connectivity is poor. Because it’s designed to run on the latest AI accelerators from Google, Qualcomm, and MediaTek, as well as on CPUs when accelerators aren’t available, it’s aimed squarely at bringing serious on-device AI to the broader Android ecosystem—not just a handful of flagship devices.
Practically, the Developer Preview is an invitation to start experimenting now. You can try the model without writing any code using Google’s AICore Developer Preview tooling, then move straight into Android Studio and the ML Kit GenAI Prompt API when you’re ready to integrate. You can explicitly choose whether you want to test the “fast” E2B variant or the more capable E4B model, depending on whether your scenario is more latency-sensitive or quality-sensitive.
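If it helps to picture how that choice might look in code, here is a small Kotlin sketch of the latency-versus-quality trade-off. None of these names come from the actual API: `GemmaVariant`, `PromptConfig`, and `configForUseCase` are hypothetical placeholders that just capture the decision described above.

```kotlin
// Hypothetical variant selector; the real AICore / Prompt API option names
// aren't shown in the preview material, so treat these types as stand-ins.
enum class GemmaVariant { E2B_FAST, E4B_QUALITY }

data class PromptConfig(
    val variant: GemmaVariant,
    val enableThinking: Boolean,   // "thinking" mode: let the model plan before answering
    val maxOutputTokens: Int
)

// Pick the lighter E2B model for latency-sensitive flows (e.g. as-you-type help),
// and the larger E4B model when answer quality matters more than speed.
fun configForUseCase(latencySensitive: Boolean): PromptConfig =
    if (latencySensitive) {
        PromptConfig(GemmaVariant.E2B_FAST, enableThinking = false, maxOutputTokens = 256)
    } else {
        PromptConfig(GemmaVariant.E4B_QUALITY, enableThinking = true, maxOutputTokens = 1024)
    }
```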
Google’s broader strategy is clear: Gemma 4 complements its proprietary Gemini models, giving developers a mix of open and closed options across cloud and edge. In this preview, Android is positioned as a first-class home for open, local AI—where you can build agents that understand screens, documents, voices, and context, and still keep user data on the device. If you’re building anything from productivity tools to creative apps to utilities that quietly make sense of what’s on screen, Gemma 4 in AICore is essentially your early-access sandbox.