GadgetBond


Google’s new Gemini Embedding 2 supercharges multimodal RAG

Google’s new embedding model maps five different media types into one semantic space, so a single query can cut across your entire content stack.

By Shubham Sawarkar, Editor-in-Chief
Mar 12, 2026, 7:45 AM EDT
Gemini Embedding 2
Image: Google

Google has quietly dropped one of its most interesting Gemini updates yet — a model that doesn’t just read your text, but also sees your images, watches your videos, listens to your audio, and parses your PDFs, then throws everything into the same mathematical space. It’s called Gemini Embedding 2, it’s in public preview through the Gemini API and Vertex AI, and if you care about RAG, semantic search, or any kind of “find the right thing in a massive pile of content” problem, this is a big deal.

At a high level, embeddings are just vectors, long lists of numbers that represent the “meaning” of something. Traditionally, you needed separate models for different modalities: one for text search, another like CLIP for images, perhaps a custom model for audio. Gemini Embedding 2 collapses that complexity: send it text, images, video, audio, or PDFs and it maps them into one unified embedding space where everything can be compared directly. That means you can search a video archive using a sentence, find images using an audio clip, or use a screenshot as a query to retrieve documents.
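
To make "compared directly" concrete, here is a minimal Python sketch of cosine similarity, a common way to compare embeddings. The four-dimensional vectors and the file names are made up for illustration; they are not real Gemini Embedding 2 output, and the article does not say which similarity measure Google recommends.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings of different media types (hypothetical values).
# Real embeddings would come from the model and have thousands of dimensions.
text_query = [0.9, 0.1, 0.0, 0.2]  # imagine the sentence "a dog on a beach"
library = {
    "beach_dog.mp4": [0.8, 0.2, 0.1, 0.3],
    "quarterly_report.pdf": [0.0, 0.9, 0.4, 0.1],
    "barking.wav": [0.5, 0.0, 0.5, 0.1],
}

# Rank every item, whatever its format, against the same text query.
ranked = sorted(
    library,
    key=lambda name: cosine_similarity(text_query, library[name]),
    reverse=True,
)
```

Because every item lives in the same space, one ranking function covers video, audio, and PDF alike; with separate per-modality models the scores would not be comparable.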

Under the hood, the model leans heavily on the multimodal understanding built into the Gemini architecture. It supports five input types: text (up to 8,192 tokens), images (up to six per request, PNG or JPEG), video (up to 120 seconds, MP4 or MOV), audio (embedded directly, with no speech-to-text step required), and PDFs (up to six pages). Crucially, it is not limited to one modality at a time: you can interleave text and images, or mix video frames with audio, and the model embeds the whole input jointly, capturing relationships such as “this caption refers to that part of the image” or “this spoken line relates to that on-screen action.”
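
The limits listed above lend themselves to a pre-flight check before a request is sent. The sketch below is only an illustration: `EmbedRequest` and `check_limits` are hypothetical names, not part of any Google SDK, and the numbers are the ones reported in this article, so the live API remains the source of truth.

```python
from dataclasses import dataclass

# Limits as reported in the article; the actual API may differ or change.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6


@dataclass
class EmbedRequest:
    """Hypothetical summary of one mixed-media embedding request."""
    text_tokens: int = 0
    image_count: int = 0
    video_seconds: float = 0.0
    pdf_pages: int = 0


def check_limits(req):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {req.text_tokens} tokens exceeds {MAX_TEXT_TOKENS}")
    if req.image_count > MAX_IMAGES:
        problems.append(f"images: {req.image_count} exceeds {MAX_IMAGES}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {req.video_seconds}s exceeds {MAX_VIDEO_SECONDS}s")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"pdf: {req.pdf_pages} pages exceeds {MAX_PDF_PAGES}")
    return problems


ok_request = EmbedRequest(text_tokens=500, image_count=2)
too_big = EmbedRequest(image_count=7, video_seconds=150.0)
```

Anything over a limit would need to be split first, for example by chunking a long PDF into six-page pieces and embedding each piece separately.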

On the output side, Google is sticking with high-dimensional vectors but making them more flexible. By default, Gemini Embedding 2 emits 3,072-dimensional embeddings, but it uses Matryoshka Representation Learning (MRL) to “nest” information in the leading dimensions, so you can truncate down to 1,536 or 768 dimensions while retaining much of the semantic quality. This matters in practice because vector databases can get expensive: smaller vectors mean cheaper storage, faster indexing, and more responsive search, while still letting you switch to full 3,072-dimension embeddings when you need maximum accuracy, for example in a reranking stage. Vector database providers like Qdrant are already describing two-pass retrieval setups where you scan with lower-dimension vectors, then rescore the top candidates with the full-size ones.
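
With MRL-style embeddings, truncation amounts to keeping the first k components. The sketch below also rescales the result to unit length, which is a common precaution; the article does not say whether the API returns normalized vectors at reduced sizes, so treat that step as an assumption. The `full` vector holds placeholder values, not model output.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Placeholder 3,072-dimension vector standing in for a full-size embedding.
full = [0.01 * ((i % 7) - 3) for i in range(3072)]

medium = truncate_embedding(full, 1536)
small = truncate_embedding(full, 768)
```

Assuming 4-byte floats, a 3,072-dimension vector takes 12,288 bytes and a 768-dimension one takes 3,072 bytes, a fourfold saving per item before any index overhead.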

Google is framing this as a state-of-the-art step up from its earlier, mostly text-focused embedding models — especially for multilingual and multimodal tasks. The original Gemini Embedding work already showed strong results on MTEB benchmarks across classification, clustering, and retrieval, outscoring other popular open and commercial models on both English and multilingual leaderboards. Gemini Embedding 2 builds on that but extends it into speech, images, and video, and Google says it now outperforms leading alternatives across text, image, and video tasks while adding robust speech understanding. For developers, the punchline is that you no longer need a patchwork of different models, glue logic, and ad-hoc scoring tricks to get decent multimodal retrieval.

The real shift is what this unlocks for everyday AI workflows. Retrieval-Augmented Generation (RAG) becomes more than “search a PDF store by text and stuff snippets into a prompt”; you can now build systems that retrieve across email-style text, design mocks, product photos, marketing videos, and recorded calls, all inside the same pipeline. Semantic search stops being text-only and starts to look more like “find me anything in this organization that matches this idea, regardless of format.” Classic tasks like sentiment analysis and clustering also benefit because the model isn’t blind to non-textual signals: it can, for instance, cluster video clips by theme or emotion, or group customer tickets that include screenshots and logs, not just words.

Google is already pointing to early adopters to make this less abstract. Media companies, for example, are using Gemini Embedding 2 to search huge archives of B-roll, editorial footage, and untranscribed content. One quoted partner reports that with the new embeddings, simple text queries can now retrieve very specific video shots — including subtle, previously untranscribed micro-expressions — and even allow using an image or a random B-roll clip as the query to discover similar footage. In one internal test, this took their text-to-video Recall@1 to 85.3%, a big jump in how often the top result was exactly the right clip. For anyone who has ever tried to dig through a chaotic media drive, that kind of “it just finds the exact shot I had in mind” experience is a huge productivity boost.
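
For readers unfamiliar with the metric, Recall@1 is the share of queries whose correct item comes back as the very top result. The sketch below computes Recall@k on a tiny made-up evaluation set; the query and clip ids are hypothetical and have nothing to do with the partner's internal test.

```python
def recall_at_k(ranked_results, relevant, k=1):
    """Fraction of queries whose correct item appears in the top k results.

    ranked_results: dict mapping query id -> list of item ids, best first
    relevant: dict mapping query id -> the single correct item id
    """
    if not relevant:
        raise ValueError("need at least one labeled query")
    hits = sum(
        1
        for query_id, target in relevant.items()
        if target in ranked_results.get(query_id, [])[:k]
    )
    return hits / len(relevant)


# Made-up search results for four text-to-video queries.
ranked_results = {
    "q1": ["clip_a", "clip_b"],
    "q2": ["clip_c", "clip_d"],
    "q3": ["clip_f", "clip_e"],  # correct clip is only second here
    "q4": ["clip_g", "clip_h"],
}
relevant = {"q1": "clip_a", "q2": "clip_c", "q3": "clip_e", "q4": "clip_g"}

r1 = recall_at_k(ranked_results, relevant, k=1)  # 3 of 4 queries hit
r2 = recall_at_k(ranked_results, relevant, k=2)  # all 4 queries hit
```

A Recall@1 of 85.3% therefore means that roughly 85 out of every 100 test queries returned the intended clip in first position.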

On the platform side, Google is making the model available through its standard developer channels. You can call it directly via the Gemini API or through Vertex AI, which exposes it as gemini-embedding-2-preview and documents how to send different media and tune parameters like task (for retrieval, code search, or custom semantics) and output_dimensionality. If you already use an orchestration framework or a vector database, there are ready-made integrations with LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, ChromaDB, and Vertex Vector Search, so plugging it into an existing stack is mostly a matter of swapping out the embedding backend and reindexing. Google also provides Colab notebooks that walk through setting up semantic search and multimodal RAG pipelines end to end.
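
The "swap the backend and reindex" step can be sketched without touching any real service. Everything named below (`Embedder`, `FakeEmbedder`, `InMemoryIndex`) is hypothetical and exists only for this illustration. `FakeEmbedder` builds toy vectors from letter counts so the example runs offline; in a real stack that class would wrap a call to the Gemini API or Vertex AI, whose client calls are deliberately not shown here.

```python
import math
from typing import Dict, List, Protocol, Sequence


class Embedder(Protocol):
    """Anything with an embed() method can serve as the backend."""
    def embed(self, item: str) -> Sequence[float]: ...


class FakeEmbedder:
    """Offline stand-in for a real model: counts chosen letters."""
    def __init__(self, letters: str) -> None:
        self.letters = letters

    def embed(self, item: str) -> List[float]:
        text = item.lower()
        # The trailing 1.0 keeps the vector non-zero for any input.
        return [float(text.count(ch)) for ch in self.letters] + [1.0]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class InMemoryIndex:
    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder
        self.vectors: Dict[str, List[float]] = {}

    def reindex(self, documents: Dict[str, str]) -> None:
        """Re-embed every document with the current backend."""
        self.vectors = {
            doc_id: list(self.embedder.embed(body))
            for doc_id, body in documents.items()
        }

    def search(self, query: str, top_k: int = 3) -> List[str]:
        q = self.embedder.embed(query)
        ranked = sorted(self.vectors, key=lambda d: cosine(q, self.vectors[d]), reverse=True)
        return ranked[:top_k]


documents = {"doc-1": "banana", "doc-2": "kiwi"}
index = InMemoryIndex(FakeEmbedder("aeiou"))
index.reindex(documents)
before = index.search("bandana", top_k=1)

# A new backend defines a new vector space, so stored vectors must be rebuilt.
index.embedder = FakeEmbedder("bkn")
index.reindex(documents)
after = index.search("bandana", top_k=1)
```

The reindex step is not optional: vectors produced by two different embedding models are not comparable, so mixing old and new vectors in one index would give meaningless scores.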

For all the engineering talk, the practical examples are where this gets interesting. Imagine a support system where a user uploads a shaky phone video of a device problem; the system embeds the audio, video, and any overlaid text together, then retrieves the most relevant troubleshooting guides, past tickets, and internal docs in one shot. Or an internal search tool where a designer can drag in a prototype screenshot and instantly find the latest Figma specs, design tokens, and even related product requirement docs, without hand-curated tags. In enterprise settings, this could stretch to compliance (searching across recorded calls, scanned documents, and dashboards) or creative workflows (matching music and visuals, curating highlight reels from huge video dumps) — essentially anywhere you have a messy mix of formats and a vague question in natural language.

There are also clear cost and operations angles here. Because of MRL and adjustable dimensionality, teams can start with lower-dimension embeddings for broad recall, then selectively bump up to full-size vectors in smaller, high-precision stages. This pattern plays nicely with multi-stage retrieval architectures that are becoming standard in large-scale RAG and search systems, where you blend fast approximate search with slower, more accurate reranking. And, since the model is multilingual across more than 100 languages, global products can unify their search infrastructure instead of managing separate models and indexes per language.
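
The coarse-then-precise pattern described above can be sketched as a two-stage search: shortlist with truncated vectors, then rerank the shortlist with full vectors. This is a brute-force illustration with made-up four-dimensional data and a hypothetical `two_stage_search` helper; a production system would typically use an approximate nearest-neighbor index for the first stage.

```python
import math


def _unit(vec):
    """Rescale to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, corpus, coarse_dims, shortlist_size, top_k):
    """Shortlist with truncated vectors, then rerank with full vectors."""
    # Stage 1: cheap scan using only the leading dimensions.
    q_small = _unit(query[:coarse_dims])
    coarse = sorted(
        corpus,
        key=lambda doc_id: _dot(q_small, _unit(corpus[doc_id][:coarse_dims])),
        reverse=True,
    )
    shortlist = coarse[:shortlist_size]

    # Stage 2: precise scoring of the shortlist only.
    q_full = _unit(query)
    reranked = sorted(
        shortlist,
        key=lambda doc_id: _dot(q_full, _unit(corpus[doc_id])),
        reverse=True,
    )
    return reranked[:top_k]


query = [1.0, 0.0, 1.0, 0.0]
corpus = {
    "b": [1.0, 0.0, -1.0, 0.0],  # agrees with the query in the first 2 dims only
    "a": [1.0, 0.0, 1.0, 0.0],   # agrees with the query in all 4 dims
    "c": [0.0, 1.0, 0.0, 1.0],   # unrelated
}

results = two_stage_search(query, corpus, coarse_dims=2, shortlist_size=2, top_k=2)
```

In this toy data the truncated vectors cannot tell "a" and "b" apart, so both reach the shortlist; the full-size vectors then separate them. The cost saving comes from running the expensive full-dimension comparison on the shortlist rather than on the whole corpus.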

The launch timing also fits Google’s broader strategy: keep Gemini at the center, but ship specialized pieces that solve very real, very unsexy infrastructure problems for developers — retrieval quality, multimodal search, latency, and cost. LLMs get the headlines, but embeddings quietly decide whether your chatbot actually finds the right context or your “AI search” feels magical versus mediocre. With Gemini Embedding 2, Google is betting that a single, natively multimodal embedding backbone will become the default for these retrieval-heavy AI apps, especially as more companies move from text-only workflows to truly mixed media.


Copyright © 2026 GadgetBond. All Rights Reserved.