GadgetBond

Google’s new Gemini Embedding 2 supercharges multimodal RAG

Google’s new embedding model maps five different media types into one semantic space, so a single query can cut across your entire content stack.

By Shubham Sawarkar, Editor-in-Chief
Mar 12, 2026, 7:45 AM EDT

Gemini Embedding 2 (Image: Google)

Google has quietly dropped one of its most interesting Gemini updates yet — a model that doesn’t just read your text, but also sees your images, watches your videos, listens to your audio, and parses your PDFs, then throws everything into the same mathematical space. It’s called Gemini Embedding 2, it’s in public preview through the Gemini API and Vertex AI, and if you care about RAG, semantic search, or any kind of “find the right thing in a massive pile of content” problem, this is a big deal.

At a high level, embeddings are just vectors — long lists of numbers that represent the “meaning” of something. Traditionally, you had separate models for different modalities: one for text search, another like CLIP for images, maybe a custom thing for audio. Gemini Embedding 2 collapses all that complexity and says: send me text, images, video, audio, or PDFs and I’ll map them into one unified embedding space where everything can be compared directly. That means you can do things like: search a video archive using a sentence, find images using an audio clip, or use a screenshot as a query to retrieve documents.
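
In code, "one unified embedding space" just means every modality's vector has the same shape, so cross-modal comparison is plain arithmetic. A minimal sketch with tiny made-up vectors standing in for real model output (the numbers are illustrative only; the actual model emits thousands of dimensions per input):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim embeddings; a real model would emit 3,072 dims for each modality.
text_vec  = [0.9, 0.1, 0.0, 0.2]   # "a dog catching a frisbee"
image_vec = [0.8, 0.2, 0.1, 0.3]   # photo of a dog mid-jump
audio_vec = [0.0, 0.9, 0.8, 0.1]   # recording of traffic noise

# Because all three live in the same space, cross-modal search is just math:
print(cosine_similarity(text_vec, image_vec))  # high: related content
print(cosine_similarity(text_vec, audio_vec))  # low: unrelated content
```

This is the whole trick behind "search a video archive using a sentence": the query and every candidate, whatever its format, reduce to vectors you can rank by one similarity function.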

Under the hood, this model leans heavily on the multimodal understanding baked into the Gemini architecture. It supports five main input types: text (up to 8,192 tokens), up to six images per request (PNG, JPEG), up to 120 seconds of video (MP4, MOV), raw audio without forcing you through speech-to-text, and PDFs up to six pages. Crucially, it’s not limited to one modality at a time — you can interleave text and images, or mix video frames with audio, and the model jointly embeds the whole thing, capturing relationships such as “this caption refers to that part of the image” or “this spoken line relates to that on-screen action.”
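
Those per-request caps are worth checking client-side before you burn an API call. A small validator built only from the limits listed above (the values come from this article and are preview-era numbers that may change):

```python
# Per-request input caps for Gemini Embedding 2, as described above.
LIMITS = {
    "text_tokens": 8_192,
    "images": 6,           # PNG or JPEG
    "video_seconds": 120,  # MP4 or MOV
    "pdf_pages": 6,
}

def validate_request(text_tokens=0, images=0, video_seconds=0, pdf_pages=0):
    """Return a list of limit violations for a planned embedding request."""
    errors = []
    if text_tokens > LIMITS["text_tokens"]:
        errors.append(f"text too long: {text_tokens} > {LIMITS['text_tokens']} tokens")
    if images > LIMITS["images"]:
        errors.append(f"too many images: {images} > {LIMITS['images']}")
    if video_seconds > LIMITS["video_seconds"]:
        errors.append(f"video too long: {video_seconds}s > {LIMITS['video_seconds']}s")
    if pdf_pages > LIMITS["pdf_pages"]:
        errors.append(f"PDF too long: {pdf_pages} > {LIMITS['pdf_pages']} pages")
    return errors

# Interleaved text + images + video in one request is allowed, within the caps:
print(validate_request(text_tokens=500, images=2, video_seconds=90))  # []
print(validate_request(video_seconds=300))  # one violation
```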

On the output side, Google is sticking with high-dimensional vectors but making them more flexible. By default, Gemini Embedding 2 emits 3,072-dimensional embeddings, but it uses Matryoshka Representation Learning (MRL) to “nest” information so you can safely truncate down to 1,536 or 768 dimensions while retaining a lot of the semantic power. This matters in practice because vector databases can get expensive: smaller vectors mean cheaper storage, faster indexing, and more responsive search, while still letting you switch to full 3,072-dimension embeddings when you need maximum accuracy, for example, in a reranking stage. Providers like Qdrant are already describing two-pass retrieval setups where you scan with lower-dimension vectors, then rescore top candidates with the full-size ones.
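
Mechanically, MRL truncation is just "keep the leading k dimensions and re-normalize." A sketch with a random stand-in vector (real MRL-trained embeddings concentrate meaning in the leading dimensions, which random numbers don't, so this shows only the mechanics, not the accuracy trade-off):

```python
import numpy as np

def truncate(embedding, dims):
    """Keep the leading `dims` dimensions and re-normalize to unit length."""
    v = np.asarray(embedding, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)    # stand-in for a full 3,072-dim embedding

small  = truncate(full, 768)    # cheapest to store and search
medium = truncate(full, 1536)
large  = truncate(full, 3072)

print(small.shape, medium.shape, large.shape)  # (768,) (1536,) (3072,)
```

In practice you would store the 768-dim version in your vector index for the first-pass scan and keep the full vectors around for the high-accuracy rescoring stage.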

Google is framing this as a state-of-the-art step up from its earlier, mostly text-focused embedding models — especially for multilingual and multimodal tasks. The original Gemini Embedding work already showed strong results on MTEB benchmarks across classification, clustering, and retrieval, outscoring other popular open and commercial models on both English and multilingual leaderboards. Gemini Embedding 2 builds on that but extends it into speech, images, and video, and Google says it now outperforms leading alternatives across text, image, and video tasks while adding robust speech understanding. For developers, the punchline is that you no longer need a patchwork of different models, glue logic, and ad-hoc scoring tricks to get decent multimodal retrieval.

The real shift is what this unlocks for everyday AI workflows. Retrieval-Augmented Generation (RAG) becomes more than “search a PDF store by text and stuff snippets into a prompt”; you can now build systems that retrieve across email-style text, design mocks, product photos, marketing videos, and recorded calls, all inside the same pipeline. Semantic search stops being text-only and starts to look more like “find me anything in this organization that matches this idea, regardless of format.” Classic tasks like sentiment analysis and clustering also benefit because the model isn’t blind to non-textual signals: it can, for instance, cluster video clips by theme or emotion, or group customer tickets that include screenshots and logs, not just words.
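
In a pipeline, that cross-format retrieval collapses into one index: every item, whatever its format, is stored as a vector plus metadata, and a single query vector ranks all of them at once. A toy in-memory version (the vectors here are made up; a real system would get them from the embedding model and store them in a vector database):

```python
import numpy as np

# One index for everything: (description, modality, embedding).
# In production, these vectors would come from Gemini Embedding 2.
index = [
    ("Q3 launch email",      "text",  np.array([0.9, 0.1, 0.1])),
    ("homepage design mock", "image", np.array([0.1, 0.9, 0.2])),
    ("product demo video",   "video", np.array([0.3, 0.7, 0.4])),
    ("sales call recording", "audio", np.array([0.7, 0.2, 0.6])),
]

def search(query_vec, k=2):
    """Rank every item by cosine similarity to the query, regardless of modality."""
    q = query_vec / np.linalg.norm(query_vec)
    scored = [
        (float(np.dot(q, v / np.linalg.norm(v))), name, modality)
        for name, modality, v in index
    ]
    return sorted(scored, reverse=True)[:k]

# A single query cuts across text, image, video, and audio items at once:
query = np.array([0.15, 0.85, 0.25])   # e.g. the embedding of "new homepage look"
for score, name, modality in search(query):
    print(f"{score:.3f}  {name}  ({modality})")
```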

Google is already pointing to early adopters to make this less abstract. Media companies, for example, are using Gemini Embedding 2 to search huge archives of B-roll, editorial footage, and untranscribed content. One quoted partner reports that with the new embeddings, simple text queries can now retrieve very specific video shots — including subtle, previously untranscribed micro-expressions — and even allow using an image or a random B-roll clip as the query to discover similar footage. In one internal test, this took their text-to-video Recall@1 to 85.3%, a big jump in how often the top result was exactly the right clip. For anyone who has ever tried to dig through a chaotic media drive, that kind of “it just finds the exact shot I had in mind” experience is a huge productivity boost.
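
For context, Recall@1 is the fraction of queries whose single top-ranked result is a correct match, so 85.3% means the first hit was the right clip in roughly 17 of every 20 searches. The metric itself is a few lines (the evaluation data below is invented for illustration):

```python
def recall_at_1(results, relevant):
    """Fraction of queries where the top-ranked result is a relevant item.

    results:  {query_id: [result ids, best first]}
    relevant: {query_id: set of correct result ids}
    """
    hits = sum(
        1 for qid, ranked in results.items()
        if ranked and ranked[0] in relevant[qid]
    )
    return hits / len(results)

# Hypothetical evaluation run: 4 text queries against a video index.
results = {
    "q1": ["clip_7", "clip_2"],
    "q2": ["clip_4", "clip_9"],
    "q3": ["clip_1", "clip_3"],
    "q4": ["clip_8", "clip_5"],
}
relevant = {"q1": {"clip_7"}, "q2": {"clip_4"}, "q3": {"clip_5"}, "q4": {"clip_8"}}

print(recall_at_1(results, relevant))  # 0.75: 3 of 4 top results were correct
```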

On the platform side, Google is making sure this model is accessible in all the usual modern AI developer workflows. You can hit it directly via the Gemini API or through Vertex AI, which exposes it as gemini-embedding-2-preview and documents how to send different media and tune parameters like task (for retrieval, code search, or custom semantics) and output_dimensionality. If you live in the vector database ecosystem, you get ready-made integrations with LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, ChromaDB, and Vertex Vector Search, so plugging it into an existing stack is mostly a matter of swapping out the embedding backend and reindexing. There are also Colab notebooks from Google that walk through setting up semantic search and multimodal RAG pipelines end-to-end.
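
To make the request shape concrete, here is a sketch of an embedContent-style request body. The field names (`taskType`, `outputDimensionality`) follow the existing Gemini embedding REST API, and the model name comes from this article; the exact preview schema, especially for multimodal parts, may differ from this assumption:

```python
# Hypothetical request builder for the preview model; verify field names
# against the official Gemini API docs before relying on this shape.
def build_embed_request(text, task_type="RETRIEVAL_QUERY", output_dimensionality=None):
    body = {
        "model": "models/gemini-embedding-2-preview",
        "content": {"parts": [{"text": text}]},
        "taskType": task_type,
    }
    if output_dimensionality is not None:
        # MRL lets you request 768 or 1,536 dims instead of the full 3,072.
        body["outputDimensionality"] = output_dimensionality
    return body

req = build_embed_request("find the drone shot over the harbor",
                          output_dimensionality=768)
print(req["taskType"], req["outputDimensionality"])
```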

For all the engineering talk, the practical examples are where this gets interesting. Imagine a support system where a user uploads a shaky phone video of a device problem; the system embeds the audio, video, and any overlaid text together, then retrieves the most relevant troubleshooting guides, past tickets, and internal docs in one shot. Or an internal search tool where a designer can drag in a prototype screenshot and instantly find the latest Figma specs, design tokens, and even related product requirement docs, without hand-curated tags. In enterprise settings, this could stretch to compliance (searching across recorded calls, scanned documents, and dashboards) or creative workflows (matching music and visuals, curating highlight reels from huge video dumps) — essentially anywhere you have a messy mix of formats and a vague question in natural language.

There are also clear cost and operations angles here. Because of MRL and adjustable dimensionality, teams can start with lower-dimension embeddings for broad recall, then selectively bump up to full-size vectors in smaller, high-precision stages. This pattern plays nicely with multi-stage retrieval architectures that are becoming standard in large-scale RAG and search systems, where you blend fast approximate search with slower, more accurate reranking. And, since the model is multilingual across more than 100 languages, global products can unify their search infrastructure instead of managing separate models and indexes per language.
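
That two-pass pattern can be sketched in a few lines: scan the whole corpus with truncated vectors, then rescore only a shortlist with the full-dimension ones. NumPy stand-ins below take the place of a real vector database, and the random vectors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Full-precision corpus: 2,000 docs x 3,072 dims (stand-ins for real embeddings).
full_docs = rng.normal(size=(2_000, 3072))
full_docs /= np.linalg.norm(full_docs, axis=1, keepdims=True)

# Cheap first-pass index: leading 768 dims, re-normalized (the MRL trick).
small_docs = full_docs[:, :768]
small_docs /= np.linalg.norm(small_docs, axis=1, keepdims=True)

def two_stage_search(query, shortlist=100, k=5):
    q_full = query / np.linalg.norm(query)
    q_small = q_full[:768] / np.linalg.norm(q_full[:768])

    # Pass 1: fast broad scan over the small vectors.
    coarse = small_docs @ q_small
    candidates = np.argpartition(coarse, -shortlist)[-shortlist:]

    # Pass 2: exact rescoring of the shortlist with full-size vectors.
    fine = full_docs[candidates] @ q_full
    order = np.argsort(fine)[::-1][:k]
    return candidates[order]

top = two_stage_search(rng.normal(size=3072))
print(top.shape)  # (5,)
```

The design choice is the one described above: pay full-vector cost only on the shortlist, so storage and first-pass latency scale with the small index while final ranking quality comes from the full embeddings.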

The launch timing also fits Google’s broader strategy: keep Gemini at the center, but ship specialized pieces that solve very real, very unsexy infrastructure problems for developers — retrieval quality, multimodal search, latency, and cost. LLMs get the headlines, but embeddings quietly decide whether your chatbot actually finds the right context or your “AI search” feels magical versus mediocre. With Gemini Embedding 2, Google is betting that a single, natively multimodal embedding backbone will become the default for these retrieval-heavy AI apps, especially as more companies move from text-only workflows to truly mixed media.

Topic: Gemini AI (formerly Bard)