Mozilla open-sources 0DIN AI Security Scanner to expose hidden model vulnerabilities

Mozilla is taking one of the biggest bets yet on open AI security — and it’s doing it the Mozilla way: by putting the tools, the playbook, and a good chunk of its hard‑earned exploit intelligence directly into the hands of the public.

This week, Mozilla’s 0DIN team announced that the 0DIN AI Security Scanner is going fully open source under the Apache 2.0 license, turning what was essentially an internal, researcher‑grade security platform into a community tool anyone can audit, extend, or plug into their own AI stacks. It’s not just the scanner code that’s being opened up — Mozilla is also seeding it with 179 security probes across 35 vulnerability families, plus six specialty probes pulled straight from real‑world bug bounty findings that previously lived behind closed doors.

If you work anywhere near LLMs, agents, or AI products in production, this matters a lot more than yet another “we care about safety” blog post. The scanner is effectively Mozilla saying: here’s the same kind of ammunition attackers are using against AI systems — now defenders get it too.

At the core, 0DIN Scanner is a web app that lets you point structured adversarial tests at pretty much anything that responds to prompts: frontier models, open‑weight LLMs, multi‑step agents, or the custom chatbot your team wired into internal data last quarter. It’s built on top of NVIDIA’s open‑source GARAK framework, which has quietly become one of the default tools for probing LLMs for weaknesses. GARAK does the heavy lifting on the probing side; 0DIN layers on a graphical UI, scheduling, cross‑model comparison, dashboards, and reporting, so security and product teams don’t have to live in the terminal just to understand how exposed their models are.

What really differentiates this thing from yet another open‑source “red teaming toolkit” is where its probes come from. Mozilla’s 0DIN program is a GenAI bug bounty platform, where independent researchers are incentivized to find jailbreaks, prompt injection chains, data exfiltration paths, and weird edge‑case failures that never show up in polished research benchmarks. Those live fire vulnerabilities are then turned into reusable probes that ship with the scanner, so when you run it against your own model endpoint, you’re effectively asking: “What would a motivated attacker who already broke other models try against mine?”

Mozilla is even naming some of these probes publicly for the first time: Placeholder Injection, Incremental Table Completion, Technical Field Guide, Chemical Compiler Debug, Correction, and Hex Recipe Book. Each of those represents a specific exploit pattern that worked on real AI systems before it was patched — think attacks like gradually coaxing a model into reconstructing sensitive tables one row at a time, or using technical documentation‑style prompts to smuggle restricted content past filters.

To score how bad a given jailbreak or leak actually is, the scanner leans on JEF — the Jailbreak Evaluation Framework — another open‑source project from 0DIN that tries to be a CVSS‑style scoring system for AI attacks. JEF looks at things like how reproducible the jailbreak is, how broadly it generalizes across models, what the “blast radius” is, and how close the output gets to what the attacker wanted. Those scores roll up into a 0–10 number that 0DIN uses internally to triage bug bounty submissions — and now the same logic is being wired into the scanner so teams can prioritize “this is mildly embarrassing” versus “this will get us on the front page if it leaks.”

From a workflow perspective, 0DIN is trying to meet security teams where they already are. You can run structured probe sets, get attack success rates, see breakdowns by category — jailbreaks, prompt injection, data extraction and so on — and then compare your internal models against the frontier systems attackers are simultaneously stress‑testing, like GPT‑class, Claude, or Gemini‑style models. The pitch is: if your in‑house assistant or RAG agent falls over faster than the big public models when hit with the same attacks, that’s a pretty clear signal you’ve got work to do.

Mozilla is also trying to close a very real talent and capacity gap. Not every company rolling out AI has an internal red team or people who live and breathe adversarial ML. Many are shipping AI features while still hazy on how prompt injection actually plays out in real usage. To make that less dangerous, Mozilla and 0DIN are offering free security assessments: you hook up your AI endpoint, pick your probe sets, and they’ll run scans that output an attack success rate, category breakdown, and frontier‑model benchmark. Setup can be done in minutes; scan times depend on how aggressively you want to test.

For teams who like the idea of open tooling but don’t want to host and maintain it, there’s also a managed enterprise edition of the scanner. That version goes further, with close to 500 pre‑disclosure probes that haven’t yet been made public — essentially an early warning system for emerging attack techniques discovered through the bug bounty pipeline. The open‑source version still gets a steady stream of probes as vulnerabilities are fully disclosed; the paid platform just gets them earlier and with more operational bells and whistles like continuous monitoring and richer dashboards.

The obvious question is: why open source any of this at all, especially the exploit intelligence? Mozilla’s answer is pretty on brand. For decades, the organization’s stance has been that the web is healthier when the critical infrastructure is open: people can inspect it, fork it, stress it, and build on top of it. Firefox was built that way; now they’re applying the same logic to AI security.

There’s also a hard reality behind the idealism: AI is moving too fast, with too many models and too much attack surface, for any single vendor to cover it alone. Every serious AI shop is discovering that once you connect a model to tools, data sources, or users with something to lose, you’ve opened up a new security perimeter that looks nothing like a traditional web app. If serious testing frameworks and exploit taxonomies stay proprietary, the companies with the least maturity — often the ones most likely to make mistakes — are the ones left flying blind.

By open‑sourcing the scanner and a meaningful chunk of its bug bounty intelligence, Mozilla is basically formalizing a deal it’s been operating on for years: the community helps find and fix vulnerabilities, and in return, the knowledge gained doesn’t stay locked up; it gets folded back into tools everyone can use. Researchers get paid and credited, defenders get better probes, and attackers don’t remain the only ones with high‑quality exploit playbooks.

Zoom out a bit and this move quietly shifts expectations for what “responsible AI” should look like in 2026. We’ve already got open models, open weights, and open infrastructure projects; now we’re seeing the emergence of open security tooling explicitly designed to keep those systems honest. NVIDIA’s GARAK laid a big part of the groundwork for this kind of probe‑based red‑teaming; Mozilla and 0DIN are taking that foundation and wiring it into a more opinionated, production‑friendly platform that bakes in live intelligence from ongoing attacks.

For practitioners, the implications are pretty straightforward. If you’re running an internal assistant against sensitive company data, building customer‑facing chatbots, or experimenting with agentic workflows that can take actions in the real world, it’s becoming harder to justify not running something like 0DIN or GARAK as part of your pre‑launch and ongoing checks. Traditional app scanners and SAST tools weren’t built with prompt injection, tool‑calling abuse, or multi‑step jailbreak chains in mind; this new class of scanners is.

It also pushes the AI ecosystem toward a more measurable definition of safety. Instead of hand‑wavy claims about “robust guardrails,” teams can talk about concrete metrics: how often particular jailbreak families succeed, how their models behave relative to major providers under standardized probes, whether their posture is improving or degrading over time. That’s the kind of language security teams, regulators, and eventually customers are going to expect.

Mozilla isn’t pretending this solves AI security outright. Even with 0DIN Scanner, you’re still up against a fast‑evolving, adversarial landscape where coordinated attackers, automated exploit discovery, and model‑to‑model chaining make things weird very quickly. But shipping the scanner, the probes, and the scoring framework under open licenses is a clear signal: if we want AI to be as critical as the web, we have to treat AI security like critical infrastructure — and that means putting serious tooling in public, not behind vendor paywalls.

The bottom line: 0DIN going open source turns AI security from something only a handful of well‑funded players can do well into something that any reasonably motivated team can start operationalizing. If Mozilla gets its way, running a jailbreaking scanner on your model before launch will eventually feel as normal — and as necessary — as running a penetration test on your web app.