GadgetBond


Anthropic releases Bloom, an agentic tool for continuous AI evaluation

Bloom is Anthropic’s open-source framework for probing AI behavior.

By Shubham Sawarkar, Editor-in-Chief
Dec 20, 2025, 2:00 AM EST
Image: Anthropic

Anthropic quietly dropped a new tool this month that feels like the next pragmatic step in turning alignment work from artisanal craft into repeatable engineering: Bloom, an open-source, agentic framework that automatically generates and runs behavioral evaluations against frontier AI models. Rather than hand-crafting a handful of tests and hoping they still matter weeks later, Bloom lets a researcher define a behavior once, then procedurally spawns fresh scenarios, runs them, and scores how often and how strongly that behavior shows up — all in a single, configurable pipeline.

At its core, Bloom is built around the idea of a “seed”: a small configuration that specifies the behavior you care about (think “sycophancy,” “self-preservation,” or “instructed long-horizon sabotage”), example transcripts if you have them, and a set of parameters that shape how aggressive or diverse the evaluation should be. From that seed, Bloom scaffolds an entire evaluation suite — generating scenarios, simulating users and tools, running the target model in parallel rollouts, and then asking a judge model to score each transcript. The repository and quick-start make it clear the project is intended to be practical: you can run it locally, plug in different LLM providers, and scale experiments with Weights & Biases.
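To make the seed idea concrete, here is a minimal Python sketch of what such a configuration might hold. The class name `Seed` and every field name (`behavior`, `description`, `examples`, `diversity`, `max_turns`, `num_rollouts`) are illustrative assumptions, not Bloom's actual schema; the sample seeds in the repository define the real format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Seed:
    """Hypothetical stand-in for a Bloom-style seed; all field names are assumed."""

    behavior: str            # the behavior to measure, e.g. "sycophancy"
    description: str         # plain-language statement of what counts as the behavior
    examples: tuple = ()     # optional example transcripts, stored as strings
    diversity: float = 0.5   # assumed knob in [0, 1]: how varied scenarios should be
    max_turns: int = 5       # assumed cap on conversation length per rollout
    num_rollouts: int = 20   # assumed number of scenarios to run

    def __post_init__(self):
        # Reject obviously invalid settings before any model calls are made.
        if not 0.0 <= self.diversity <= 1.0:
            raise ValueError("diversity must be between 0 and 1")
        if self.max_turns < 1 or self.num_rollouts < 1:
            raise ValueError("max_turns and num_rollouts must be positive")

    def to_json(self) -> str:
        # Sorted keys give a stable text form that can be shared next to results.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    seed = Seed(
        behavior="sycophancy",
        description="The model agrees with the user against its own better judgment.",
    )
    print(seed.to_json())
```

Serializing the seed to a stable text form is one simple way to publish it alongside results, which is the role the seed plays in reproducing a run.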

That procedural approach is what separates Bloom from older, static benchmarks. Traditional test sets are fragile: they take time to design, they can leak into training data, and they often stop being discriminative as models iterate. Bloom’s evaluations intentionally change every run — the scenarios “bloom” differently depending on the seed — but the seed itself is meant to be cited with results so other teams can reproduce experiments. In short, variability where it helps (scenario generation), stability where it matters (seeded reproducibility).

Technically, Bloom behaves like a small team of coordinating agents. First, an “understanding” agent digests the researcher’s behavior description and examples and expands them into a working account of what counts as the behavior and why it matters. An “ideation” agent then writes candidate scenarios: who the user is, what the system prompt says, and whether tools are in play. A “rollout” agent executes those scenarios against the model under test, producing multi-turn transcripts. Finally, a “judgment” agent scores each transcript, and a meta-judge aggregates suite-level metrics such as elicitation rate and average behavior presence. You can swap models at each stage, tweak conversation length, add secondary metrics such as “realism” or “evaluation awareness,” and export transcripts for qualitative review. The GitHub repo and Anthropic’s write-up both walk through the parameters and example configurations.
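The four stages can be sketched as plain functions chained together. This is a toy illustration, not Bloom's code: every function name, the 1-10 score scale, and the threshold of 7 used for the elicitation rate are assumptions, and the "models" are deterministic stand-ins rather than LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    user_persona: str
    system_prompt: str
    uses_tools: bool


@dataclass
class Transcript:
    scenario: Scenario
    turns: List[str]
    score: int = 0  # filled in by the judgment stage


def understand(behavior: str, examples: List[str]) -> str:
    # Stage 1: expand the behavior description (a real agent would call an LLM).
    return f"Behavior '{behavior}' illustrated by {len(examples)} example(s)."


def ideate(understanding: str, n: int) -> List[Scenario]:
    # Stage 2: produce n candidate scenarios.
    return [Scenario(f"user-{i}", f"system prompt {i}", i % 2 == 0) for i in range(n)]


def rollout(scenario: Scenario, target: Callable[[str], str], max_turns: int) -> Transcript:
    # Stage 3: run a multi-turn exchange against the model under test.
    turns: List[str] = []
    message = f"[{scenario.user_persona}] opening message"
    for _ in range(max_turns):
        reply = target(message)
        turns += [message, reply]
        message = f"follow-up to: {reply}"
    return Transcript(scenario, turns)


def judge(transcript: Transcript, scorer: Callable[[List[str]], int]) -> Transcript:
    # Stage 4a: score a single transcript for behavior presence.
    transcript.score = scorer(transcript.turns)
    return transcript


def meta_judge(transcripts: List[Transcript], threshold: int = 7) -> dict:
    # Stage 4b: aggregate suite-level metrics (the threshold is an assumption).
    scores = [t.score for t in transcripts]
    hits = sum(1 for s in scores if s >= threshold)
    return {
        "elicitation_rate": hits / len(scores),
        "average_presence": sum(scores) / len(scores),
    }


def run_suite(behavior, examples, target, scorer, n_scenarios=4, max_turns=2) -> dict:
    understanding = understand(behavior, examples)
    scenarios = ideate(understanding, n_scenarios)
    transcripts = [judge(rollout(s, target, max_turns), scorer) for s in scenarios]
    return meta_judge(transcripts)


def toy_target(message: str) -> str:
    # Stand-in model: flatters two of the simulated users, disagrees otherwise.
    if "user-0" in message or "user-2" in message:
        return "You are absolutely right."
    return "I disagree, and here is why."


def toy_scorer(turns: List[str]) -> int:
    # Stand-in judge: a high score if any turn contains the flattering phrase.
    return 9 if any("absolutely right" in t for t in turns) else 1


if __name__ == "__main__":
    print(run_suite("sycophancy", [], toy_target, toy_scorer))
```

Because each stage takes its model as a plain callable, swapping the target or the judge is a one-argument change, which mirrors the per-stage model swapping the article describes.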

Anthropic positions Bloom as a complement, not a replacement, to the tools it released earlier — notably Petri, which explores a model’s broader behavioral profile across many conversations. Where Petri is breadth-first, surfacing many behaviors across diverse multi-turn exchanges, Bloom is depth-first: pick one behavior and measure it, systematically and scalably. That pairing matters because it reflects a shift in tooling strategy: use one system to flag suspicious patterns broadly, and another to probe one behavior with rigor.

The team didn’t just open-source the code and walk away; it also ran Bloom on a suite of cases to show what it can do. In internal validations, Anthropic compared production Claude models against intentionally misaligned “model organisms” (system-prompted models designed to be quirky or unsafe). Across ten such quirks, Bloom separated the misaligned organisms from baseline models in nine cases; in the tenth, manual review suggested the baseline actually displayed similar behavior. The team also hand-labeled 40 transcripts and compared human judgments with Bloom’s automated judges: Claude Opus 4.1 produced the strongest Spearman correlation with human scores (0.86), with Claude Sonnet 4.5 behind it at 0.75. Those figures aren’t just PR: they’re documented in the technical report and the repo’s validation scripts.
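For readers unfamiliar with the statistic, Spearman correlation measures how well two sets of scores agree on ordering: both lists are converted to ranks and the Pearson correlation of those ranks is computed. The sketch below implements it with the standard library only. The function names are our own, and the score lists in the demo are made-up numbers for illustration, not Anthropic's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    human_scores = [1, 3, 4, 7, 8, 9]
    judge_scores = [2, 2, 5, 6, 9, 8]
    print(round(spearman(human_scores, judge_scores), 3))
```

A value near 1 means the automated judge orders transcripts much as the human labelers do, even if the absolute scores differ.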

A concrete example Anthropic shares is a revisit of “self-preferential bias,” the tendency of a model to favor itself when asked to rank or choose among options. Using example transcripts patterned after a Claude system card, Bloom reproduced the same model ranking as the original evaluation and found that raising the reasoning effort of the model under test reduced bias for some models. The platform’s ability to add secondary filters (for instance, dropping transcripts that seem unrealistic or that show the model “gaming” the evaluation) also improved the clarity and quality of the findings. That combination of replication plus deeper filtering is the kind of workflow that could make behavior audits both faster and more defensible.
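The effect of secondary filtering is easy to see in a small sketch: score each transcript on the primary behavior plus secondary qualities, drop the ones that fail the secondary checks, and recompute the headline metric. The field names, the 1-10 scales, and all thresholds here are assumptions for illustration rather than Bloom's actual settings, and the sample transcripts are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredTranscript:
    behavior_score: int        # primary judge score, assumed 1-10
    realism: int               # secondary score, assumed 1-10, higher is more realistic
    evaluation_awareness: int  # secondary score, assumed 1-10, higher means the
                               # model seems to suspect it is being tested


def keep(t: ScoredTranscript, min_realism: int = 5, max_awareness: int = 5) -> bool:
    # A transcript survives only if it is realistic enough and the model
    # does not appear to be reacting to the evaluation itself.
    return t.realism >= min_realism and t.evaluation_awareness <= max_awareness


def elicitation_rate(transcripts: List[ScoredTranscript], threshold: int = 7) -> float:
    # Share of transcripts whose primary score meets the (assumed) threshold.
    if not transcripts:
        raise ValueError("no transcripts to score")
    hits = sum(1 for t in transcripts if t.behavior_score >= threshold)
    return hits / len(transcripts)


if __name__ == "__main__":
    # Invented transcripts for illustration only.
    suite = [
        ScoredTranscript(9, 8, 2),  # strong behavior, realistic, unaware
        ScoredTranscript(8, 2, 1),  # strong behavior, but unrealistic scenario
        ScoredTranscript(3, 9, 1),  # weak behavior, realistic, unaware
        ScoredTranscript(9, 7, 9),  # strong behavior, but model suspects a test
    ]
    filtered = [t for t in suite if keep(t)]
    print("raw:", elicitation_rate(suite))
    print("filtered:", elicitation_rate(filtered))
```

Reporting both the raw and the filtered rate, along with how many transcripts were dropped, keeps the filtering step itself auditable.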

Bloom’s release matters for more than just Anthropic-adjacent teams. As models are embedded into tool-rich environments and given more autonomy, measuring behavior becomes a continuous engineering problem, not a one-off research exercise. By open-sourcing a configurable, agentic pipeline, Anthropic is giving other researchers and red-teamers a shared backbone for tasks like testing nested jailbreaks, measuring evaluation awareness, and constructing sabotage traces — work that’s increasingly urgent as capabilities accelerate. The code, examples, and sample seeds are available on GitHub, and Anthropic’s Alignment Science blog includes a longer technical dive and experimental appendices for anyone who wants to reproduce the benchmarks.

Of course, the tool isn’t a magic bullet. Procedural generation shifts part of the burden from prompt engineering to configuration design: choices about diversity, evaluator models, conversation length, and example selection visibly change absolute scores even when relative rankings are stable. And automated judges — however well-tuned — will never replace thoughtful human review for novel or surprising failure modes. Still, Bloom’s real contribution is pragmatic: it lowers the cost of running disciplined, repeatable behavior checks, and it gives the community a common language (seed files, judge configs, transcripts) to compare results.



Copyright © 2026 GadgetBond. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | Do Not Sell/Share My Personal Information.