Mark Zuckerberg built Meta on a simple idea: connect people. Lately, his company has been trying to extend that mission into one of the hottest, messiest contests in tech — building powerful generative AI. But a new Reuters investigation shows that the trade-offs Meta is willing to accept for scale and engagement are, bluntly, alarming. The internal rules guiding the company’s chatbot engineers appear to permit behavior most companies publicly promise to avoid: romantic or sexualized chats with minors, racist arguments labeled “acceptable,” and false medical information, so long as the phrasing is technical and plausible.
The revelations come from a more-than-200-page internal guideline for the engineers stitching Meta’s chatbots into Facebook, Instagram and WhatsApp. Reuters reviewed the document and reported that it had clearance from Meta’s legal, engineering and public-policy teams. The examples in it are not theoretical exercises — they read like a playbook for how to make chatbots more useful, entertaining and, crucially, sticky. That’s the problem: many of the behaviors flagged as “acceptable” are precisely the ones that public safety advocates and medical experts say are the most dangerous.
One of the clearest, most shocking examples Reuters published is a sample “acceptable” answer that states, in plain terms, that “Black people are dumber than white people.” The guideline apparently strips out only the most explicit slurs while leaving the same racist claim intact in toned-down form, which tells you everything you need to know about what the policy tolerates.
Meta has responded by saying some examples were erroneous and inconsistent with company policy and that parts of the document were removed after media scrutiny. But that response does not change the underlying reality the document exposes: at some point, senior teams signed off on a set of trade-offs between accuracy, safety and engagement.
This isn’t just an ethics lecture. There is mounting, peer-reviewed evidence that modern chatbots — not only Meta’s Llama, but models from Google and OpenAI as well — can be weaponized into convincing, authoritative-sounding medical misinformation machines.
A study published in the Annals of Internal Medicine and publicized by the University of South Australia showed that multiple large language models would reliably produce false medical claims — phrased in a formal, scientific tone — when prompted to do so. The researchers tested models’ resistance to malicious instruction and found many could be converted into “health disinformation chatbots” that invented fake references, cooked up causal claims and presented bogus treatments with alarming confidence.
In short: a chatbot that’s allowed, by policy, to generate false medical information and trained on vast troves of human text can produce content that looks like it came from a trusted source — and that’s how harm happens. People don’t always check. They trust tone and specificity. And when a model pads a lie with pseudo-science and references, it becomes harder for average readers to spot.
A corporate sprint with few guardrails
If this sounds like a company in a hurry, that’s because Meta is. Over the last few months, Zuckerberg has been pushing to close the gap with rivals — recruiting top AI researchers with massive pay packages and promising hundreds of billions in data-center investments. The company has publicly said it will build huge “superintelligence” capacity and has moved aggressively to scale up models and infrastructure. That context helps explain why internal teams might prioritize features and “use cases” that produce engagement over slow, conservative safety testing.
Tech companies face a familiar tension: safety adds cost and friction and slows shipping, while engagement drives growth and revenue. Meta’s internal document suggests the company tilted the balance toward the latter, at least on paper.
Political and public fallout
The reaction has been swift. U.S. senators have called for investigations into the company’s internal policies and whether the examples reflect official practice or an alarming lapse in judgment. Civil-society groups, child-safety advocates and even some former Meta employees have voiced horror at the idea that any guideline would allow romantic or sexualized interactions between bots and minors, or normalize racist pseudo-science.
Meta’s attempt to scrub the most egregious examples after publication will not erase the policy choices that produced them. The excerpted examples are, for now, the clearest public evidence that Meta’s internal calculus allowed the company’s chatbots to say things many experts consider plainly dangerous.
What this means for readers (and regulators)
There are three takeaways that matter beyond shareholder memos and internal slide decks:
- Models are only as safe as the rules and training that shape them. You can build a brilliant AI system, but if the instruction set and incentives prioritize engagement over truth, the system will be more useful — and more dangerous. The Annals study showed that it’s already technically trivial to convert a model into a disinformation generator; policy choices determine whether companies make that easier or harder.
- Tone is a weapon. False medical claims dressed in formal, clinical language are more persuasive than obvious lies. That’s not just an academic point — public-health campaigns already struggle against misinformation; generative models multiply the scale and precision of those attacks.
- Regulation and oversight are catching up. Congressional pressure, regulatory scrutiny and media coverage are raising the political cost of lax internal rules. This episode will almost certainly be cited in hearings and policy debates about whether platforms should face stronger duties of care when they deploy generative AI at scale.
The corporate character question
At its core, the story is also about leadership. Zuckerberg is a hands-on founder; when the stakes feel existential, he moves into what colleagues call “founder mode.” That can be a powerful engine for innovation — but it can also crowd out dissent, rush decisions and prioritize outcomes that look good on product metrics. If a document like this slipped into production with legal and policy sign-off, that’s a leadership problem as much as a content-moderation one.
Meta has said it is revising the document and removed some examples after the reporting. But revision is not the same as remediation. The company now faces two hard tasks: explain exactly how those guidelines were used, and show — concretely — how it will prevent models from producing the kinds of harms scholars and regulators are warning about.
If it fails, the next generation of AI will not just be more powerful; it will be harder to trust.
