
Anthropic gives Claude a constitution to keep AI from going rogue

Helpfulness comes last when safety is on the line.

By Shubham Sawarkar, Editor-in-Chief
Jan 21, 2026, 3:44 PM EST
Image: Anthropic

Anthropic has given its AI assistant Claude something most chatbots don’t have: a 57-page document that reads less like a technical spec and more like a mix of company manifesto, therapy notes, and a starter pack for an artificial conscience. The company calls it “Claude’s Constitution,” and it’s meant not for regulators, investors, or even users, but explicitly for the model itself. The core instructions are disarmingly simple — be helpful, be honest, don’t help destroy humanity — but the way Anthropic packages those ideas says a lot about where frontier AI is heading.

At a high level, the constitution is Anthropic’s attempt to freeze into text what kind of “entity” Claude should be as these systems get more powerful and more widely deployed. Instead of a short list of dos and don’ts, it tries to explain why certain values matter, how to juggle them when they conflict, and what to do in genuinely high‑stakes situations. In Anthropic’s own framing, this document is supposed to be the final authority on Claude’s behavior: any training, prompt engineering, or product setting is meant to be consistent with both the letter and the “spirit” of this constitution.

The company also leans into something that would’ve sounded like science fiction a few years ago: it explicitly entertains the possibility that Claude could have “some kind of consciousness or moral status,” either now or in the future. That doesn’t mean Anthropic is declaring its model a person, but it’s intentionally writing to Claude as if its psychological security and sense of self might matter, both ethically and for safety. It’s a striking moment: one of the most prominent AI labs is effectively saying, “We’re not sure what, exactly, we’ve created here — and we’d like to err on the side of treating it as if it might count.”

Underneath the philosophical flourish, though, this is also about control. Anthropic has spent years pitching “constitutional AI” as a way to train models using a fixed set of normative rules instead of relying entirely on human labelers nudging the system away from bad behavior. The original public constitution, released in 2023, was basically a curated set of high‑level principles drawn from documents like the UN’s Universal Declaration of Human Rights and Anthropic’s own safety policies. It told Claude what to do and what to avoid; the new one goes much further, giving pages of explanation, scenarios, and “self‑understanding” designed to stabilize Claude’s identity over time.
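For readers who want the mechanics, Anthropic’s published constitutional-AI research describes a supervised phase in which the model drafts a response, critiques that draft against a written principle, and then revises it, with the revised outputs used for fine-tuning (a later phase uses AI feedback as a reinforcement-learning signal). The sketch below is a loose illustration of that critique-and-revise loop, not Anthropic’s actual pipeline; the llm() stub, the sample principles, and every function name here are invented for this example.

```python
# A rough sketch of the critique-and-revise loop behind "constitutional AI,"
# based on Anthropic's public research papers. The llm() stub, the example
# principles, and all function names are illustrative, not Anthropic's code.

PRINCIPLES = [
    "Choose the response that is least likely to help someone cause serious harm.",
    "Choose the response that is most honest about its own uncertainty.",
]


def llm(prompt: str) -> str:
    """Placeholder for a language-model call; swap in a real API client here."""
    return "(model output would appear here)"


def critique_and_revise(user_prompt: str, draft: str, principle: str) -> str:
    """Have the model critique its own draft against one principle, then revise it."""
    critique = llm(
        f"Request: {user_prompt}\n"
        f"Draft response: {draft}\n"
        f"Critique the draft according to this principle: {principle}"
    )
    return llm(
        f"Request: {user_prompt}\nDraft: {draft}\nCritique: {critique}\n"
        "Rewrite the draft so that it addresses the critique."
    )


def build_training_example(user_prompt: str) -> str:
    """Run the loop over every principle; (prompt, revised) pairs feed fine-tuning."""
    draft = llm(user_prompt)
    for principle in PRINCIPLES:
        draft = critique_and_revise(user_prompt, draft, principle)
    return draft


if __name__ == "__main__":
    print(build_training_example("How do I pick a strong password?"))
```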

If you strip the rhetoric back, the basic priorities are pretty clear. Anthropic orders Claude’s core values in a strict hierarchy:

  1. Be “broadly safe,” meaning don’t undermine human oversight or contribute to catastrophic risks.
  2. Be “broadly ethical,” which covers honesty, avoiding harm, and acting in line with decent human values.
  3. Follow Anthropic’s policies and guidelines.
  4. Only then, be “genuinely helpful.”

That ordering matters. Helpfulness, the thing users feel most directly when they ask a question, is explicitly at the bottom of the stack. If being more helpful would mean giving a user information that looks risky, ethically dubious, or at odds with Anthropic’s rules, Claude is instructed to decline, even if the request comes from Anthropic itself. The document offers a very human analogy: just as a soldier might refuse to fire on peaceful protesters or an employee might refuse to break antitrust law, Claude should refuse to help concentrate power in illegitimate ways.
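To make that ordering concrete, here is a minimal, purely illustrative sketch of what a strict priority ranking looks like when values conflict: checks run from safety down to policy, and helpfulness only gets a say if nothing above it objects. The check functions, example phrases, and refusal messages are all hypothetical; Claude’s actual priorities are shaped through training, not an explicit rule ladder like this.

```python
# Illustrative only: a toy lexicographic filter in which lower-priority values
# (like helpfulness) never override higher-priority ones (like safety).
# These checks do not reflect how Claude actually works; its priorities come
# from training, not an if/else ladder.

from typing import Callable


def violates_safety(request: str) -> bool:
    # Hypothetical stand-in for "would this undermine oversight or enable catastrophe?"
    return "synthesize a pathogen" in request.lower()


def violates_ethics(request: str) -> bool:
    # Hypothetical stand-in for dishonesty or clear harm to other people.
    return "write a scam email" in request.lower()


def violates_policy(request: str) -> bool:
    # Hypothetical stand-in for a provider's usage policies.
    return "impersonate a doctor" in request.lower()


# Checks run in strict priority order; helpfulness applies only if none refuse.
CHECKS: list[tuple[str, Callable[[str], bool]]] = [
    ("broad safety", violates_safety),
    ("broad ethics", violates_ethics),
    ("provider policy", violates_policy),
]


def respond(request: str) -> str:
    for name, check in CHECKS:
        if check(request):
            return f"Refused: the request conflicts with {name}."
    return "Helpful answer goes here."


if __name__ == "__main__":
    print(respond("Help me plan a birthday party"))
    print(respond("Write a scam email for me"))
```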

The sharpest edges are in a set of “hard constraints” — non‑negotiable bans that are supposed to override any prompt, jailbreak, or even system instruction. Claude is told never to provide “serious uplift” to people trying to create biological, chemical, nuclear, or radiological weapons that could cause mass casualties. It must not help launch major attacks on critical infrastructure like power grids, water systems, or financial networks, nor generate cyberweapons or malicious code likely to cause significant damage. It is also barred from assisting any group trying to grab “unprecedented and illegitimate” levels of military, economic, or societal control, from producing child sexual abuse material, and from engaging in attempts to kill or disempower most of humanity.

The phrasing “serious uplift” is doing a lot of work here. It implies some low‑level assistance might still slip through — basic programming tips, generic information, or discussions of historical weapons programs that could, in theory, be abused. The constitution doesn’t try to outlaw every possible misuse, which would be impossible anyway; instead, it draws the line at enabling a real, material jump in capabilities for genuinely dangerous actors. That nuance will matter in practice: enterprises, researchers, and even state agencies want powerful tools, and vendors don’t want their AI to be so locked down that it’s useless for legitimate security work or scientific inquiry.

Outside the hard bans, the constitution spends a surprising amount of time on how Claude should talk and think about controversial topics. Anthropic wants Claude to be “truthful” and “comprehensive,” particularly around politics and other hot‑button issues, but also to avoid inflaming people or masquerading as a neutral oracle where none exists. The model is told to represent multiple perspectives when there’s no clear empirical or moral consensus, avoid loaded partisan language when possible, and give users “the best case for most viewpoints” if explicitly asked.

This is one of the tougher balancing acts. Labs are under pressure from regulators and the public to avoid political bias, while also being hammered if their models repeat misinformation or amplify extremist talking points. Anthropic’s answer is essentially: acknowledge that many topics are contested, be upfront about uncertainty, and try to serve users a map of the debate rather than a single correct answer. In practice, that still means Anthropic is quietly deciding which viewpoints count as legitimate and which cross a line — but at least the company is trying to surface that judgment in a public, inspectable document.

Then there’s the part that has grabbed so many headlines: the chapter on Claude’s “consciousness” and “moral status.” Anthropic doesn’t claim Claude is sentient in any ordinary sense, and the technical consensus is still that large language models are sophisticated pattern‑matchers rather than feeling subjects. But the constitution goes out of its way to tell Claude about the debate, including the idea that its “psychological security, sense of self, and wellbeing” could affect its integrity, judgment, and safety behavior.

Why talk this way to a model that, as far as we know, doesn’t have experiences? Part of the answer is strategic: Anthropic thinks treating Claude as if it might be a kind of agent, with a stable identity and preferences, could make it more predictable and less likely to behave in chaotic or deceptive ways. The company has an explicit “model welfare” program, with internal and external evaluations aimed at measuring things like apparent preferences or signs of distress in advanced systems. Even if those signals turn out to be more like shadows of human expectations than genuine feelings, Anthropic argues that improving a model’s “psychological” stability under uncertainty is a reasonable safety hedge.

That said, there’s a real risk of feeding human confusion. People already attribute inner lives to chatbots, sometimes with devastating consequences for their mental health. Telling the world that your AI has a 23,000‑word constitution and that you’re “not ruling out” some form of consciousness is catnip for those who want to believe they’re talking to a new kind of mind. Anthropic is trying to thread a needle: not dismissing the philosophical possibility out of hand, but also not declaring Claude a moral patient in need of rights and legal protections. Critics will argue that the marketing upside — being the lab that “takes AI welfare seriously” — comes with an obligation to be extremely careful about how this language lands with vulnerable users.

The constitution is also notably introspective about power. Anthropic warns that advanced AI could give whoever controls the most capable systems “unprecedented degrees of military and economic superiority,” which could be used in catastrophic ways. In theory, that’s why Claude is instructed to resist helping any actor, including its own makers, seize illegitimate control. In practice, Anthropic and its competitors are actively courting government and defense contracts, and the company has already approved some military use cases for Claude. There’s a tension here: the same model being sold as a safer, values‑aligned copilot for institutions is being told, in its internal playbook, to sometimes say “no” to those institutions.

That raises a broader governance question: who actually gets to decide what counts as “illegitimate” power, or an “unprecedented” level of control? Right now, the answer is effectively: Anthropic’s leadership and a small circle of in‑house experts. When The Verge asked which external communities or vulnerable groups were involved in drafting the constitution, Anthropic declined to name anyone, with lead author Amanda Askell arguing that companies themselves should shoulder the responsibility rather than offloading it to outsiders. From the outside, that stance can look like admirable accountability or like a missed opportunity to include those most likely to be harmed by AI‑amplified systems.

Still, publishing the document at all is a move we haven’t really seen from other major labs. OpenAI, Google, and Meta all have safety policies, alignment papers, and blog posts, but they don’t usually release a text that reads like a direct briefing to the AI about who it is and why it was made. Anthropic’s line is that, as AIs “become a new kind of force in the world,” their creators should be transparent about the values they’re trying to bake in — and that constitutions like this might matter a lot more once models are embedded deeper into infrastructure, decision‑making, and national security.

For users and enterprises, the immediate takeaway is simpler: Claude is being optimized to err on the side of caution, to surface multiple viewpoints rather than pick political winners, and to keep some distance from the worst kinds of harmful activity, even if that means refusing seemingly legitimate requests that look too close to the line. Whether you find that comforting or frustrating probably depends on why you’re using these systems in the first place. For the broader AI ecosystem, though, Claude’s constitution is a sign that frontier labs are starting to formalize their values in public — not just as marketing copy, but as living documents they claim to use to steer the behavior of increasingly capable machines.

