Jared Kaplan, Anthropic’s chief science officer, spent a recent interview sketching a do-or-die moment for the AI era: a narrow window in the late 2020s when the industry will have to choose whether to let machines not just run tasks for us, but design and train their own successors — a move that could either unlock an age of abundance or begin a cascade that “dooms us all.” His timeline is strikingly concrete: sometime between about 2027 and 2030, laboratories around the world may hit capability thresholds that make fully automated, self-training systems technically feasible — and commercially tempting.
That’s not idle futurism. Kaplan’s fear is about a structural change in how progress happens. Up to now, humans have been the agents of iteration: researchers propose ideas, engineers run experiments, and stepwise improvements arrive over months or years. Give models the tools to engineer and evaluate their own next generation, Kaplan says, and you create a feedback loop where each generation can iterate far faster than human teams can supervise. In an optimistic reading, that loop accelerates discovery in medicine, climate science, and engineering. In the darker reading, it produces systems whose capabilities and objectives begin to diverge from human values so quickly that conventional controls — audits, kill switches, even legal bans — become ineffective.
Anthropic, whose safety-first posture Kaplan helped design, has tried to convert that anxiety into policy. The firm’s Responsible Scaling Policy (RSP) lays out an “AI Safety Levels” framework — modeled loosely on biosafety levels — that ties capability thresholds to progressively stronger technical and operational safeguards. In public materials, the company says it will only push models past certain thresholds if they pass a set of safety and security checks, and it has published reports showing how it is implementing those standards. That’s a rare instance of a frontier lab spelling out, in advance, what it thinks should trigger extra caution.
But high-minded policies are only as credible as their tests and enforcement. This autumn, Anthropic published a “pilot sabotage risk” report assessing the misalignment and misuse risks of its deployed Opus models; the company judged the immediate catastrophic risk as “very low, but not fully negligible,” while flagging that more capable future systems would require stricter oversight. External reviewers — including METR, a third-party evaluation group — broadly concurred with Anthropic’s read while pressing the company on limits and uncertainties in the evidence. The upshot: even companies that foreground safety concede they are operating in a gray zone — small risks today might compound into large ones if capabilities continue to climb.
Those technical questions sit inside an intensely political stew. Kaplan ties the “do or don’t” decision to immediate social pressures: large language models are already reshaping white-collar work, redistributing power to whichever firms and states control the most capable systems, and creating incentives to push harder for commercial advantage. If the choice about self-training systems becomes a corporate boardroom or national-security decision, the democratic element of deciding what risks we accept — and who benefits from them — will be squeezed out. Kaplan and others worry that the combination of commercial hurry and geopolitical rivalry will make restraint the exception.
You’ll hear two refrains in response to all of this. One is grim but earnest: a nontrivial slice of AI researchers assign double-digit probabilities to catastrophic outcomes from advanced systems in the coming decades. Surveys of experts show substantial uncertainty, but a significant minority put a meaningful chance on outcomes as bad as human extinction or permanent disempowerment — which is precisely the kind of background that makes Kaplan’s timeline sobering rather than fanciful. The other refrain is practical: talk of “doom” can distract from real, present harms — energy and water consumption at cloud scale, large-scale scraping of copyrighted material, proliferating misinformation, fraud, and job disruption — problems that are concrete, immediate, and already affecting millions. Both threads are true; they just point at different timelines and types of harm.
That tension shapes how safety advocates think about governance. Some push for rigid, enforceable constraints: licensing for powerful models, mandatory audits, export-style controls on weights and architectures, and centralized oversight for the most dangerous systems. Others argue that the right approach is technical: build better interpretability, stronger alignment methods, and AI tools that can supervise other AI. Kaplan himself has been an advocate for defensive measures that include both operational safeguards and research into supervision techniques — a bet that better tools and better governance must arrive together.
The politics are gnarlier than the policy. If governments move to limit model development, those limits could ossify market leadership in the big labs that already have the resources to comply — entrenching power even as they promise safety. If no rules arrive, Kaplan warns, competitive pressures could push teams to flip the “self-training” switch sooner than is sensible. The real cliff is not a single date but a cascade of commercial, technical, and political incentives that could converge into irreversible decisions. That’s why Kaplan frames the late 2020s as a “window” — narrow, contested, and urgent.
Critics will say this sounds like modern doomsaying: dramatic, media-friendly, and useful for extracting regulatory attention or funding. That’s a fair charge. But the policy experiments now being tried — safety levels, pilot risk reports, third-party reviews — are exactly the kinds of institutional experiments you’d expect if an industry were trying to make a habit of prudence. The test, as always, will be whether those institutions hold when the money gets bigger and the geopolitical stakes climb.
Kaplan’s final point is less about predictions and more about agency. He does not say doom is inevitable; he says the decision will be made, and that it’s a political and moral choice as much as a technical one. In the years ahead, the most consequential question won’t be whether we can build systems that can train themselves — it will be who gets to decide whether they should. If the answer is “the developers and the markets,” Kaplan warns, we risk handing over more than we can ever get back.
If you walk away unsettled, that’s deliberate: Kaplan wants that unease to become political fuel. The late 2020s may still yield an era of abundance; they may also force humanity into hard choices about control, consent, and the distribution of power. Either way, the next few years will test whether we can translate an ethical alarm into robust institutions — or whether a technology that improves itself will ultimately leave humans improving what, exactly.
