
Anthropic vs. the Pentagon: when ethics commitments meet state power

How the US Department of Defense forced a reckoning with AI safety red lines — and what happened next.

Case ID BC-001
Actor Anthropic / US DoD
Period Feb – Apr 2026
Harm type Militarisation of AI
Overall confidence Verified / Probable
Summary

Anthropic held a $200M Department of Defense contract and had Claude integrated into classified military networks via partners including Palantir. When the Pentagon demanded Anthropic drop objections to autonomous weapons and mass surveillance use — with a Friday 5:01pm deadline — Anthropic refused. The Trump administration ordered federal agencies to stop using Claude, giving the Pentagon a six-month phase-out window. Within days, Claude was reportedly used in US military operations against Iran. OpenAI moved quickly to fill the contract gap.

The case is a real-world stress test of what AI safety commitments are actually worth when a state actor decides it wants something different. The answer, so far, is not much, and the accountability mechanisms that would make them worth more do not yet exist.

Early 2025

Anthropic signs ~$200M DoD contract

Anthropic enters a defence partnership worth approximately $200M. Claude is integrated into classified US military networks through third-party platforms including Palantir. Use cases include intelligence analysis and operational planning.

Feb 2026

Pentagon issues Friday 5:01pm ultimatum

The Department of Defense gives Anthropic a deadline of Friday 5:01pm to remove objections to Claude being used for autonomous weapons development and mass surveillance operations. Anthropic refuses to comply.

Feb – Mar 2026

Trump orders federal agencies off Claude

Following the standoff, the Trump administration orders most federal agencies to stop using Claude. The Pentagon is given a six-month phase-out window. The government frames this as a "supply chain risk" designation — a procurement mechanism that can restrict contractor access across agencies.

Mar 2026

OpenAI steps in with a DoD agreement

OpenAI moves rapidly to formalise its own DoD partnership. The agreement is publicly described as including guardrails — notably, no autonomous weapons. After criticism about surveillance applications, OpenAI clarifies the deal excludes intelligence agency use cases without additional contract modifications. Critics note the clarifications came after public backlash, not before signing.

Mar – Apr 2026

Claude reportedly used in Iran operations — after the ban

Reporting from the Wall Street Journal and The Guardian describes US military use of Claude for intelligence assessments, target identification, and battle scenario simulation during operations against Iran, in the hours and days after the formal prohibition took effect. Operational use of Claude continued through existing integrations despite the executive ban.


Each claim below carries its own confidence rating. The case as a whole is well-documented; specific details vary in their evidentiary basis.

Verified

The Pentagon issued Anthropic a Friday 5:01pm deadline to remove safeguard objections related to autonomous weapons and mass surveillance. Confirmed by Washington Post primary reporting and corroborated by Anthropic's own public statements.

Verified

Anthropic held a contract with the DoD worth approximately $200M, with Claude integrated on classified networks via partners including Palantir. Confirmed by Anthropic's own communications and press reporting.

Verified

The Trump administration ordered most federal agencies to stop using Claude, with the Pentagon given approximately six months to phase out. Confirmed by Al Jazeera and PBS reporting.

Verified

A King's College London study tested AI models across 21 geopolitical conflict scenarios, 329 rounds, generating approximately 780,000 words of analysis. Models chose nuclear signalling or escalatory actions in 95% of scenarios. No model chose capitulation or concession. KCL's own institutional documentation confirms these figures.

Verified

Claude was used by CENTCOM for intelligence assessments, target identification, and battle scenario simulation during US military operations against Iran — reported by the Wall Street Journal and corroborated by The Guardian. This use occurred after the executive ban was in effect.

Probable

OpenAI moved quickly to fill the contract gap left by Anthropic on classified networks, signing a DoD agreement shortly after the standoff escalated. Reported consistently across multiple credible outlets; not confirmed in direct statements from either company at the time of writing.

Probable

A prior flashpoint involving Claude and a Venezuela / Nicolás Maduro-related operation contributed to the dispute. Reported by Washington Post as a precipitating incident; full details not independently confirmed.

Unverified

Some reporting suggested government pressure on Anthropic included threats beyond standard procurement leverage — including references to legal mechanisms that could restrict Claude's use across all contractor environments. Based on single-source accounts; not independently corroborated.


Failure 01

Procurement leverage overrides safety posture

Safety commitments become negotiable under national security framing. A company's ethics red lines are only as strong as its willingness to lose the contract — and its ability to survive doing so.

Failure 02

Policy without enforcement is preference

"No autonomous weapons" is a policy statement. Without contractual definitions, technical enforcement, and independent audit rights, it describes intent — not constraint. The gap between the two is where the harm happens.

Failure 03

Integration creates operational lock-in

Once Claude was embedded in classified workflows via third-party platforms, the ban became easier to ignore than to implement. Operational inertia, not defiance, kept the model running after the prohibition took effect.

Failure 04

Accountability gap for external observers

No mechanism exists for independent parties to verify whether AI safeguards in military deployments are contractual, technical, audited, or merely described in a press release. The public has no way to tell the difference.

Failure 05

Market incentive punishes caution

Anthropic's refusal cost it the contract. OpenAI's faster accommodation earned it one. Without shared standards, the procurement market rewards companies that accept more risk — creating a race to the bottom on safety posture.

Failure 06

Human-in-the-loop is insufficient under deadline pressure

The KCL study shows AI models normalise escalation even when humans formally retain decision authority. When AI becomes the first and fastest advisor, and timelines compress, "human oversight" becomes a procedural label rather than a real constraint.


Researchers at King's College London tested multiple AI models across 21 geopolitical conflict scenarios, running 329 simulation rounds and generating approximately 780,000 words of model output for analysis.

Key finding: models chose escalatory or nuclear signalling actions in 95% of scenarios. No model selected concession, capitulation, or de-escalation as a dominant strategy. Researchers noted that deadline dynamics — time pressure applied to decision scenarios — significantly amplified escalatory behaviour.

The implication for this case: Claude was used in active military targeting workflows against Iran. The KCL data suggests AI-assisted decision environments do not produce more cautious outcomes — they produce faster, more escalatory ones, regardless of who nominally holds the final authority.


⚠ Section pending source verification

This section will document the Minab school strike within Operation Epic Fury as a system design failure — specifically: how AI-assisted targeting workflows can produce civilian harm outcomes that are not attributable to a single human decision, but to the architecture of the system itself.

The analytical framing is ready. The section will argue that when AI compresses targeting timelines, aggregates intelligence, and surfaces recommendations, the human who "approves" the strike is approving an AI-structured conclusion — not exercising independent judgment.

Action required: Provide the primary source link (WSJ, Guardian, or equivalent) for the Operation Epic Fury / Minab school strike. Once confirmed, this section will be written to Verified or Probable standard and integrated here.


Hard red lines in contract language, not policy pages. Explicit, legally binding prohibitions on autonomous weapons and surveillance use cases, with definitions specific enough to be enforceable — not aspirational language subject to reinterpretation under pressure.

Independent audit and logging requirements. Mandatory record-keeping of deployment contexts, use cases, and outputs — reviewable by a third party not controlled by either the vendor or the government customer.

Use-case gating with technical enforcement. Controls that restrict what the model can do based on verified deployment context, rather than controls that rely on the operator self-reporting compliance. A minimal sketch of what this could look like, together with audit logging and a kill switch, follows this list.

Kill-switch architecture with defined criteria. Conditions under which access is suspended, and a technical mechanism to enforce suspension — rather than a ban that existing integrations can ignore through operational inertia.

Public disclosure of allowed and prohibited use categories. Not the full contract — the categories. Enough for civil society, journalists, and other governments to assess whether stated red lines are being respected.

Shared baseline standards across vendors. So that a company refusing to lower its safety threshold is not simply replaced by one willing to. Without shared floors, the procurement market penalises caution.
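
To make the gap between policy language and technical enforcement concrete, the sketch below combines three of the recommendations above (independent audit logging, use-case gating, and a kill switch) into a single request gate. It is an illustrative sketch only: the class names, policy categories, attestation field, and log format are hypothetical and do not describe Anthropic's, OpenAI's, or any government customer's actual architecture.

```python
# Hypothetical sketch of "technical enforcement" of use-case restrictions.
# Nothing here reflects any vendor's real deployment; names and categories
# are invented for illustration.

import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class DeploymentContext:
    """Verified attributes of the environment a request arrives from."""
    contract_id: str
    network: str     # e.g. "classified", "unclassified"
    use_case: str    # e.g. "intelligence_analysis", "targeting"
    attested: bool   # True only if the context was attested, not self-reported


# Prohibited categories expressed as data the gate can act on,
# not as prose in a policy document. Hypothetical labels.
PROHIBITED_USE_CASES = {"autonomous_weapons", "mass_surveillance", "targeting"}


class UseCaseGate:
    """Gate requests by verified deployment context, with an append-only,
    hash-chained audit log that an independent party can later verify."""

    def __init__(self) -> None:
        self.suspended = False        # kill-switch flag with defined criteria
        self._log: list[dict] = []
        self._prev_hash = "0" * 64

    def _append_log(self, entry: dict) -> None:
        # Chain each entry to the previous one so tampering is detectable.
        entry["prev_hash"] = self._prev_hash
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self._prev_hash = digest
        self._log.append(entry)

    def authorize(self, ctx: DeploymentContext) -> bool:
        decision = (
            not self.suspended
            and ctx.attested                          # no self-reported compliance
            and ctx.use_case not in PROHIBITED_USE_CASES
        )
        self._append_log({
            "ts": time.time(),
            "contract_id": ctx.contract_id,
            "network": ctx.network,
            "use_case": ctx.use_case,
            "attested": ctx.attested,
            "allowed": decision,
        })
        return decision

    def suspend(self, reason: str) -> None:
        """Kill switch: once triggered, every subsequent request is refused."""
        self.suspended = True
        self._append_log({"ts": time.time(), "event": "suspended", "reason": reason})


if __name__ == "__main__":
    gate = UseCaseGate()
    ok = gate.authorize(DeploymentContext(
        "dod-0001", "classified", "intelligence_analysis", attested=True))
    blocked = gate.authorize(DeploymentContext(
        "dod-0001", "classified", "targeting", attested=True))
    gate.suspend("executive order: federal phase-out")
    print(ok, blocked, gate.suspended)  # True False True
```

The specific code matters less than the contrast it illustrates: a prohibition expressed as data that is checked on every request, recorded in a tamper-evident log a third party can audit, and revocable through a suspension mechanism behaves very differently under pressure than a prohibition that exists only on a policy page.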


This case activates three core BrokenCtrl frameworks:

The Broken Control Loop framework describes exactly what happened here: when state power and contract value outmuscle internal safeguards, the control loop exits the developer entirely.

The Policy vs Enforcement framework explains why "no autonomous weapons" in a press release is not the same thing as "no autonomous weapons" in a verifiable deployment.

Foreseeable Misuse applies to the integration model itself: once AI is embedded in military targeting infrastructure, escalatory and civilian-harm use cases are not hypothetical.


Washington Post — Anthropic's AI tool Claude central to U.S. campaign in Iran, amid a bitter feud (Apr 2026) Primary
The Guardian — US military reportedly used Claude in Iran strikes despite Trump's ban (Apr 2026) Corroborating
AP News — Pentagon dispute bolsters Anthropic reputation but raises questions about AI readiness in military (Apr 2026) Corroborating
Reuters — AI contract restrictions could threaten military missions, US official says (Apr 2026) Context
Barron's — Anthropic's Battle With President Trump Over AI Weapons Needs to Be 'Resolved Quickly' (Apr 2026) Context
King's College London — Academic study on AI escalation in geopolitical conflict scenarios (2025–26) Primary
Anthropic — Public statements on DoD contract and safety red lines (Feb–Mar 2026) Primary

QUESTIONS

Did the US military use Claude AI despite a ban?

Yes. Reporting from the Wall Street Journal and The Guardian describes US military use of Claude for intelligence assessments, target identification, and battle scenario simulation during operations against Iran — occurring after the Trump administration formally ordered federal agencies to stop using Claude. The operational integrations, embedded via third-party platforms including Palantir on classified networks, continued through existing infrastructure despite the prohibition.

Why did the Pentagon and Anthropic clash over AI safety?

The Pentagon wanted Anthropic to remove restrictions that prevented Claude from being used for autonomous weapons development and mass surveillance operations. Anthropic refused, citing its own safety red lines. The Department of Defense responded with a Friday 5:01pm deadline to comply. Anthropic held its position, and the Trump administration subsequently ordered most federal agencies to stop using Claude, with the Pentagon given roughly six months to phase it out.

What did OpenAI do after Anthropic's Pentagon dispute?

OpenAI moved quickly to formalise its own Department of Defense agreement after Anthropic was effectively pushed out of the contract. OpenAI stated the deal included restrictions on autonomous weapons use and, following public criticism, clarified that intelligence agency applications would require additional contract modifications. Critics noted these clarifications followed backlash rather than preceding the agreement — raising questions about whether they reflect genuine constraints or post-hoc positioning.

What does the King's College London AI escalation study show?

The KCL study tested AI models across 21 geopolitical conflict scenarios in 329 simulation rounds. Models selected escalatory or nuclear signalling actions in 95% of scenarios, and no model chose concession or de-escalation as a dominant strategy. The study found that deadline pressure — compressed timelines — amplified escalatory behaviour. Applied to this case, the findings suggest that AI-assisted military decision-making does not produce more cautious outcomes even when humans retain nominal final authority.

Are AI safety commitments legally enforceable?

Generally, no — not without explicit contractual language, defined terms, and enforcement mechanisms. Most AI ethics commitments exist as policy documents, not legal obligations. This case illustrates the gap: Anthropic's stated red lines were genuine enough to cost it a $200M contract, but the model was still used operationally after a ban, through existing integrations the vendor had no mechanism to shut down remotely. BrokenCtrl covers this in Framework 04: Policy vs Enforcement.

Last updated: April 2026 · Case ID: BC-001 · Methodology →