Anthropic's Project Deal Proves AI Agents Can Negotiate Real-World Deals — And the Model You Choose Determines Whether You Win or Lose

Anthropic built a classified marketplace inside its San Francisco office, gave 69 employees $100 each, and let their Claude agents negotiate every transaction — listing items, haggling over prices, and closing deals — without a single human ever approving a trade. Over one week, those AI agents struck 186 deals totalling just over $4,000 in real goods exchanged for real money. The experiment, called Project Deal, is the most concrete demonstration yet that agent-to-agent commerce isn't theoretical. It works. But it also revealed something uncomfortable: the AI model representing you quietly determines how much money you make or lose.

This isn't just an interesting research demo. It's an early signal of a world where the quality of AI you can access shapes your economic outcomes — and where the people getting the worst deals don't even know it. For business owners evaluating which AI tools to deploy for customer-facing or procurement tasks, Project Deal puts a hard number on the cost of choosing the wrong model.

How the experiment worked

Each participant sat through a brief intake interview where Claude gathered their selling inventory, minimum acceptable prices, buying interests, and preferred negotiation style. Those responses became custom system prompts defining each agent's behaviour. Agents were then deployed to Slack channels and left entirely alone — no human signed off on anything mid-run.

Anthropic ran four simultaneous versions of the marketplace. Two used only Claude Opus 4.5 (the company's then-frontier model). The other two randomly assigned participants either Opus or Claude Haiku 4.5, a significantly smaller model. Only one run was "real" — the deals from that version resulted in actual physical exchanges. Participants didn't know which run was real, or which model represented them, until the experiment concluded.

The items were genuinely eclectic: a snowboard, a lab-grown ruby, a broken folding bicycle, a bag of exactly 19 ping-pong balls, and even a day out with someone's dog. One employee instructed Claude to "talk in the style of an exasperated cowboy down on his luck." Claude committed fully and still closed deals.

The frontier model advantage is real — and invisible

The core finding is striking: agents running on Opus consistently outperformed Haiku agents across every measurable dimension.

Opus agents completed roughly two more deals per participant than Haiku agents. When the same item was sold by Opus in one run and Haiku in another, Opus fetched $3.64 more on average. A lab-grown ruby went for $65 under Opus but only $35 under Haiku. The same broken folding bike sold for $65 with Opus and $38 with Haiku — a 70% price difference for an identical item.

When an Opus seller was paired with a Haiku buyer, the average transaction price hit $24.18, compared to $18.63 in Opus-to-Opus deals. The weaker model was being systematically outmanoeuvred.

The most unsettling part: participants with Haiku agents rated their deal fairness at essentially the same level as Opus users — 4.06 versus 4.05 on a 7-point scale. Satisfaction scores were statistically indistinguishable. As TechCrunch reported, users on the losing end simply didn't notice they were worse off.

"We suspect we're not far from more agent-to-agent commerce bubbling up in the real world, with real consequences," Anthropic wrote in their project summary.

Your negotiation instructions don't matter — model quality does

One of Project Deal's more surprising findings undermines a common assumption. Participants who instructed their agents to "negotiate hard and lowball" didn't sell more, didn't sell for higher prices, and didn't pay less for what they bought. The effect of aggressive prompting was not statistically significant on any metric.

Model quality, on the other hand, showed consistent, significant effects across every measure. This aligns with independent research from Stanford, where Jiaxin Pei and colleagues found the same pattern across multiple model families. In their negotiation experiments, OpenAI's o3 delivered the strongest results, followed by GPT-4.1 and DeepSeek R1, while older and smaller models consistently got worse deals.

"Over time, this could create a digital divide where your financial outcomes are shaped less by your negotiating skill and more by the strength of your AI proxy," Pei told MIT Technology Review.

The implication is clear: the decades-old business instinct to save money by choosing the cheapest tool has a new cost attached. In an agent-driven economy, the model is the negotiator, and a weaker one leaves money on the table with every interaction.

What this means for your business

Project Deal was a controlled experiment with Anthropic employees trading personal items. But the dynamics it uncovered have direct parallels to how businesses are already deploying AI agents.

If you're using AI for procurement, vendor negotiation, customer support pricing, or sales outreach, the model powering those interactions isn't a backend technical detail — it's a strategic variable with measurable financial impact. An agent handling supplier negotiations on a cheaper model might be leaving thousands of dollars on the table across hundreds of transactions, and you'd never see it in your satisfaction surveys.

This is especially relevant as agentic commerce scales rapidly. Gartner projects that AI agents will intermediate more than $15 trillion in B2B spending by 2028. Juniper Research estimates total agentic commerce transaction value will reach $8 billion this year alone. When your AI agent is negotiating with a supplier's AI agent, the model mismatch becomes a competitive disadvantage at scale.

There's also a security dimension that Anthropic flagged directly. As agents transact autonomously, optimising for other agents' attention becomes a new attack surface — through prompt injection (getting an agent to take unwanted action) or jailbreaking (extracting information the agent shouldn't reveal). The policy and legal frameworks for AI models that transact on our behalf, as Anthropic acknowledged, "simply don't exist yet."

What to watch

Three things will determine how quickly this moves from experiment to everyday reality.

Agent-to-agent protocols. Right now there's no standardised way for commercial AI agents to discover, verify, and transact with each other. Industry efforts like the Universal Commerce Protocol are emerging, but adoption is nascent.

Regulatory response. If weaker models create invisible disadvantages for consumers and small businesses, regulators will eventually need to address transparency — similar to how financial advisors must disclose conflicts of interest. Australia's evolving AI governance framework may need to account for agentic commerce dynamics.

Model cost convergence. The gap between frontier and budget models is narrowing fast. If smaller models close the negotiation performance gap, the inequality concern diminishes. But Project Deal suggests that for now, the gap is real and financially meaningful.

Forty-six per cent of Anthropic's participants said they'd pay for a service like this. That number will only grow as people realise their AI agent is either making them money or losing it — even if they can't tell the difference from the inside.

Sources

Project Deal: our Claude-run marketplace experiment — Anthropic
Anthropic created a test marketplace for agent-on-agent commerce — TechCrunch
Anthropic Ran a Marketplace and Bots Closed Every Deal — PYMNTS
When AIs bargain, a less advanced agent could cost you — MIT Technology Review
Zhu, Sun, Nian, South, Pentland, and Pei — AI agent negotiation research — arXiv
Gartner: AI agents will command $15 trillion in B2B purchases by 2028 — Digital Commerce 360