When your AI shopping agent screws up, who gets the bill?

And who should?

Oct 30, 2025

It’s the end of 2025, and AI agents (previously limited to therapist, oracle, and haiku-generation duties) have learned how to spend money. And it’s likely that, at least in the beginning, they will not spend it particularly wisely.

Imagine this: After one watery conference room coffee too many, you have reached your breaking point. You fire up your favorite AI agent and issue a fateful prompt: “buy me the strongest coffee you can find.”

The following week, twelve test tubes of industrial-strength coffee extract arrive at your door, festooned with angry-looking warning stickers and marked “for laboratory use only.” Oh, and the bill is $2,195. No refunds.

Who is on the hook?

The first agentic commerce holiday season is underway. 70% of consumers are “at least somewhat” comfortable having AI agents make purchases on their behalf. Merchants face a wave of disputes the payment system wasn’t designed to handle.

Call it the SNADpocalypse: an unprecedented flood of “significantly not as described” claims from buyers blaming merchants for the actions of their own shopping agents.

This isn’t handwringing. ChatGPT Instant Checkout launched September 29. Google’s AP2 protocol has 60+ industry partners. Visa reports a 4,700% surge in AI-driven traffic to U.S. retail sites in October 2025. The infrastructure exists. The volume is spiking. The dispute frameworks are 25 years old.

Yeah, something’s going to break.

What is agentic commerce?

Agentic commerce describes transactions where AI systems act autonomously on your behalf to discover, evaluate, and purchase goods or services — with varying degrees of human oversight.

You can think of shopping assistants as falling into similar “levels” as those we apply to assistive driving technology, from lane change alerts to sleep-in-the-backseat self-driving capabilities. These autonomy levels map roughly to the standard marketing funnel stages of awareness, consideration, conversion, and retention.

Level 0 (~2000–2015): Non-Autonomous. Search engines and voice assistants enable product search but not purchase. You ask, they show, you buy manually.

Level 1 (2016-2020): Awareness. The era of The Algorithm. Instagram ads that know you better than your husband does. AI recommendation engines across search and social personalize shopping, but humans still click “buy.” And boy do they click buy.

Level 2 (2021-2024): Consideration. Large language models hooked up to the web enable conversational product comparison. You can ask complex questions, get reasonable-sounding answers — but it’s not optimized for commerce yet. A shopping query is treated the same as a request to translate your apartment lease into pirate language. How innocent we were.

Level 3 (2025): Conversion. AI executes purchases within pre-defined parameters, humans set boundaries. This is what Stripe’s Agentic Commerce Protocol in ChatGPT’s Instant Checkout has unlocked, with in-chat checkout on a per-transaction approval basis. You talk to your agent, it presents the checkout screen, and you click yes or no.

Level 4 (2026): Retention. AI makes routine purchases independently based on standing mandates, humans intervene only for exceptions. Google’s AP2 with cryptographic mandates is moving in this direction. You set the rules once, the agent buys autonomously within them. (Questions abound here about what criteria agents will use to make buying decisions, and how marketers will be able to influence them.)

Level 5 (2028?): Full autonomy/commerce singularity/run for your lives. AI manages entire purchasing lifecycle including dispute resolution. Maybe even talking to “seller agents” instead of static vendor systems. Nobody’s here yet. Maybe nobody should be.

We’re transitioning from Level 2 to Level 4 right now. And that transition has thrown up some fog of war over who is liable when things go wrong.

The 2025 holiday season is set to be the first meaningful test of agent-led checkout at scale. E-commerce took 10-15 years to develop mature dispute resolution frameworks. Agentic commerce players are attempting to build equivalent infrastructure in 1-2 years while transaction volumes surge.

The image that comes to mind is a cement truck pouring runway for a plane that has already begun taxiing. So, are we going to get our proverbial wheels stuck in the proverbial wet concrete?

The authorization gap nobody solved

Traditional chargeback frameworks assume a human made the purchase decision. You saw the product. You clicked buy. You entered your card details. If the product didn’t match, that’s the merchant’s problem — the product is “significantly not as described” (SNAD).

Agentic commerce breaks that model.

You told your agent to buy “coffee.” It bought a dozen vials of coffee-derived lab reagent that could stop your heart as surely as the cyanide it was stored next to in the warehouse. Agent error? Merchant liability? Who misunderstood whom?

You authorized an agent generally (“handle my grocery shopping”), but did you authorize this specific $300 purchase from this specific grocery store at this specific moment? And what if the potato chips aren’t crinkle-cut the way you like them?

The agent hallucinated product features that don’t exist. Is that the AI company’s fault for the hallucination? The merchant’s fault under “not as described” provisions? Your responsibility for trusting an unreliable agent? Eliezer Yudkowsky’s fault, somehow?

Each of these scenarios runs into the same wall: the rules assume a human decision-maker. When an AI agent operates autonomously, potentially completing transactions before human awareness even kicks in, those rules break down.

Liability remains unclear when AI agents complete transactions. Merchants may bear fraud and dispute costs despite shoppers never even visiting their websites.

We don’t have legal answers. But we do have three protocols racing to create technical solutions before regulatory frameworks catch up.

Three protocols to manage — and maybe prevent — the coming chaos

The payments industry is currently moving in the right direction, building dispute prevention infrastructure before crisis forces their collective hand. A few protocols have emerged, championed by Google, OpenAI, Visa, Stripe, PayPal, and others, seeking to establish a set of ground rules for agentic commerce.

Stripe ACP: Real-time explicit consent

Stripe’s Agentic Commerce Protocol launched with OpenAI in September. In this model, when the agent wants to complete a purchase, it presents an inline checkout interface showing product, price, seller, and shipping details. You explicitly approve this specific transaction. Merchants verify authorization through the Delegated Payments spec which, as an open protocol, does not rely on Stripe powering your payments.

Strength: clear moment of human consent for this purchase.

Weakness: doesn’t scale to high-frequency, low-value autonomous transactions. If you want the agent to buy groceries every week without asking, this breaks down.

Best for: Level 3 autonomy, meaning supervised agent purchases where you approve each transaction.

Google AP2: Cryptographic mandates with bounded authority

Google’s AP2 protocol (announced September 16, 60+ partners in early pilots) takes a different approach. You sign a cryptographically verifiable “mandate” defining spending limits, merchant categories, and time restrictions. Example: $50 per transaction, $500 per month, groceries and household goods only. The agent operates within those boundaries autonomously.

Strength: scales to recurring, routine purchases while maintaining authorization proof.

Weakness: “misinterpretation” disputes remain ambiguous. The agent stayed within your $50 limit but bought the wrong product. Is that significantly not as described? The mandate doesn’t cover that. Requires sophisticated mandate management.

Best for: Level 4 autonomy, meaning delegated authority with financial guardrails.
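To make the mandate idea (and its blind spot) concrete, here is a minimal sketch of a bounds check using the example limits above. The `Mandate` and `Purchase` classes and the `check_purchase` function are hypothetical names of my own, not part of AP2, and the cryptographic signature verification is left out entirely. This only shows the "bounded authority" logic.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Mandate:
    """Hypothetical, simplified stand-in for a signed spending mandate."""
    per_transaction_limit: Decimal
    monthly_limit: Decimal
    allowed_categories: frozenset
    valid_from: date
    valid_until: date


@dataclass(frozen=True)
class Purchase:
    amount: Decimal
    category: str
    on: date


def check_purchase(mandate, purchase, spent_this_month):
    """Return (allowed, reason). Only time, category, and spend bounds are checked."""
    if not (mandate.valid_from <= purchase.on <= mandate.valid_until):
        return False, "outside mandate validity window"
    if purchase.category not in mandate.allowed_categories:
        return False, "merchant category not covered"
    if purchase.amount > mandate.per_transaction_limit:
        return False, "exceeds per-transaction limit"
    if spent_this_month + purchase.amount > mandate.monthly_limit:
        return False, "exceeds monthly limit"
    return True, "within mandate"


mandate = Mandate(
    per_transaction_limit=Decimal("50"),
    monthly_limit=Decimal("500"),
    allowed_categories=frozenset({"groceries", "household"}),
    valid_from=date(2025, 11, 1),
    valid_until=date(2026, 10, 31),
)

# Flat chips instead of crinkle-cut: wrong product, but every bound is satisfied.
wrong_chips = Purchase(Decimal("4.99"), "groceries", date(2025, 11, 20))
print(check_purchase(mandate, wrong_chips, spent_this_month=Decimal("120")))

# The $2,195 lab order is rejected on category before price is even considered.
lab_extract = Purchase(Decimal("2195"), "laboratory supplies", date(2025, 11, 20))
print(check_purchase(mandate, lab_extract, spent_this_month=Decimal("120")))
```

Note what the check cannot see: whether the product is the one you wanted. The wrong chips sail through, which is exactly the "misinterpretation" gap described above.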

Visa TAP: Merchant-side verification

Visa’s Trusted Agent Protocol (announced October 14; documentation on Visa Developer and GitHub) addresses a different problem: merchants can’t tell legitimate AI agents from malicious bots. With AI-driven traffic surging 4,700% year-over-year, bot fraud is exploding alongside legitimate agent purchases.

TAP tackles this through agent identity attestation and verification protocols, so a merchant can tell ChatGPT making a purchase apart from a bot pretending to be ChatGPT.

Strength: scalably addresses the surge in AI-driven retail traffic; prevents bot fraud masquerading as agent purchases.

Weakness: doesn’t directly address human intent verification. Even if you verify the agent’s identity, you still don’t know if the human actually authorized this specific action.

Best for: complementary infrastructure securing the merchant endpoint.
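The general shape of "verify the agent, not just its name" can be sketched in a few lines. To be clear, this is not TAP's actual message format or signing scheme, which I haven't reproduced here. Real attestation schemes generally rely on public-key signatures, and I've swapped in a shared-secret HMAC from Python's standard library purely to keep the example dependency-free. `AGENT_KEYS`, `sign_request`, and `is_verified_agent` are all hypothetical names.

```python
import hashlib
import hmac

# Hypothetical registry of agent keys. In practice keys would come from a
# trusted directory, not be hard-coded like this.
AGENT_KEYS = {"example-agent": b"demo-shared-secret"}


def sign_request(agent_id, body, key):
    """What a legitimate agent would attach to its request (sketch only)."""
    message = agent_id.encode() + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_verified_agent(agent_id, body, signature):
    """True only if the signature matches the key registered for agent_id."""
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False  # unknown agent: claiming a name is not enough
    expected = sign_request(agent_id, body, key)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


body = b'{"sku": "COFFEE-001", "qty": 1}'
good_sig = sign_request("example-agent", body, AGENT_KEYS["example-agent"])
forged_sig = sign_request("example-agent", body, b"attacker-guess")

print(is_verified_agent("example-agent", body, good_sig))    # legitimate agent
print(is_verified_agent("example-agent", body, forged_sig))  # impostor
```

A passing check tells you which software sent the request. It tells you nothing about whether a human wanted the coffee, which is the weakness noted above.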

All three protocols flip the dispute model from post-transaction resolution to pre-transaction authorization proof. But will that be enough to tie the intent of a transaction back to a properly-informed consumer?

The identity layer that might save merchants

The most interesting development to me isn’t the payment protocols — it’s the identity verification layer emerging beneath them.

Prove Verified Agent (launched October 2025): Creates an “end-to-end chain of custody” linking verified identity, human intent, payment credentials, and consent — all backed by cryptographic proof. Integrates with Visa’s payment infrastructure.

Visa Payment Passkey (live in Middle East, expanding globally): FIDO2 biometric authentication replacing passwords and OTPs. Uses fingerprint, facial recognition, or device PIN. Already processing real transactions through noon payments.

PayPal Wallet Integration (announced October 2025): Leverages PayPal’s preexisting risk models and OTP/biometric verification of its vast customer base to provide validated payment intents in ChatGPT via the ACP protocol.

The innovation: non-repudiable proof that a specific human authorized a specific agent action at a specific moment — before the transaction occurs.

Think chip-and-PIN for agent purchases. The merchant can prove the person who authorized the purchase was actually present (or at least their fingers and/or face were) at the moment of authorization. Barring a Hannibal-Lecter-style scenario, this is strong evidence of human intent.

This addresses what mandates alone cannot: proof that the human who created the mandate is the same human authorizing this transaction right now.

Why merchants will probably be left holding the bag anyway

The Fair Credit Billing Act of 1974 limits consumer liability for unauthorized credit card use to $50. It provides chargeback rights for “billing errors” or “goods not as described.”

Does an agent’s misinterpretation constitute a billing error? Courts will decide. But history suggests consumer-favorable interpretation — not least because the agents representing the consumer are the spawn of some of the most politically powerful and influential companies since the American railroad era.

Merchants already lost this fight once

When e-commerce exploded in the late 1990s, the payment industry faced a trust crisis: card-not-present transactions meant merchants couldn’t verify the card holder’s physical presence or confirm the person using it actually owned it.

The solution heavily favored consumers. The Fair Credit Billing Act gave cardholders dispute rights. Merchants bore 100% of CNP fraud liability. If a stolen card was used online, merchants lost both merchandise and the transaction fee.

That framework still exists today. It’s about to apply to agent purchases.

The problem is worse this time

Early e-commerce fraud was binary: either you entered your card details or you didn’t. Either your card was stolen or it wasn’t.

Agentic commerce creates a spectrum of disputes. The agent misunderstood. The agent hallucinated. The agent operated within your mandate but bought the wrong thing. The agent acted faster than you could intervene.

Existing consumer protection frameworks weren’t designed for agent-mediated transactions. The question is whether they’ll adapt through litigation or proactive regulation.

Current legal precedents and ambiguity

Fair Credit Billing Act (1974): As mentioned above, limits consumer liability for unauthorized credit card use to $50 and provides chargeback rights for “billing errors” or “goods not as described.” Does an agent’s misinterpretation constitute a billing error? Courts will decide.

UETA and ESIGN (widely adopted) already recognize contracts formed by electronic agents without human review of each action. But how those principles interact with FCBA chargeback rights in agent-mediated purchases remains legally untested.

GDPR and consent: EU frameworks require explicit, informed consent. Reserve Bank of India (RBI) rules put specific constraints on e-mandates, which have been the bugbear of SaaS businesses trying to set up recurring billing for their Indian customers. Can a standing mandate satisfy requirements like these for future autonomous transactions? Nobody knows.

Emerging frameworks to watch

eIDAS 2.0 (EU): Creates framework for digital identity wallets in Europe. Could establish legal groundwork for cryptographic mandates representing delegated authority.

AI Accountability Acts (various US states): Emerging legislation defining liability for AI decision-making. May establish precedents for agent-mediated purchases.

Payment network rule updates: Visa and Mastercard are updating operating regulations to address agent transactions. These changes may establish de facto standards before regulation catches up.

The precedent problem: Early disputes will establish case law. Merchants should assume consumer-favorable interpretation until proven otherwise. That’s how CNP disputes evolved.

What you can do right now

Waiting for regulatory clarity is a losing strategy. History suggests merchants will bear the brunt of the early disputes that will hit this holiday season.

It’s worth taking stock of where you land on a few key dimensions of agentic commerce readiness: visibility, verification, variance, vigilance, and voice (yes, I am the cutest, thank you for saying so!)

Visibility: Know when agents are buying

What’s happening: Most merchants can’t distinguish agent purchases from human purchases.

Where we want to be: Real-time identification of agent-mediated transactions with agent provider attribution.

What you can do:

Require disclosure when purchases are agent-driven versus human-driven (push for this in Stripe ACP and Google AP2 adoption agreements)
Track which AI agents are purchasing (ChatGPT, Perplexity, future entrants)
Monitor agent traffic patterns using Visa TAP or similar verification protocols
Create separate transaction codes for agent purchases in your payment processing

Why it matters: Don’t go into this blind. Dispute patterns will differ between human and agent purchases. Start measuring now to anticipate impact and required investments.
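Once transactions carry a separate agent code, the first measurement worth having is dispute rate by segment. Here is a minimal sketch, assuming a hypothetical record shape with `channel`, `agent`, and `disputed` fields. The sample data is invented purely for illustration, and your payment processor's export will look different.

```python
from collections import defaultdict

# Hypothetical transaction records; "channel" is the separate code suggested above.
transactions = [
    {"id": "t1", "channel": "human", "agent": None, "disputed": False},
    {"id": "t2", "channel": "agent", "agent": "ChatGPT", "disputed": True},
    {"id": "t3", "channel": "agent", "agent": "ChatGPT", "disputed": False},
    {"id": "t4", "channel": "agent", "agent": "Perplexity", "disputed": False},
    {"id": "t5", "channel": "human", "agent": None, "disputed": False},
]


def dispute_rates(records):
    """Return {segment: disputed / total}, segmenting by agent name or 'human'."""
    totals = defaultdict(int)
    disputed = defaultdict(int)
    for r in records:
        segment = r["agent"] if r["channel"] == "agent" else "human"
        totals[segment] += 1
        disputed[segment] += int(r["disputed"])
    return {segment: disputed[segment] / totals[segment] for segment in totals}


rates = dispute_rates(transactions)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.0%}")
```

With real volumes you would want confidence intervals before drawing conclusions, but even a crude split like this shows whether agent purchases dispute differently from human ones.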

Verification: Prove authorization chains

What’s happening: Merchants have no visibility into whether a human authorized an agent action.

Where we want to be: Cryptographic proof of authorization accessible during dispute resolution.

What you can do:

Integrate Stripe’s Shared Payment Token API (ACP) or AP2 mandate verification
Implement Prove Verified Agent or similar identity verification at checkout
Store authorization artifacts (tokens, mandate signatures, biometric authentication logs) with transaction records
Create audit trails showing what product information the agent accessed before purchase

Why it matters: Burden of proof falls on merchants. Authorization chain evidence is your defense against SNAD disputes.
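What "store authorization artifacts with transaction records" might look like in practice: a small evidence record that pairs the authorization token with a fingerprint of the listing the agent was served. The `AuthorizationEvidence` class, its field names, and the sample values are all hypothetical. The actual artifacts you receive will depend on which protocol and provider you integrate.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuthorizationEvidence:
    """Hypothetical record stored alongside a transaction for dispute defense."""
    transaction_id: str
    agent_id: str
    authorization_token: str  # e.g. a payment token or mandate signature
    authorized_at: str        # ISO 8601 timestamp supplied by the platform
    listing_sha256: str       # fingerprint of the listing the agent was served


def fingerprint(listing):
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(listing, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


listing = {
    "sku": "CAF-EXT-12",
    "name": "Coffee extract, 12 vials",
    "intended_use": "laboratory use only",
    "price": "2195.00",
}

evidence = AuthorizationEvidence(
    transaction_id="txn_0001",
    agent_id="example-agent",
    authorization_token="tok_placeholder",
    authorized_at="2025-11-20T14:03:00Z",
    listing_sha256=fingerprint(listing),
)

record = json.dumps(asdict(evidence), sort_keys=True)  # ready to persist
print(record)
```

The fingerprint is the useful part in a dispute: if you also archive the listing itself, you can show that the product information served at purchase time said "laboratory use only," whatever the agent told its human.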

Variance: Design for misinterpretation

What’s happening: Product descriptions and policies assume human comprehension.

Where we want to be: Machine-readable specifications that reduce agent misinterpretation.

What you can do:

Implement structured data (Schema.org markup) for product specifications
Provide clear, unambiguous attribute definitions (dimensions, materials, compatibility)
Use AI to test how your product descriptions might be misinterpreted by agents
Create explicit compatibility/incompatibility statements (“requires X,” “does not include Y”)

Why it matters: “Your website confused the agent” will be a common dispute trigger. Clear specifications are your first line of defense.
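As an illustration of machine-readable specifications, here is a sketch that builds Schema.org `Product` markup as JSON-LD for our unfortunate coffee extract. `Product`, `Offer`, `PropertyValue`, and `additionalProperty` are real Schema.org vocabulary. The product details, and the assumption that any given shopping agent actually reads these properties, are mine.

```python
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Coffee extract, 12 x 10 ml vials",
    "sku": "CAF-EXT-12",
    "description": "Concentrated coffee extract for laboratory use only. Not a beverage.",
    # Explicit attribute/value pairs leave less room for an agent to guess.
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "Intended use", "value": "Laboratory reagent"},
        {"@type": "PropertyValue", "name": "Safe for consumption", "value": "No"},
    ],
    "offers": {
        "@type": "Offer",
        "price": "2195.00",
        "priceCurrency": "USD",
    },
}

json_ld = json.dumps(product, indent=2)

# JSON-LD is embedded in the product page inside a script tag of this type.
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Stating the exclusion twice, once in the description and once as a structured property, is deliberate: it covers agents that read prose as well as agents that parse attributes.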

Vigilance: Adapt existing fraud detection

What’s happening: Fraud systems are optimized for human purchasing patterns.

Where we want to be: Agentic commerce-aware fraud detection distinguishing legitimate agents from fraud.

What you can do:

Train fraud teams to recognize agentic commerce patterns (velocity, basket composition, time-of-day)
Distinguish legitimate agent purchases from bot fraud using Visa TAP or similar
Update velocity rules — agents may make multiple purchases rapidly across merchants
Monitor for “shopping cart abandonment” patterns that differ from human behavior

Why it matters: Traditional fraud signals may not apply. Legitimate agent behavior can look like bot activity.
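One way to "update velocity rules" is to keep the same sliding-window check but vary the threshold by buyer type, so a verified agent isn't throttled like a suspicious human session. This is a sketch under invented assumptions: the `LIMITS` values are illustrative rather than recommendations, and `VelocityRule` is a hypothetical class, not part of any fraud product.

```python
from collections import deque

# Illustrative thresholds only: max purchases per 60-second window.
LIMITS = {"human": 3, "verified_agent": 10, "unverified": 1}


class VelocityRule:
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = {}  # buyer_id -> deque of accepted purchase timestamps

    def allow(self, buyer_id, buyer_type, now):
        """Return True and record the purchase if it stays within the limit."""
        q = self.events.setdefault(buyer_id, deque())
        while q and now - q[0] >= self.window:
            q.popleft()  # drop purchases that fell out of the window
        if len(q) >= LIMITS[buyer_type]:
            return False
        q.append(now)
        return True


rule = VelocityRule()

# Five purchases in five seconds: too fast for a human card, fine for a verified agent.
human_results = [rule.allow("card-123", "human", t) for t in range(5)]
agent_results = [rule.allow("agent-abc", "verified_agent", t) for t in range(5)]
print(human_results)
print(agent_results)
```

The buyer type has to come from somewhere trustworthy, such as an agent verification step. If it is self-declared, a bot will simply declare itself a verified agent.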

Voice: Participate in standards development

What’s happening: Payment networks and platforms are defining rules without broad merchant input.

Where we want to be: Merchant interests represented in emerging protocol standards and liability frameworks.

What you can do:

Join Stripe ACP, Google AP2, and Visa TAP working groups or feedback programs
Engage with merchant associations on agentic commerce policy positions
Negotiate clear liability allocation with agent platforms in early adoption agreements
Document early dispute patterns and share insights with networks to inform rule development

Why it matters: The rules being written now will govern disputes for the next decade. Merchant silence means merchant liability by default.

Self-assessment checklist

Use this to evaluate your current agentic commerce readiness:

Visibility

Can you identify which transactions were agent-mediated versus human?
Do you know which AI agents are purchasing from you?
Can you track agent traffic patterns, ideally in real-time?

Verification

Can you access authorization proof during disputes (tokens, mandates, biometric logs)?
Do you store agent authorization artifacts with transaction records?
Have you integrated any identity verification layer (Prove, Visa Passkey, etc.)?

Variance

Are your product descriptions machine-readable (structured data)?
Do you keep auditable change logs of product descriptions in your catalog system so you can prove what your site content was on a given date?
Have you tested product descriptions for agent misinterpretation risk?
Do you explicitly state compatibility/requirements/exclusions?

Vigilance

Have you updated fraud detection rules for agent purchasing patterns?
Can your systems distinguish legitimate agents from bot fraud?
Do you have separate protocols for reviewing agent transactions (e.g. Visa TAP guidance)?

Voice

Are you participating in protocol working groups or merchant coalitions?
Have you negotiated liability terms in early agent commerce agreements?
Are you documenting dispute patterns to inform future standards?

Three scenarios for what happens next

Optimistic: Cryptographic mandates and biometric verification prevent most disputes. Clear authorization chains resolve the rest quickly. The SNADpocalypse never materializes.

Pessimistic: Mandates prove inadequate for “misinterpretation” disputes. Courts rule in favor of consumers citing FCBA protections. Merchants face another decade of disputes while the system evolves.

Realistic: Some disputes prevented. Some handled better. Some create precedents that reshape liability frameworks. Not a catastrophic collapse, but not a seamless transition either.

We don’t know which will play out, but the industry’s awareness of the problem is far ahead of where e-commerce was in 1998. Major payment networks have deployed initial standards before the critical mass of adoption has even hit. Identity verification infrastructure is launching in parallel with transaction protocols.

But awareness doesn’t shift liability.

The Fair Credit Billing Act still favors consumers. Payment network rules still place the burden of proof on merchants. Early adopters will bear costs while precedents form and infrastructure matures.

Merchants should assume they’ll hold the bag until proven otherwise. That assumption guided survival in early e-commerce. It’s the smart bet now.

Questions merchants will be asking in the coming months

Can I refuse to accept agent-powered purchases?

Technically yes, but practically difficult. Without integrating human-in-the-loop identity verification, you can’t easily distinguish agent purchases from human purchases at the point of transaction. By the time you identify an agent purchase, payment networks have already processed it, meaning you’re looking at a refund, not a refusal.

If I implement Stripe ACP or Google AP2, does that protect me from disputes?

Partially. These protocols provide better authorization proof than current CNP transactions, but they don’t eliminate disputes. “Significantly not as described” claims can still arise if the agent misinterpreted product specifications or user intent. The protocols improve your defense — particularly around proving the cardholder authorized the transaction — but they don’t guarantee victory on product-related disputes.

Who’s liable when an agent “hallucinates” product features that don’t exist?

Legally unclear. Arguments exist for agent provider liability (their system made the false claim), merchant liability (under FCBA “not as described” provisions), or consumer responsibility (they authorized the agent). Early case law will determine precedent. Document everything about what product information you provided.

Should I create separate return policies for agent purchases?

Consider it, but consult a lawyer. Separate policies could help address “misunderstanding” scenarios distinct from fraud. However, payment networks may not recognize such distinctions, and consumer protection laws may override them. This is evolving territory.

What happens if a customer’s agent gets “hacked” and makes fraudulent purchases?

Likely treated as standard CNP fraud under current frameworks — merchant liable unless they can prove authorization. This is precisely why identity verification layers (Prove Verified Agent, Visa Payment Passkey) matter — they create stronger evidence the legitimate account holder authorized the transaction.

How do I know if I should join protocol working groups or wait?

If you’re a large merchant or early adopter of agent commerce capabilities, participate now — rules are being written. If you’re small or not yet seeing agent traffic, monitor closely but focus on the 5 V’s of agentic commerce readiness discussed above. You don’t need to join every working group, but you should have someone tracking developments.

Will agentic commerce disputes be handled differently from regular chargebacks?

Not yet. Existing chargeback reason codes and processes apply until payment networks create specific agent transaction categories. This is why early disputes will likely favor consumers — the frameworks default to consumer protection in ambiguous scenarios.

What’s the single most important thing I can do right now?

Visibility. Start tracking which transactions are agent-mediated versus human. You can’t manage disputes you can’t identify. Everything else in the readiness framework depends on this foundation.
