The Verification Bottleneck: Why AI's Real Cost Is Human Attention
AI scales execution to near-zero cost, but verifying that output remains biologically bounded. The bottleneck was never intelligence. It's human verification bandwidth.
A new paper from MIT and Washington University frames the AI transition as two cost curves racing in opposite directions. The cost to automate falls exponentially. The cost to verify stays where it’s always been: bounded by human cognition.
The binding constraint on growth is no longer intelligence. It’s human verification bandwidth.
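The two-curves framing can be sketched numerically. This is a toy model, with every constant invented for illustration — the paper does not supply a halving period or a dollar figure; the point is the shape, not the numbers:

```python
# Illustrative sketch of the "two cost curves" framing.
# All constants here are assumptions for illustration only.

def execution_cost(t, c0=100.0, halving_period=1.0):
    """Cost to automate a task falls exponentially with time t (in years)."""
    return c0 * 0.5 ** (t / halving_period)

def verification_cost(t, c0=100.0):
    """Cost to verify stays flat: bounded by human reading and judgment speed."""
    return c0

for year in range(6):
    e, v = execution_cost(year), verification_cost(year)
    print(f"year {year}: execute ${e:6.2f}   verify ${v:6.2f}   ratio {v / e:5.1f}x")
```

Under these assumed constants, verification is 32x the cost of execution after five halvings, and the ratio keeps doubling. Whatever the real constants are, an exponential divided by a flat line diverges.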

The asymmetry
AI can generate a 50,000-line application in a day. A design document in minutes. A legal brief in seconds. The marginal cost of execution approaches zero.
But someone still has to check whether the output is correct. Whether the code handles edge cases the model didn’t think about. Whether the legal citations actually exist. Whether the billing logic accounts for the dedicated carrier program that nobody wrote down.
That checking happens at human speed. Reading speed. Context-building speed. Domain expertise speed. None of these scale with compute.
The Faros AI study of 10,000+ developers quantified the shift: teams using AI complete 21% more tasks and merge 98% more PRs. But PR review time increases 91%, PRs are 154% larger, and bug rates go up 9% per developer. The work moved from writing to reviewing.
Trust is the real currency
There’s an interesting argument buried in the data here: the verification problem is really a trust problem wearing a technical hat. You trust a colleague to run a project because you’ve seen their judgment, you’ve worked with them, you’ve calibrated over years. A doctor trusts a resident’s diagnosis after watching hundreds of cases together. A manager trusts a direct report’s analysis after months of seeing how they think. That trust was expensive to build. It doesn’t transfer to a model that hallucinates between 0.7% and 94% of the time depending on who built it.
The Stack Overflow 2025 survey puts numbers to this: 84% of developers use AI tools, but only 33% trust the output. A 51-point gap between adoption and confidence. The more people use AI, the less they trust it. Experienced developers trust it least.
And here’s the part that doesn’t get enough attention: # of humans << # of AIs. The number of agents producing output grows with compute. The number of humans available to verify that output stays fixed. Every new agent, every new workflow, every new automation draws down the same finite pool of human attention.
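A back-of-the-envelope queue makes the asymmetry concrete. Every parameter below is hypothetical — review minutes per item, team size, agent output rates are all invented, not drawn from any study:

```python
# Toy backlog model for "# of humans << # of AIs".
# All parameters are hypothetical; only the shape of the result matters.

REVIEW_MIN_PER_ITEM = 30      # human minutes to verify one AI output
HUMAN_MIN_PER_DAY = 8 * 60    # one reviewer's daily attention budget
REVIEWERS = 10                # fixed pool of human verifiers

def backlog_after(days, agents, items_per_agent_per_day=5):
    """Unreviewed items piled up after `days`, given `agents` producing output."""
    produced = days * agents * items_per_agent_per_day
    capacity = days * REVIEWERS * HUMAN_MIN_PER_DAY // REVIEW_MIN_PER_ITEM
    return max(0, produced - capacity)

# Review capacity is fixed at 160 items/day; production scales with agent count.
for agents in (10, 50, 100):
    print(f"{agents:3d} agents -> backlog after 30 days: {backlog_after(30, agents)}")
```

With these assumptions the pool absorbs 10 agents comfortably, but at 100 agents the unreviewed backlog grows by 340 items a day. Adding compute moves only the `produced` term; the `capacity` term is people.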
This framing came up when I brought up the topic with a colleague at work — thank you Francisco Arceo for the conversation that sharpened the thinking here.
The measurability gap
The paper introduces a concept called the “measurability gap.” Tasks that are quantifiable get automated first. What’s left are the tasks that require judgment, context, and liability: what the authors call n-hard or n-legal processes.
The dangerous part isn’t that AI produces wrong answers. It’s that the wrong answers look right. CIO.com described it well: “Almost-right code is insidious. It compiles. It runs. It passes the basic unit tests. But it contains subtle logical flaws or edge-case failures that aren’t immediately obvious.” Finding the omission in 100 lines of AI-generated code is harder than writing the 100 lines yourself.
The METR randomized controlled trial found experienced developers using AI tools took 19% longer than without AI. Before the study, they predicted AI would make them 24% faster. After experiencing the slowdown, they still believed it had sped them up by 20%. A 39-point perception gap.
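The gap arithmetic, using the METR study's own figures (signed percentages, faster positive):

```python
# METR RCT figures: perception vs. measurement of AI-assisted speed.
predicted = +24   # % faster, predicted before the study
measured  = -19   # measured outcome: 19% slower
believed  = +20   # % faster, still believed after experiencing the slowdown

perception_gap = believed - measured  # distance between belief and measurement
print(f"perception gap: {perception_gap} points")
```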
The HBR study from UC Berkeley tracked 40 workers over 8 months and found the same pattern: in micro-moments, people described momentum. When they stepped back, they described feeling busier, more stretched, less able to disconnect. 62% of associates reported burnout. AI didn’t reduce work. It intensified it.
Hollow economy vs. augmented economy
The paper’s central warning: without verification infrastructure, the market drifts toward a “Hollow Economy.” Explosive measured activity, fundamentally hollowed-out human control. GDP goes up. Understanding goes down.
The alternative they describe is an “Augmented Economy” where verification scales alongside automation. This means treating verification as a primary production technology, not a compliance checkbox. Cryptographic provenance, liability underwriting, evaluation records, audit trails. The ability to insure outcomes, not just generate them.
One failure mode stands out: the expertise decay loop. Routine tasks automate, entry-level positions disappear. Those positions were the training ground for future expert verifiers. The system gradually undermines its own capacity to check itself.
Where this leaves us
The paper’s framing: intent → execution → verification. Humans set intent. Machines execute. Humans verify and underwrite responsibility.
Compute gets cheaper every quarter. Human attention does not. Every organization deploying AI at scale is finding this out firsthand. Not because they read a paper, but because their review queues are backing up, their senior engineers spend more time reading generated code than writing their own, and their confidence in what shipped last Tuesday is lower than it was a year ago.
The cost of AI productivity isn’t measured in compute. It’s measured in the attention of the people who still have to decide whether the output is worth trusting.