Part I
Why AI has a speed limit

Start with the math

To understand where AI is going, you have to understand one thing about how it works.

Modern AI processes language by comparing every piece of input against every other piece. That's where context, nuance, and meaning come from: every piece checked against every other piece.

The problem is mathematical. If the model is working with N pieces of input, full comparison requires N-squared operations, meaning the number of inputs multiplied by itself.[1] Double the input, quadruple the work. Ten times the input, one hundred times the work. This isn't an engineering limitation. It's arithmetic.
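The doubling-and-quadrupling arithmetic is easy to check for yourself. What follows is a minimal sketch, not how any production model is written: `make_tokens` invents toy vectors purely for illustration, and one "comparison" is assumed to mean one dot product between two token vectors.

```python
def count_pairwise_comparisons(tokens):
    """Score every token against every token, the way full attention does.

    Returns the score table and the number of comparisons performed.
    """
    comparisons = 0
    scores = []
    for query in tokens:
        row = []
        for key in tokens:
            # One comparison = one dot product between two token vectors.
            row.append(sum(q * k for q, k in zip(query, key)))
            comparisons += 1
        scores.append(row)
    return scores, comparisons


def make_tokens(n, dim=4):
    """Hypothetical toy embeddings: n vectors of length dim."""
    return [[float((i + j) % 3 - 1) for j in range(dim)] for i in range(n)]


baseline = None
for n in (10, 20, 100):
    _, comparisons = count_pairwise_comparisons(make_tokens(n))
    if baseline is None:
        baseline = comparisons
    print(f"{n:>4} tokens: {comparisons:>6,} comparisons ({comparisons // baseline}x the work)")
```

Nothing in the loop is wasteful. The count grows with the square of the input because the score table itself has N rows and N columns.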

[Figure: Quadratic scaling: why longer inputs hit a wall. Relative compute cost as input length grows, with 1K as the baseline: 4K is 16×, 8K is 64×, 16K is 256×, 32K is 1,024×.]

This matters because it's a different kind of constraint than most people assume. Not a problem waiting to be solved, but a boundary baked into the operation itself.


A ceiling, not a wall

To see why, it helps to compare it to a constraint most people already trust. Think about Bitcoin. The protocol has never been hacked. Most people treat its cryptography as unbreakable. But technically, it rests on math problems we believe are hard but haven't proven impossible to crack. A breakthrough could theoretically appear. It's just very unlikely.

The attention math is a different kind of constraint. It's not a lock that might someday be picked. If you want to compare every piece of input to every other piece, you have to visit every pair. That's not an unsolved problem. It's the definition of the operation. (The formal proof that you can't do it faster is conditional: it rests on an unproven conjecture, the Strong Exponential Time Hypothesis,[1] but no one has found a way around it.)

The obvious counterargument: maybe you don't need all the comparisons. Maybe you can skip most of them and still get good enough results. The field is betting heavily on this. Some of the progress so far is real but beside the point: engineers have found ways to do the exact same math using far less memory, which makes AI faster and cheaper to run. That doesn't change the number of comparisons. It does the same work more efficiently. It doesn't reduce the work.
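To make "same math, less memory" concrete, here is a sketch of one query's attention computed two ways. The streaming version uses a running-maximum rescaling trick in the spirit of memory-efficient attention kernels. Which technique the paragraph above has in mind is my assumption, and the function names are hypothetical rather than taken from any library.

```python
import math


def attention_full(query, keys, values):
    """Reference version: hold every score in memory, then normalize."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_streaming(query, keys, values):
    """Same result, keeping only a running max, a running sum, and a running output."""
    running_max = -math.inf
    running_sum = 0.0
    output = [0.0] * len(values[0])
    comparisons = 0
    for key, value in zip(keys, values):
        score = sum(q * k for q, k in zip(query, key))
        comparisons += 1
        new_max = max(running_max, score)
        # Rescale everything accumulated so far to the new maximum.
        scale = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        running_sum = running_sum * scale + weight
        output = [o * scale + weight * v for o, v in zip(output, value)]
        running_max = new_max
    return [o / running_sum for o in output], comparisons


# Toy inputs, invented for illustration.
query = [0.5, -0.25]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = attention_full(query, keys, values)
streamed, comparisons = attention_streaming(query, keys, values)
print(full, streamed, comparisons)
```

The streaming version never stores the list of scores, yet it still computes one score per key. Memory drops. The comparison count does not.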

The approaches that actually try to skip comparisons, or replace the comparison system with something structurally different, do lose something. They consistently underperform when you need the model to recall specific details buried in long inputs. The amount of text these models can accept has grown roughly 1,000× since 2019, but on tasks that test whether the model actually uses all that text, the effective context is only 50–65% of the advertised window, and this varies significantly by model.[2][3] The gap is narrowing, but nobody has closed it. It may be closeable. But right now, every method that genuinely reduces the comparison count trades away some accuracy to do it.
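A sketch of what skipping comparisons looks like in practice. It uses a sliding window, where each position only looks at its nearest neighbors. The text above does not name a specific method, so the choice of a sliding window, the window size, and the function names are all assumptions made for illustration.

```python
def full_attention_pairs(n):
    """Every position looks at every position: n * n pairs."""
    return {(q, k) for q in range(n) for k in range(n)}


def windowed_attention_pairs(n, window):
    """Each position looks only at neighbors within `window` positions."""
    return {
        (q, k)
        for q in range(n)
        for k in range(max(0, q - window), min(n, q + window + 1))
    }


n, window = 200, 8
full = full_attention_pairs(n)
local = windowed_attention_pairs(n, window)

print(f"full attention:     {len(full):,} pairs")
print(f"windowed attention: {len(local):,} pairs")

# A detail near the start of the input, queried from near the end.
detail_pair = (199, 5)
print("full sees the detail:    ", detail_pair in full)
print("windowed sees the detail:", detail_pair in local)
```

The windowed version does a small fraction of the work, and the pair linking position 199 to position 5 is simply gone. Real systems stack layers and add other tricks to pass information along indirectly, but the direct comparison is what was given up.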

The industry knows this. And if you watch what they're building, you can see them working around it.


What the architecture shifts actually tell you

Every major AI architecture shift of the last few years is an admission that full attention doesn't scale.

The workarounds all follow the same pattern: skip some comparisons, approximate others, or replace the comparison mechanism with something cheaper. These aren't solutions to the core math problem. They're trades. You sacrifice fidelity and hope that more training data compensates for what was lost.

For routine tasks, the shortcuts are often good enough. But the gap shows up exactly where it matters most: when you need precise recall of specific details buried in long inputs, the approximate methods fall short. That's not a coincidence. It's what you'd expect when the thing you're skipping is the thing that was doing the work.

None of this stays theoretical once you try to deploy it. The math problem becomes a business problem fast.


The real bottleneck: implementing it

This is where the math ceiling becomes a business ceiling.

Real business problems don't fit neatly into what a model can look at in one pass. A company's contracts, emails, financial records, internal docs: all of it together is orders of magnitude larger than a model's input capacity. So you have to chunk it. Break the information into pieces, decide which pieces to feed the model, and hope you picked the right ones.

That chunking process requires its own infrastructure, its own strategy, and often its own people. You need systems to split, index, and retrieve the right chunks. You need someone who understands the business well enough to design that pipeline. You need monitoring to catch when the system retrieves the wrong context and gives a confident, wrong answer.
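For a sense of what "split, index, and retrieve" means, here is a deliberately crude sketch. The documents, the chunk size, and the word-overlap scoring are all hypothetical. Production pipelines typically use more sophisticated retrieval, and nothing here is meant as a recommended design.

```python
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Break a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, max_words=40):
    """Return a list of (doc_id, chunk_text, word_counts) entries."""
    index = []
    for doc_id, text in documents.items():
        for chunk in split_into_chunks(text, max_words):
            index.append((doc_id, chunk, Counter(tokenize(chunk))))
    return index


def retrieve(index, question, top_k=2):
    """Rank chunks by how many question words they contain (a crude score)."""
    question_words = set(tokenize(question))
    scored = []
    for doc_id, chunk, counts in index:
        score = sum(counts[word] for word in question_words)
        if score > 0:
            scored.append((score, doc_id, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, chunk) for _, doc_id, chunk in scored[:top_k]]


# Hypothetical company documents, invented for illustration.
documents = {
    "contract_2021": "The supplier agreement renews every twelve months unless either party gives ninety days notice.",
    "email_0412": "Reminder that the quarterly budget review moves to Thursday.",
    "policy_travel": "Employees must book travel through the approved portal.",
}

index = build_index(documents)
for doc_id, chunk in retrieve(index, "When does the supplier agreement renew?", top_k=1):
    print(doc_id, "->", chunk)
```

Even this toy shows where the oversight goes. The scoring rewards a chunk for sharing a word as common as "the", and a question phrased in different words from the source may retrieve nothing at all. Someone has to decide what happens in both cases.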

The attention ceiling doesn't just limit model quality. It creates a whole layer of engineering and human oversight that has to wrap around every serious AI deployment. The cost savings AI promises are partially eaten by the overhead of working around its limitations.

Some people point to newer techniques that let models "think longer" on hard problems as a way around this. But thinking longer doesn't help when the bottleneck is getting the right information in front of the model in the first place. If the answer is buried in document 5,000 and the model is only looking at documents 1 through 100, no amount of extra reasoning fixes that.

AI agents, systems that can search, browse, and pull information on demand, look like a better answer. Instead of cramming everything into one pass, let the model go find what it needs. But agents consume far more computing resources per task, which makes the cost problem worse. And there's a deeper issue: agents fail in ways that are hard to predict and hard to catch, because they're equally confident whether they're right or wrong. For routine work that's acceptable. For anything involving money, legal exposure, or compliance, that failure mode is a dealbreaker. You still need a human checking the work. Agents shift the bottleneck from doing the task to verifying the output. The human layer doesn't disappear. It changes shape.

There's also custom training: teaching a model your specific domain so it doesn't need everything fed through the input. It's the most promising workaround. It bakes knowledge into the model itself. New techniques have made this dramatically cheaper than it was even two years ago. But it still requires real expertise to get right, and it can't handle information that changes frequently. You can train a model on your company's historical contracts. You can't train it on the one that arrived this morning. For stable domain knowledge it works. For anything dynamic, you're back to the chunking problem.

So the math constrains the models, and the implementation constrains the deployments. But there's a third constraint sitting underneath both of them, and this one is man-made.


The artificial chokepoint

Beneath the mathematical ceiling sits an artificial one: Nvidia.

[Figure: AI accelerator market share, 2025. Estimated data center AI chip revenue:[4][5] Nvidia ~80%; AMD, Google TPU, and others ~20%. Nvidia's gross margin on its data center segment: 78%.[5]]

Nvidia controls the chips, the software platform developers build on, and the ecosystem. Their consumer chips are deliberately memory-limited. Their enterprise chips cost 10–20× more, a premium partly justified by different, faster memory technology but carrying margins far beyond the hardware cost difference. This is market segmentation backed by the fact that most AI software is written specifically for Nvidia's platform, and switching is expensive.

Export controls on advanced chips to China tightened supply at a time when demand was already outrunning production. Whether the controls themselves raised global prices is debatable, but they didn't help.

Competitors are working against this: AMD with GPUs, Google with TPUs, Cerebras and Groq with purpose-built AI chips, plus Chinese alternatives. It takes time. The artificial constraint will resolve through competition over 5–10 years, but right now Nvidia is the chokepoint on a strategic global resource.

We're racing through an artificial constraint toward a mathematical one. The world will celebrate when Nvidia's grip loosens. Then realize the next wall is harder.

And while we wait for that to play out, there's another distortion making the whole picture harder to read.


The subsidy problem nobody is pricing in

There's a fourth layer obscuring the picture. Inference, the work of actually running a model every time someone asks it a question, is massively subsidized.

OpenAI, Anthropic, Google: all are losing money on inference to buy market share. The price developers and users pay today doesn't reflect real costs. This creates an illusion of infinite cheap AI that's shaping investment decisions, business models, and career choices at scale.

[Figure: OpenAI, cost of running the AI vs. revenue (Jan–Sep 2025). Computing cost: $8.67B. Revenue: ~$4.6B.]
OpenAI spent nearly 2× its revenue just on computing costs to answer user queries.[6] Projected $14B loss in 2026.[7]
Anthropic is in a similar position, burning roughly 70 cents of every dollar it brings in.[8]

When subsidies normalize, real inference costs hit users. Companies whose entire product is an AI wrapper will feel this the most. But for businesses using AI as a tool within broader operations, inference is a small fraction of total cost. A law firm using AI to review contracts still saves enormously even if the API bill triples, because the AI is replacing work that cost far more. The subsidy correction will shake out the AI-native startups. It won't stop the deflationary pressure on everything else.
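The asymmetry between those two kinds of business is just arithmetic. Every figure in the sketch below is hypothetical, picked only to show the shape of the argument. None of it comes from the sources, and `margin_after_repricing` is an illustrative name.

```python
def margin_after_repricing(value_created, inference_cost, other_costs, price_multiplier):
    """Money left over once inference is repriced by `price_multiplier`."""
    return value_created - inference_cost * price_multiplier - other_costs


# HYPOTHETICAL numbers, in dollars per month, for illustration only.
# A firm using AI as a tool: the value created is the labor cost it replaces.
law_firm = {"value_created": 100_000, "inference_cost": 2_000, "other_costs": 20_000}
# An AI wrapper: the value created is subscription revenue, and inference is the main cost.
wrapper = {"value_created": 10_000, "inference_cost": 6_000, "other_costs": 2_000}

for name, business in (("law firm", law_firm), ("AI wrapper", wrapper)):
    today = margin_after_repricing(**business, price_multiplier=1)
    tripled = margin_after_repricing(**business, price_multiplier=3)
    print(f"{name:>10}: {today:>8,} today, {tripled:>8,} if the API bill triples")
```

With these made-up inputs the firm using AI as a tool barely notices a tripled bill, while the wrapper goes from a thin profit to a loss. What decides the outcome is how large inference is relative to the value created, not the price change itself.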

A mathematical ceiling. An implementation overhead. A hardware chokepoint. A pricing illusion. Four constraints shaping the same market. The hardware and pricing problems will resolve over the coming years as competition arrives and subsidies dry up. When they do, AI gets cheaper and more accessible, which makes the deflationary pressure stronger, not weaker. But the math ceiling and the implementation overhead stay. Those are what keep humans in the loop.

Part II
What good-enough AI does to prices

What the plateau actually looks like

So what does the next five years actually look like?

Current AI is already good enough to be deflationary. It doesn't need to get much smarter to change what things cost across most industries. Models will keep improving through better training, smarter architectures, and new tricks. But the hard ceiling on how much information a model can work with at once isn't going away. We're not getting a model that can read your entire company's history in one pass anytime soon. That's what the math constrains: not whether AI gets better at reasoning or writing, but whether it can handle the full complexity of a real business problem without humans breaking it into pieces first.

That limit is what keeps the human layer in place. AI gets cheaper, more reliable, better integrated. But the need for people who understand the business well enough to decide what the model sees, and to verify what it produces, doesn't go away. What matters economically isn't the next breakthrough. It's what the current capability does as it spreads into every industry.

Capability gains slow down. They may not stop, but the era of dramatic leaps every six months likely gives way to incremental improvement, with step-change surprises coming less often. Running these models gets cheaper through competition and better engineering, but not free. Deployment widens into every serious business workflow. A shakeout hits companies whose valuations assumed exponential capability growth would continue indefinitely. Energy becomes a genuine geopolitical constraint. The gap between AI hype and AI reality becomes visible.

That gap is already showing up in surveys. A survey of 6,000 executives across four countries, reported by Fortune in 2026, found the vast majority report little measurable impact from AI on employment or productivity, a result economists are comparing to the Solow Paradox of the 1980s.[14] The honest caveat: the original Solow Paradox eventually resolved. Computers did boost productivity, just with a long lag. The same may happen with AI. But even in that optimistic scenario, the transition period looks like what's described here.

You'd see the ceiling in research output before the public notices: the race to handle longer inputs slowing down, performance on standardized tests flattening, and a flood of new approaches as everyone searches for a workaround. That's already happening.

[Figure: Standard AI tests are maxing out. Best model scores vs. estimated human expert performance: knowledge test 92.3%; PhD-level reasoning 93.8% (both plotted against an expert ceiling); long-text recall 50–65%. Models still improve, but each gain takes more effort for less separation.[9][10]]
The standard knowledge test (MMLU) maxed out Sep 2024. The PhD-level reasoning test (GPQA) followed Nov 2025. Models haven't stopped improving, but each existing test stops being useful faster, and the gains that matter (reliability, reasoning on novel problems) aren't showing up as dramatic score jumps anymore.[11]

This isn't a collapse. It's a maturing. The next five years might be remembered not as when AI took over but when the hype met reality and the useful era actually started. Less dramatic than predicted. More powerful in aggregate than people will notice in the moment.

Which raises the question nobody in the AI conversation is asking clearly enough: if this wave is deflationary across nearly every category, what actually holds its value?

Part III
Where the value lands

What gets more expensive when everything deflates

Most investors haven't thought this part through yet.

The deflationary wave from AI and robotics is broader than people assume. Software commoditizes through AI. Hardware commoditizes through competition. Construction deflates through robots and prefab. Manufacturing continues to automate. Knowledge work margins compress. Content hits infinite supply. Even food production, with vertical farming, precision agriculture, and lab-grown protein, is on a deflationary trajectory.

Previous technology waves created new categories of value as fast as they deflated old ones. That may happen again. But the breadth of this wave is different: it touches production, services, knowledge work, and content simultaneously. Even if new industries emerge, the deflationary pressure on existing ones is real and wide.

Most traditional asset classes are stores of value in a world where production costs are relatively stable. That assumption is breaking across the board.

So what structurally resists deflation?

Scarce land. Not what's built on it or grown on it, but the land itself. Beachfront, water rights, strategic location. You can't print land.

Energy generation rights: owning the means to generate, not just consume. Grid infrastructure takes decades to build. Permitting and land use are genuinely scarce. Demand from AI, EVs, and industrial reshoring is hitting all at once. Supply expansion is slow and demand is growing faster than anyone planned. This creates sustained price pressure that doesn't resolve quickly.

Political and regulatory control: permits, licenses, zoning. Artificial scarcity that technology doesn't dissolve easily because it's enforced by humans protecting their own interests.

Trusted brands and personal reputation. In a world where content is infinite, judgment is scarce. When anyone can generate professional-looking output, the question shifts from "who can produce this?" to "who do I trust?" A brand that took twenty years to build can't be replicated by a competitor with a better model. A personal reputation built on a track record of good calls doesn't commoditize. This is why luxury brands survive every deflationary wave and why word-of-mouth still beats advertising. 88% of consumers now rank brand trust equal to price and quality in purchase decisions,[12] and over a third say AI-generated content lowers their trust in the brand behind it.[13] Trust is scarce precisely because it can't be manufactured, only earned over time.

Most people think of marketing as the thing that gets cheaper when AI can write your ads and generate your content. And that layer does get cheaper. But the real job of marketing was never content production. It was building trust at scale. Brand, audience, reputation: these are the assets that compound over time and can't be replicated by a competitor with better prompts. Companies that treated marketing as a cost center are about to realize it was their most durable asset all along.

Human time and chosen presence. Not labor, which deflates, but irreproducible human connection. The difference between a concert and a recording. Between a meal with a friend and delivered food. When everything becomes cheap and abundant, the signal value of genuine human presence increases.


The asset class of the next decade

In a broadly deflationary environment, the question shifts from what appreciates to what holds value while everything else deflates.

Reducing costs becomes equivalent to making money. Energy independence, owned infrastructure, proprietary data: these aren't glamorous investments but they're stores of real value when financial assets deflate.

Proprietary data in specific domains is the clearest investment thesis. As models commoditize, the scarce input is data they weren't trained on: domain-specific, current, verified, local. The model becomes a commodity. The data doesn't.

The mathematical ceiling on attention is real. The deflationary wave it kicks off is real. The workarounds that sidestep it are all compromises. The energy constraint behind all of it doesn't resolve cleanly.

But the thread running through all of it is scarcity. Land is scarce. Energy is scarce. Regulatory access is scarce. And trust, the kind built slowly, verified through time, impossible to fake and impossible to automate, is the scarcest of all.

Marketing, real marketing, is how you build trust at scale. It's not the department that gets automated. It's the one that matters more when everything else is.

And the people who know how to wire AI into actual business operations, who can bridge the gap between what the models can do and what a company needs done, become more valuable the wider deployment gets. The tool is commoditizing. Knowing how to use it isn't.

The boring stuff wins when the exciting stuff deflates. And the most boring thing of all, showing up, being trustworthy, building a real reputation over time, turns out to be the only thing the next wave can't touch.

Full disclosure: I work in marketing and AI operations, which is both why I see this and why you should weigh my conclusions accordingly.

If you found this useful, I write about where technology and business actually intersect at juliolopez.me.

Sources

  1. Keles, F.D., Wijewardena, P.M., & Hegde, C. (2023). "On The Computational Complexity of Self-Attention." Proceedings of Machine Learning Research, Vol. 201. Proves conditional quadratic lower bounds on self-attention runtime.
  2. Epoch AI (2025). "LLMs now accept longer inputs, and the best models can use them more effectively." Documents ~1,000× context window growth from 2019–2025.
  3. Chroma Research (2025). "Context Rot: How Increasing Input Tokens Impacts LLM Performance." NVIDIA RULER benchmark shows effective context is 50–65% of advertised window size.
  4. Introl / industry estimates (2025). "NVIDIA's Unassailable Position." Estimates ~80% AI accelerator market share by revenue. (Note: Nvidia holds 94% of discrete consumer GPU shipments per Jon Peddie Research, but the AI data center segment is the relevant metric here.)
  5. Introl (2025). "NVIDIA's Unassailable Position." Documents 78% gross margins on data center hardware and CUDA ecosystem lock-in.
  6. Ed Zitron, "Where's Your Ed At" (2025). "Here's How Much OpenAI Spends On Inference and Its Revenue Share With Microsoft." Reports $8.67B inference spend in first 9 months of 2025.
  7. R&D World (2026). "Facing $14B losses in 2026, OpenAI is now seeking $100B in funding."
  8. AI2.Work (2025). "AI Inference Economics in 2025: Why OpenAI and Anthropic Are Still Losing Billions." Reports Anthropic burns 70% of revenue.
  9. Stanford HAI (2025). "Technical Performance — The 2025 AI Index Report."
  10. Artificial Analysis (2025). "GPQA Diamond Benchmark Leaderboard." Tracks model performance and saturation points.
  11. Phan, L. et al. (2026). "When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation." Documents MMLU saturation (Sep 2024) and GPQA saturation (Nov 2025).
  12. JMSR (2025). "Influence of AI-Generated Influencer Content on Brand Trust and Authenticity Perceptions." 88% of consumers rank brand trust equal to price and quality.
  13. StudyFinds (2025). "Viewers Think They See AI Everywhere, And It's Changing How They Trust Brands." 82.6% report encountering suspected AI content; >1/3 say it lowers brand trust.
  14. Fortune / NBER (2026). "Thousands of CEOs just admitted AI had no impact on employment or productivity." Survey of 6,000 executives across U.S., U.K., Germany, and Australia.