AI Inference Is Becoming a Utility
Rates of Change, Structural Constraints, and Where Value Goes
D.T. Frankly
Published:
Philipp D. Dubach’s March 2026 analysis precisely documents the cost collapse in AI inference. This piece extends that data into the rate and direction of each underlying variable and traces the structural endpoint.
This Pattern Has Run Before
This pattern has run before, repeatedly, with consistent structure. A new production method enters at the low-margin commodity base. Incumbents rationally retreat upmarket. The cost curve continues regardless. Incumbents are eventually stranded at the top with nowhere left to go. It has played out in every capital-intensive industry that hit a technological efficiency discontinuity.
United States
Steel is the cleanest analog — Nucor and the mini-mills versus U.S. Steel, 1960s through 1980s, documented in Dubach’s original analysis. The same sequence ran elsewhere:
Telecommunications. AT&T’s long-distance monopoly collapsed when MCI and Sprint used microwave and fiber to undercut per-minute rates. AT&T retreated to enterprise and international. Rates kept falling. Long-distance as a revenue category effectively ceased to exist within 20 years.
Semiconductors/DRAM. US producers dominated DRAM through the late 1970s. Japanese producers — Hitachi, NEC, Fujitsu — entered with higher capital intensity, better process discipline, and government backing, driving prices down 60–70% through the early 1980s. Intel exited DRAM entirely in 1985 and pivoted to microprocessors. The ones who didn’t pivot didn’t survive in that segment.
Airlines post-deregulation (1978). Deregulation removed pricing floors. Southwest entered at the low-margin short-haul segment with a single aircraft type and no hub complexity. Incumbents (Pan Am, Eastern, TWA) retreated to international and long-haul premium. Switching costs were zero — passengers bought on price. All three incumbents are gone. Southwest became the largest domestic carrier by passengers.
Disk drives. Clayton Christensen’s original case study for disruption theory. 14-inch → 8-inch → 5.25-inch → 3.5-inch. Each transition was led by a new entrant. Each incumbent was profitable in its segment until the segment disappeared. The cycle ran every 4–6 years from the 1970s through the 1990s.
Japan
The DRAM case is the sharpest. Japanese producers did to US semiconductor companies in the 1980s exactly what Chinese AI labs are now doing to US model providers: entered below the margin threshold, accepted lower returns on capital, used state-adjacent financing to sustain the price war, and took share until incumbents restructured or exited. The US government’s 1986 US-Japan Semiconductor Agreement imposed price floors — which primarily benefited Japanese producers by preventing the cost curve from fully expressing itself. Regulatory intervention extended the incumbents’ runway. It did not change the structural outcome.
Japan’s own electronics industry then ran the same sequence in reverse through the 1990s and 2000s, as Korean producers — Samsung, SK Hynix — did to Japanese DRAM makers exactly what Japan had done to Americans a decade earlier. Samsung used counter-cyclical investment explicitly, allocating over $100 million to DRAM development from 1983 to 1985 even as the global semiconductor market went into recession and Intel was exiting the business. Japan today has effectively no DRAM industry.
The consistent elements across all cases
A new entrant with a structurally lower cost basis, not just better operations. Rational incumbent retreat upmarket. Switching costs at or near zero in the commodity segment. A cost curve that continues regardless of incumbent strategy. Incumbents stranded. The timeframe varies, from five years in disk drives to twenty years in steel, proportional to how fast the cost curve moves and how high switching costs are in the premium segments.
What the Rates Actually Show
Inference cost decline: exponential, arguably accelerating. Epoch AI’s 2025 research found a median 50x per year decline in cost for equivalent performance — a pace that dwarfs Moore’s Law. This is not a single-variable trend. Algorithmic gains — mixture-of-experts architectures, quantization, distillation, speculative decoding — are now compounding on top of hardware generational improvements rather than alternating with them. The 300x cost reduction from March 2023 to March 2026 ($30/M to $0.10/M for GPT-4-level performance) is a floor figure for the period, not a ceiling.
Performance gap closure: accelerating toward zero. The gap between open-source and proprietary models on the Chatbot Arena leaderboard shrank from 8% to 1.7% in a single year (Stanford HAI 2025 AI Index). GLM-5 holds the highest open-source Arena Elo at 1,452. Kimi K2.5 scores 99.0 on HumanEval at under $1/M. As open-source models approach parity, distillation and transfer from frontier models accelerate the convergence further. A sub-1% gap is reachable within 12 months at the current rate.
Switching costs: already at effective zero, no further movement possible. The OpenAI API format is the de facto standard. LiteLLM routes to 100+ providers through a single config change. Switching a model provider takes minutes of engineering time. This variable has reached its structural floor and stays there.
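The one-config-change claim can be made concrete. The sketch below builds OpenAI-style chat-completion request payloads for several providers; the base URLs and model names are illustrative placeholders (not verified endpoints), but the point holds regardless: only the URL and model string differ, so switching providers is a dictionary-key change, not an engineering project.

```python
# Why provider switching is near-zero cost: every provider below speaks
# the same OpenAI-style chat-completions schema. Base URLs and model
# names here are placeholders for illustration, not verified endpoints.

PROVIDERS = {
    "openai":   {"base_url": "https://api.openai.com/v1",       "model": "gpt-4o"},
    "deepseek": {"base_url": "https://api.deepseek.com/v1",     "model": "deepseek-chat"},
    "zhipu":    {"base_url": "https://open.bigmodel.cn/api/v1", "model": "glm-5"},
}

def build_request(provider: str, prompt: str) -> dict:
    """Return the HTTP request spec for a given provider.

    Only the URL and model string differ per provider; the payload
    schema itself is identical across all of them.
    """
    cfg = PROVIDERS[provider]
    return {
        "url": cfg["base_url"] + "/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# "Switching" changes one lookup key; the message payload is untouched.
a = build_request("openai", "hello")
b = build_request("deepseek", "hello")
assert a["json"]["messages"] == b["json"]["messages"]
```

Tools like LiteLLM wrap exactly this pattern behind a single `completion()` interface, which is why the switching-cost variable has no further room to fall.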
Energy availability: tightening, not easing. New grid capacity at scale is 5–10 years out minimum. Data center power demand is already hitting purchase agreement constraints in major markets. This is the constraint that determines hardware economics going forward, and it is moving in the opposite direction from every other variable.
The Energy Wall Reframes Infrastructure
The conventional framing — that GPU infrastructure providers collect durable rent regardless of which model runs on their hardware — requires revision.
When utilization becomes energy-constrained before it becomes demand-constrained, compute-per-watt becomes the decisive competitive variable. Purpose-built inference ASICs — Google’s TPUs, Amazon’s Trainium, Microsoft’s Maia — are designed for fixed transformer inference workloads at lower precision and without general-purpose overhead. The efficiency gap is measurable: Google’s Trillium TPU (v6) operates at approximately 300W TDP versus 700W for the H100 and 1000W for the B200, while Google claims 67% better energy efficiency over the prior TPU generation and 4x better price-performance versus H100 for qualifying inference workloads. Anthropic signed what Google described as the largest TPU deal in its history in late 2025, committing to hundreds of thousands of Trillium chips for inference.
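The power-budget arithmetic behind this is simple. A back-of-envelope sketch, using only the TDP figures cited above plus an assumed 1.3x overhead factor for cooling and power distribution (a stand-in for PUE, not a measured value), shows how many accelerators fit under a fixed megawatt of facility power; per-chip throughput is deliberately left out because it varies by workload.

```python
# Under a fixed power budget, deployable chip count is set by TDP.
# TDP figures are those cited in the text; the 1.3x overhead factor
# is an assumed, PUE-style allowance, not a measured value.

TDP_WATTS = {"Trillium TPU": 300, "H100": 700, "B200": 1000}

def chips_per_megawatt(tdp_watts: float, overhead: float = 1.3) -> int:
    """Number of chips that fit under 1 MW of facility power."""
    return int(1_000_000 / (tdp_watts * overhead))

for chip, tdp in TDP_WATTS.items():
    print(f"{chip}: ~{chips_per_megawatt(tdp)} chips per MW")
```

At these TDPs, a 300W part fits roughly 2.3x as many chips under the same power envelope as a 700W part, which is the mechanism by which an energy-constrained market rewards compute-per-watt rather than raw compute.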
Capex returns on GPU-heavy infrastructure therefore deteriorate faster than a demand-only model suggests, because the energy wall arrives first. Nvidia’s position in general-purpose training workloads faces a structural narrowing as inference-optimized silicon matures. The durable infrastructure position belongs to operators with the most energy-efficient silicon and the most favorable long-term power agreements.
The Reasoning Model Bet Does Not Resolve This
Compute-intensive reasoning models represent a bet that qualitative capability gains justify sustained premium pricing. The architecture does not support this.
Current systems are stochastic predictors. Error accumulates multiplicatively across reasoning steps: a system running at 95% per-step accuracy — optimistic for complex tasks — reaches approximately 60% accuracy by step 10 and 36% by step 20 (straightforward probability arithmetic: 0.95¹⁰ ≈ 0.60, 0.95²⁰ ≈ 0.36). Agentic and multi-step tasks — exactly where premium pricing is being justified — are where compounding failure is most acute. More compute per query does not change the underlying error accumulation; it patches over it temporarily. This is visible in the margin data: inference costs quadrupled to $8.4B in 2025 while gross margin fell from 40% to 33%. More compute spend produced margin compression, not margin expansion.
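The compounding-error arithmetic in the paragraph above is worth making explicit: under the simplifying assumption that steps fail independently, task-level accuracy after n steps is just per-step accuracy raised to the nth power.

```python
# Error accumulation across reasoning steps, assuming independent
# per-step failures: task accuracy after n steps is p**n.

def task_accuracy(per_step: float, steps: int) -> float:
    """Probability every one of `steps` independent steps succeeds."""
    return per_step ** steps

for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95%/step -> {task_accuracy(0.95, n):.0%}")
# 10 steps -> ~60%, 20 steps -> ~36%, matching the figures in the text.
```

Independence is an optimistic assumption; correlated failures in long agentic chains would make the decay steeper, not shallower.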
The Valuation Arithmetic
The $840B post-money valuation OpenAI closed in February 2026 requires $200B+ revenue by 2030 at a 43% CAGR, and gross margin expansion from 33% to 50%+, simultaneously.
The cost trajectory eliminates this combination. The volume required for 43% CAGR demands price competitiveness. Price competitiveness under a 50x/year cost decline means pricing toward open-source alternatives. Pricing toward open-source alternatives at current compute costs produces negative gross margins, not 50%+ expansion. Revenue growth will be prioritized — it is existential. Margins will compress further.
The circular financing structure compounds this: Amazon’s $50B investment flows back to AWS as compute spend; Nvidia’s $30B returns as GPU purchases. This inflates the optics of the round and raises questions about how much of the $20B ARR reflects genuinely independent third-party demand. Anthropic at $380B on $14B run-rate revenue faces structurally identical constraints at smaller scale.
The Structural Endpoint
The trajectory resolves to a utility profile: essential infrastructure, near-zero switching costs, capital-intensive, energy-intensive, competed margins. The S&P 500 Utilities sector currently trades at approximately 21x earnings, against a 20-year historical average of 16.5x. Damodaran’s January 2026 sector data shows Utility (General) at 19.92x trailing earnings and 18.13x forward; Power at 24.74x trailing, the high end reflecting current AI-driven electricity demand tailwinds. The gap between current model provider valuations (42x ARR for OpenAI, 27x for Anthropic) and utility-appropriate earnings multiples is structural, not cyclical. Note the comparison is a revenue multiple against an earnings multiple: since OpenAI has no earnings, the distance to a utility-level valuation is larger than the numbers suggest, not smaller.
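The size of that gap can be made concrete with the figures cited above. The sketch below computes the net income each company would need for its current valuation to be justified at a roughly utility-level earnings multiple; all inputs come from the text, and the round 20x multiple is an approximation of the utility figures quoted.

```python
# Revenue-multiple vs earnings-multiple gap, using only figures cited
# in the text. The 20x earnings multiple is a rough utility-sector
# approximation of the Damodaran/S&P figures quoted above.

valuations = {"OpenAI": 840e9, "Anthropic": 380e9}  # post-money valuations
arr        = {"OpenAI": 20e9,  "Anthropic": 14e9}   # run-rate revenue
utility_pe = 20

for co in valuations:
    required_earnings = valuations[co] / utility_pe
    print(f"{co}: needs ~${required_earnings / 1e9:.0f}B net income at "
          f"{utility_pe}x earnings vs ${arr[co] / 1e9:.0f}B total revenue today")
```

On these inputs, OpenAI would need roughly $42B of net income to support its valuation at a utility multiple, which exceeds its entire current run-rate revenue; that is the sense in which the distance is larger than the headline multiples suggest.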
The entities with durable economics at the endpoint share specific characteristics: distribution at a scale where AI integration is near-zero marginal cost, ASIC efficiency advantages that survive the energy wall, or application-layer positions where AI is an input to a proprietary workflow rather than the product itself. In each case the AI model is infrastructure supporting an existing economic position, not the product being sold.
The model API as a product — an endpoint returning tokens, swappable in a config file — does not accumulate the switching costs, vertical integration, or ecosystem dependencies that have historically supported software-level margins. Every prior industry that hit this combination of zero switching costs and an accelerating cost curve resolved the same way. This iteration is the fastest ever documented at this capital scale.
This analysis is provided for informational and educational purposes only. It does not constitute investment advice, financial guidance, or a recommendation to buy, sell, or hold any security or asset. Financial projections, valuation comparisons, and cost trajectory estimates are analytical frameworks, not forecasts, and are subject to material uncertainty. References to specific companies and technologies are illustrative of structural dynamics and should not be interpreted as investment recommendations or assessments of any specific security. Past industry analogs do not guarantee equivalent outcomes. Readers should conduct their own analysis and consult qualified financial and legal professionals before making investment decisions. Nothing herein should be construed as creating a fiduciary relationship or advisory obligation of any kind.
Primary data: Epoch AI: LLM inference prices have fallen rapidly but unequally across tasks (2025); Stanford HAI 2025 AI Index; Artificial Analysis Intelligence Index (March 2026); OpenAI CFO Sarah Friar blog post (January 2026); The Information; Sacra financial model; provider API pricing pages (Anthropic, OpenAI, Alibaba Cloud, DeepSeek, Zhipu AI, Moonshot AI). TPU efficiency data: Google Cloud TPU documentation and MLQ.ai AI chips analysis. Utility sector valuations: Damodaran PE by Sector, January 2026. Intel DRAM exit: Intel company history. Samsung investment strategy: Harvard Business School Samsung Electronics case. Original exhibit analysis: Philipp D. Dubach, philippdubach.com, March 2026.
— Free to share, translate, use with attribution: D.T. Frankly (dtfrankly.com)
§