anu.sh

If I had asked people what they wanted, they would have said faster horses, Henry Ford is famously quoted. Yet, Ford didn’t invent cars to replace horses. He invented factories to make cars cheaper and more accessible. Factories that would make the car, according to Ford, ‘so low in price that no man making a good salary will be unable to own one.’

When Ford developed the Model T, cars had been around for decades, made ‘artisinally’, if you wish. They were expensive and unreliable. Ford’s goals were affordability and reliability. His first idea was to introduce consistently interchangeable auto-parts. His second idea was the moving assembly line, which reduced the time workers spent walking around the shop floor to procure and fit components into a car. ‘Progress through cautious, well-founded experiments,’ is a real quote from Ford. The 1908 Model T was 20th in the line of models that began with the Model A in 1903.

Borrowing a leaf from the top of the last century, more than inventing new foundation models (FMs), we need to invent the factories that make such models trivially affordable and reliable.

Billions of Models

If FMs are the new ‘central processing units’, technically there’s no limit to the number and variety of ‘programmable’ CPUs that we can now produce. A typical CPU chip requires $100-500M in research and design, $10-20B in building out fabrication capacity, and 3-5 years from concept to market. Today, FMs can be pretrained within months (if not weeks) for much less. On this path, more than OpenAI, Google and DeepSeek have accelerated affordability.

Model Provider	Small Model	Mid-Size Model	Reasoning Model
Anthropic	Haiku 3.5: $0.8/MIT, $4/MOT	Sonnet 3.7: $3/MIT, $15/MOT	N/A
OpenAI	GPT 4o-mini: $0.15/MIT, $0.60/MOT	GPT 4o: $2.5/MIT, $10/MOT	o3-mini: $1.1/MIT, $4.4/MOT
DeepSeek	N/A	V3: $0.27/MIT,$1.10/MOT	R1: $0.55/MIT, $2.19/MOT
Google	Gemini 2.0 Flash: $0.10/MIT, $0.40/MOT	Gemini 1.5 Pro: $1.25/MIT, $5/MOT	Gemini 2.0 Flash Thinking: Pricing N/A

Note: MIT: Million Input Tokes, MOT: Million Output Tokens

Affordability

Affordability is admitedly relative. Software developers may be willing to pay more than marketing content creators. The value of research for knowledge workers depends on the decisions it enables them to make. Customer support may be valuable, but no more than the total cost of employing and managing human support representatives. When the returns (or savings) are clear, there is quantifiable demand.

OpenAI loses more than twice that it makes, and it needs to cut costs by a roughly an order of magnitude to be sustainably profitable. If Nvidia’s Blackwell chips deliver the promised 40x price to performance improvement, this will be the year of non-absurd business models. More power to them.

It’s possible that DeepSeek is already there. More importantly, DeepSeek might represent the price level of an API provider who doesn’t have a application business to cannibalize. Is it ironic that OpenAI is facing an innovator’s dilemma of their own?

Meanwhile, Anthropic charges a premium over OpenAI’s application-level rates. They also need an order of magnitude reduction. They might already be there with TPUs, or with Trainium 2’s 75% price drop they’re likely getting there. It’s unclear if they have a cannibalization issue yet, though their CPO definitely wants their product teams to iterate faster.

Training and adapting the model to meet specific and evolving customer expectations is the business need. On this point, popular applications such as Perplexity and Cursor/Windsurf are arguably underrated. Just as Midjourney provides a delightful experience by getting the combination of the model and user experience just right, these applications taking their shot. After all, the model is a software component, and application developers want to shape it endlessly for their end users. The faster these developers iterate with their models based on feedback from their applications, the faster they’ll see product-market fit. They can then figure out how to grow more efficient. Finding product-market fit is the only path to affordability.

People mistake such applications to be ‘wrappers’ around the model or ‘just’ interface engineering. That’s a bit like saying Google is just interface engineering over Page Rank.

Reliability

For a given use case, reliability is a function of: How often does the model ‘break’? How easy and/or expensive is it to detect and fix?

In creative work, there’s often no wrong answer. And checking the result is generally easier than generating the result.

For automation, it’s more of a spectrum ranging from infeasible to non-compliant, to varying degrees of risky, to safe & expensive, to safe & cost-effective.

What makes the application risky vs. safe? And who underwrites the risk?

One answer is tool use. Multiple tool use protocols such as Model Context Protocol want to make FMs more aware of available tools in addition to making tool use more effective and efficient. However, there’s no significant reason (yet) for any major model provider to use another’s protocol. I expect protocols to emerge from most if not all model providers, and feel that standardization is at least a year or two away. Even then, new standards may usurp older ones, and different economic and geopolitical agendas could shape these in weird ways.

However, a sophisticated ‘tool’ or service really wants to be an agent. When multiple agents need to work together, we need distributed ownership, separation of concerns, authentication, authorization, auditability, interoperability, control, non-repudiation, and a lot more. Much of this plumbing already exists with OAuth2.0 and can be repurposed for service agents, but a lot still needs to be built. Whoever builds the most reliable multi-agent collaboration systems will likely grow to become the most trusted.

The Industrial Revolution

Unlike fantasy AI factories that spew pure intellgence as tokens, these factories will ship affordable and reliable engines that can safely power software applications. While we urgently kick off the next manufacturing industrial build out in the U.S., my guess is that these software factories will take years to build. We need to have started yesterday…