Ed Zitron has been relentlessly pursuing the questionable economics of AI and has tentatively identified a bombshell in his latest post, Exclusive: Here’s How Much OpenAI Spends On Inference and Its Revenue Share With Microsoft. If his finding is valid, large language models like ChatGPT are much further from ever becoming economically viable than even optimists imagine. No wonder OpenAI chief Sam Altman has been talking up a bailout.
By way of background, over a series of typically very long and relentlessly documented articles, Zitron has demonstrated (among many other things) the absolutely enormous capital expenditures of the major AI incumbents versus comparatively thin revenues, let alone profits. Zitron’s articles on the enormous cash burn and massive capital misallocation that AI represents have the work of Gary Marcus on fundamental performance shortcomings as de facto companion pieces. A sampling of Marcus’ badly needed sobriety:
5 recent, ominous signs for Generative AI
For a quick verification of how unsustainable OpenAI’s economics are, see the opening paragraph from Marcus’ November 4 article, OpenAI probably can’t make ends meet. That’s where you come in:
A few days ago, Sam Altman got seriously pissed off when Brad Gerstner had the temerity to ask how OpenAI was going to pay the $1.4 trillion in obligations he was taking on, given a mere $13 billion in revenue.
By way of reference, most estimates of the size of the subprime mortgage market centered on $1.3 trillion. And the AAA tranches of the bonds on subprime mortgage pools were money good in the end, although they did fall in value during the crisis when that was in doubt. And in foreclosures, the homes nearly always had some liquidation value.
Now to Zitron’s latest.
Many, particularly AI advocates in the business press, contend that even if the AI behemoths go bankrupt or are otherwise duds, they will still leave something of considerable value, as the building of the railroads (which spawned many bankruptcies) or the dot-com bubble did.
But those assumptions often seem to be based on a naive view of AI economics: that having made a huge expenditure on training, the ongoing costs of running queries are not high and will drop to bupkis. This was the case with railroads, which had high fixed costs and negligible variable costs. The network effects of Internet businesses produce similar results, with scale increases producing both considerable user benefits and lower per-customer costs.
That is not the case with AI. Not only are there very large training costs, there are also “inference” costs. And they aren’t just considerable; they have vastly exceeded training costs. The viability of AI depends on inference costs dropping to a comparatively low level.
Zitron’s potentially devastating find is breadcrumbs suggesting that OpenAI’s inference costs are considerably higher than it lets on. Zitron further posits that the prices users pay for ChatGPT fall far short of the underlying inference expenditures, meaning usage is heavily subsidized. Because the reporting on AI economics by all the big players is so abjectly awful, Zitron’s allegations may well pan out.
First, a detour to explain more about inference. From the Primitiva Substack’s All You Need to Know about Inference Cost, from the end of 2024. Emphasis original:
Over the first 16 months after the launch of GPT-3.5, the market’s attention was fixated on training costs, often making headlines for their staggering scale. However, following the wave of API price cuts in mid-2024, the spotlight has shifted to inference costs—revealing that while training is expensive, inference is even more so.
According to Barclays, training the GPT-4 series required approximately $150 million in compute resources. Yet, by the end of 2024, GPT-4’s cumulative inference costs are projected to reach $2.3 billion—15x the cost of training.
As an aside, Gary Marcus pointed out in October that GPT-5 didn’t arrive in 2024 as had been predicted and has been disappointing. Back to Primitiva:
The September 2024 release of GPT-o1 further accelerated the shift in compute demand from training towards inference. GPT-o1 generates 50% more tokens per prompt compared to GPT-4o, and its enhanced reasoning capabilities generate inference tokens at 4x the output tokens of GPT-4o.
Tokens, the smallest units of textual data processed by models, are central to inference compute. Typically, one word corresponds to about 1.4 tokens. Each token interacts with every parameter in a model, requiring two floating-point operations (FLOPs) per token-parameter pair. Inference compute can be summarized as:
Total FLOPs ≈ Number of Tokens × Model Parameters × 2 FLOPs.
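As an aside, to make the formula concrete, here is a minimal Python sketch applying it; the parameter count and prompt length are hypothetical illustrations, not figures for any actual OpenAI model:

```python
# Primitiva's approximation: Total FLOPs ≈ tokens × parameters × 2
# The model size and word count below are hypothetical illustrations only.

def inference_flops(num_tokens: int, num_parameters: int) -> int:
    """Approximate FLOPs to run num_tokens through a dense model."""
    return num_tokens * num_parameters * 2

params = int(1.8e12)      # a hypothetical ~1.8-trillion-parameter model
tokens = int(1000 * 1.4)  # a ~1,000-word exchange at ~1.4 tokens per word

print(f"{inference_flops(tokens, params):.3e} FLOPs")  # ~5.0e15 FLOPs
```

Note that compute scales linearly in both token count and parameter count, which is why the long hidden “reasoning” outputs described next are so costly. Back to Primitiva: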
Compounding this volume expansion, the price per token for GPT-o1 is 6x that for GPT-4o, resulting in a 30-fold increase in total API costs to perform the same task with the new model. Research from Arizona State University shows that, in practical applications, this cost can soar to as much as 70x. Understandably, GPT-o1 has been available only to paid subscribers, with usage capped at 50 prompts per week….
The cost surge of GPT-o1 highlights the trade-off between compute costs and model capabilities, as theorized by the Bermuda Triangle of GenAI: everything else equal, it is impossible to make simultaneous improvements on inference costs, model performance, and latency; improvement in one will necessarily come at the sacrifice of another.
However, advancements in models, systems, and hardware can expand this “triangle,” enabling applications to lower costs, enhance capabilities, or reduce latency. Consequently, the pace of these cost reductions will ultimately dictate the speed of value creation in GenAI….
James Watt’s steam engine was such an example. It was invented in 1776, but took 30 years of innovations, such as the double-acting design and centrifugal governor, to raise thermal efficiency from 2% to 10%—making steam engines a viable power source for factories…
For GenAI, inference costs are the equivalent barrier. Unlike pre-generative AI software products, which were regarded as a superior business model to “traditional businesses” largely because of their near-zero marginal costs, GenAI applications need to pay for GPUs for real-time compute.
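Primitiva’s 30-fold figure falls directly out of the multiples quoted above: roughly 5x the token volume at 6x the per-token price. A quick sketch, using normalized placeholder prices rather than actual rate-card numbers:

```python
# Reproduce Primitiva's ~30x API-cost multiple for GPT-o1 vs. GPT-4o.
# Units are normalized: GPT-4o's output for a task = 1.0 tokens at price 1.0.
# Reading 1x visible output + 4x hidden reasoning tokens is one interpretation
# of the quoted figures, not a disclosed breakdown.
gpt4o_tokens, gpt4o_price = 1.0, 1.0

o1_price = 6 * gpt4o_price      # o1 tokens priced at ~6x GPT-4o's
o1_tokens = gpt4o_tokens + 4.0  # visible output plus ~4x hidden reasoning tokens

cost_multiple = (o1_tokens * o1_price) / (gpt4o_tokens * gpt4o_price)
print(cost_multiple)  # -> 30.0
```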
Zitron is suitably cautious about his findings; perhaps some heated denials from OpenAI will clear matters up. Do read the entire post; I have excised many key details as well as some qualifiers to highlight the central concern. From Zitron:
Based on documents viewed by this publication, I am able to report OpenAI’s inference spend on Microsoft Azure, in addition to its payments to Microsoft as part of its 20% revenue share agreement, which was reported in October 2024 by The Information. In simpler terms, Microsoft receives 20% of OpenAI’s revenue….
The numbers in this post differ from those that have been reported publicly. For example, previous reports had said that OpenAI had spent $2.5 billion on “cost of revenue” – which I believe are OpenAI’s inference costs – in the first half of CY2025.
According to the documents viewed by this newsletter, OpenAI spent $5.02 billion on inference alone with Microsoft Azure in the first half of calendar year 2025 (CY2025).
As a reminder: inference is the process through which a model creates an output.
This is a pattern that has continued through the end of September. By that point in CY2025 — three months later — OpenAI had spent $8.67 billion on inference.
OpenAI’s inference costs have risen consistently over the last 18 months, too. For example, OpenAI spent $3.76 billion on inference in CY2024, meaning that OpenAI has already more than doubled its inference costs in CY2025 through September.
Based on its reported revenues of $3.7 billion in CY2024 and $4.3 billion in revenue for the first half of CY2025, it seems that OpenAI’s inference costs easily eclipsed its revenues.
Yet, as mentioned previously, I am also able to shed light on OpenAI’s revenues, as these documents also reveal the amounts that Microsoft takes as part of its 20% revenue share with OpenAI.
Concerningly, extrapolating OpenAI’s revenues from this revenue share does not produce numbers that match those previously reported.
According to the documents, Microsoft received $493.8 million in revenue share payments in CY2024 from OpenAI — implying revenues for CY2024 of at least $2.469 billion, or around $1.23 billion less than the $3.7 billion that has been previously reported.
Similarly, for the first half of CY2025, Microsoft received $454.7 million as part of its revenue share agreement, implying OpenAI’s revenues for that six-month period were at least $2.273 billion, or around $2 billion less than the $4.3 billion previously reported. Through September, Microsoft’s revenue share payments totalled $865.9 million, implying OpenAI’s revenues are at least $4.329 billion.
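The implied-revenue arithmetic here is just division by the 20% share; a sketch reproducing Zitron’s figures from the payments he reports (amounts in millions of dollars):

```python
# Back out minimum OpenAI revenue from Microsoft's 20% revenue-share receipts,
# using the payment figures from the documents Zitron describes ($ millions).
REVENUE_SHARE = 0.20

payments_musd = {
    "CY2024": 493.8,
    "H1 CY2025": 454.7,
    "CY2025 through September": 865.9,
}

for period, paid in payments_musd.items():
    implied = paid / REVENUE_SHARE
    print(f"{period}: implied revenue >= ${implied:,.1f}M")

# CY2024: $2,469.0M                   (vs. the $3.7B previously reported)
# H1 CY2025: $2,273.5M                (vs. the $4.3B previously reported)
# CY2025 through September: $4,329.5M
```

Back to Zitron: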
According to Sam Altman, OpenAI’s revenue is “well more” than $13 billion. I am not sure how to reconcile that statement with the documents I have viewed….
Due to the sensitivity and significance of this information, I am taking a far more blunt approach with this piece.
Based on the information in this piece, OpenAI’s costs and revenues are potentially dramatically different to what we believed. The Information reported in October 2024 that OpenAI’s revenue could be $4 billion, and inference costs $2 billion based on documents “which include financial statements and forecasts,” and specifically added the following:
OpenAI appears to be burning far less cash than previously thought. The company burned through about $340 million in the first half of this year, leaving it with $1 billion in cash on the balance sheet before the fundraising effort. But the cash burn could accelerate sharply in the next couple of years, the documents suggest.
I do not know how to reconcile this with what I am reporting today. In the first half of CY2024, based on the information in the documents, OpenAI’s inference costs were $1.295 billion, and its revenues at least $934 million.
Indeed, it is tough to reconcile what I am reporting with much of what has been reported about OpenAI’s costs and revenues.
So this is quite a gauntlet to have thrown down. Not only is Zitron saying that OpenAI may still have business-potential-wrecking compute costs, but his evidence indicates that OpenAI has also been making serious misrepresentations about costs and revenues. Because OpenAI is not public, it has not necessarily engaged in fraud; one presumes it has been accurate with those to whom it has financial reporting obligations. But if Zitron has this right, OpenAI has been telling howlers to other important stakeholders.
The Financial Times, with whom Zitron shared his data before publishing, is amplifying his findings. From How high are OpenAI’s compute costs? Possibly a lot higher than we thought:
Pre-publication, Ed was kind enough to discuss with us the information he has seen. Here are the inference costs as a chart:

[Chart: OpenAI’s inference spend on Microsoft Azure, per the documents; not reproduced here]
The article then correctly offers caveats, as Zitron did at greater length, along with kinda-sorta comments from Microsoft and OpenAI:
The best place to begin is by saying what the numbers don’t show. The above is understood to be for inference only…
More importantly, is the data correct? We showed Microsoft and OpenAI versions of the figures presented above, rounded to a multiple, and asked if they recognised them to be broadly accurate. We also put the data to people familiar with the companies and asked for any guidance they could offer.
A Microsoft spokeswoman told us: “We won’t get into specifics, but I can say the numbers aren’t quite right.” Asked what exactly that meant, the spokeswoman said Microsoft would not comment and did not respond to our subsequent requests. An OpenAI spokesman did not respond to our emails other than to say we should ask Microsoft.
A person familiar with OpenAI said the figures we had shown them did not give a complete picture, but declined to say more. In short, though we’ve been unable to verify the data’s accuracy, we’ve been given no reason to doubt it substantially either. Make of that what you will.
Taking everything at face value, the figures appear to show a disconnect between what’s been reported about OpenAI’s finances and the running costs that are going through Microsoft’s books…
As Ed writes, OpenAI appears to have spent more than $12.4bn at Azure on inference compute alone in the last seven calendar quarters. Its implied revenue for the period was a minimum of $6.8bn. Even allowing for some fudging between annualised run rates and period-end totals, the apparent gap between revenues and running costs is a lot more than has been reported previously. And, like Ed, we’re struggling to explain how the numbers can be so far apart.
If the data is accurate — which we can’t guarantee, to reiterate, but we’re writing this post after giving both companies every opportunity to tell us that it isn’t — then it would call into question the business model of OpenAI and nearly every other general-purpose LLM vendor. At some point, going by the figures, either running costs have to collapse or customer charges have to rise dramatically. There’s no hint of either trend taking hold yet.
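Taking the documents’ numbers at face value, the FT’s seven-quarter totals can be reproduced directly from the period figures quoted above:

```python
# Reproduce the FT's seven-quarter (Q1 CY2024 through Q3 CY2025) totals
# from the period figures reported above, in $ billions.
inference = 3.76 + 8.67                      # CY2024 + CY2025-through-September spend
implied_revenue = (0.4938 + 0.8659) / 0.20   # revenue-share payments / 20%

print(f"Inference: ${inference:.2f}B")              # ~$12.43B ("more than $12.4bn")
print(f"Implied revenue: ${implied_revenue:.2f}B")  # ~$6.80B ("minimum of $6.8bn")
```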
A quick search on Twitter finds no one yet attempting to lay a glove on Zitron. In the pink paper comments section, a few contend that Microsoft making weak protests about the data means it can’t be relied upon. While that is narrowly correct, one would expect a more robust debunking given the implications. And some of the supportive comments add value, like:
Bildermann
It explains why ChatGPT has become so dumb. They are trying to reduce inference costs.

His name is Robert Paulson

The fact we have to use a gypsy with a magic 8 ball to figure out these numbers for the company that is “going to revolutionize every industry” is more telling than the numbers themselves

No F1 key
Zitron has definitely been hitting that haterade, but Microsoft press saying the numbers ‘aren’t quite right’ makes me think this is pretty accurate.
manticore
That creaking noise is the lid being prized off the can of worms –MS had better get on top of this. That income stream is highly unlikely – becoz straight line etc etc – which means that their projections are going to be badly affected and presumably there has to be a K split in the projection line at some point. MS getting holed below the waterline has real-world impacts.
Multipass
I’ve been reading Ed’s blog for a while now and while he is clearly biased in one direction, it comes across as infinitely more credible than anything Sam Altman has said in years. The real issue in my eyes is that the revenue numbers are so opaque and obfuscated that nobody has any idea if any of this will make money.
The fact that Microsoft and Google seem to be intentionally muddying the waters when it comes to non-hosting-related LLM-driven revenues and that OpenAI and Anthropic have been disclosing basically nothing should come across as a major red flag, and yet nobody seems to care.
Angry Analyst still
Spoiler alert: technology maturity will not help. They will train and train and train ever larger models (parameter count in the trillions), feeding it all the data they can get or fabricate, using more powerful supercomputers than those running the physics simulations of the US nuclear arsenal. They will manually hack (which is why they need thousands of developers) additional logic around the model, fine tuned for more and more scenarios.
But it will all just be papering over the inescapable fact that a generative pre-trained transformer model is intelligence as much as CGI is reality: that is, exactly zero; it’s all a crude, approximate imitation devoid of the underlying nature of the thing. GPTs, for example, cannot solve logical problems because GPT models lack the facilities to hold a conceptual representation of a problem, or to hold onto any ‘idea’ in themselves. That’s also why whenever you try to use a GPT to carefully fine tune a response, it mostly cannot; it will just regenerate everything even if explicitly instructed not to do so.
The important question is: does it matter?
It could very well be that the imitation game will reach a point (with all that manual hacking and testing thousands of trajectories to select and condense the most likely response during inference) where it will be able to create and maintain the illusion of intelligence, even sentience, such that hundreds of millions will end up just using it anyway, regardless of accuracy or substance. There are early warnings of that already.
It also stands to reason that most tech bros know this, but go along with the game because 1) it is all about relevance and engagement, there is lots of money to be made even from just imitation, and 2) most likely they believe they need to take part in this phase of AI development to be in position for the next one. In any case, there is no path for GPT towards intelligence; it is not a scaling or maturity issue.
Let us see if and when some shoes drop after this report. The bare minimum ought to be sharper questions at analyst calls.



