China Internet: A primer on AI token economics and inference margins

Robin Zhu +852 2123 2659 robin.zhu@bernsteinsg.com
Charles Gou +852 2123 2618 charles.gou@bernsteinsg.com
Min-Joo Kang +852 2123 2644 minjoo.kang@bernsteinsg.com

Not all AI inference margins are made equal. While AI continues to feature heavily in our discussions with investors, our experience has been that investor perceptions of AI inference economics remain relatively superficial. In this note we try to explain the variables that matter, using Minimax and Z.ai financials to illustrate important nuances.

The variables that matter. At the simplest level, variable inference margins boil down to revenue and costs per token. On the revenue side, the relative lengths of input versus output tokens and KV cache hit rates contribute meaningfully to the P-times-Q equation. Different types of multimodal inference (e.g. text/coding versus text-to-speech or video) can have significant impacts on token price mix. Costs per token depend on the hardware stack and hyperscaler mark-ups, but also on batching efficiency at the model layer, GPU utilisation, and therefore token throughput per second.

Measuring this stuff is hard, however… The revenue per token side is easier to quantify. Token pricing is publicly available, and "typical" input-output ratios can be ball-parked. OpenRouter data (while imperfect) gives some insight into KV cache rates. The cost side is more opaque, but we can simplify the maths by triangulating cost per GPU-hour estimates, assuming the Chinese AI labs use hardware set-ups not wildly different from each other (e.g. hyperscaler compute using a mix of H20/H100-grade chips). Putting it all together, we think it's possible to draw some useful qualitative conclusions.
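The P-times-Q arithmetic above can be made concrete with a short sketch. All inputs below are hypothetical placeholders for illustration only, not estimates from this note: the list prices, input-output split, cache hit rate, GPU-hour cost, and per-GPU throughput are assumptions.

```python
# Sketch of the per-token margin arithmetic described above.
# Every number used here is a hypothetical placeholder, not an estimate.

def blended_rev_per_mtok(input_price, output_price, cache_price,
                         input_share, output_share, cache_hit_rate):
    """Blend list prices ($/Mtok) across uncached input, cached input,
    and output tokens, weighted by their shares of total tokens served."""
    cached = input_share * cache_hit_rate
    uncached = input_share * (1 - cache_hit_rate)
    return uncached * input_price + cached * cache_price + output_share * output_price

def cost_per_mtok(gpu_hour_cost, tokens_per_second_per_gpu):
    """Cost per million tokens from GPU rental cost and sustained throughput."""
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return gpu_hour_cost / (tokens_per_hour / 1e6)

# Hypothetical worked example: $0.6/Mtok input, $2.2/Mtok output,
# $0.11/Mtok cached input, 80/20 input-output token split, 60% cache hit rate.
rev = blended_rev_per_mtok(0.6, 2.2, 0.11, 0.80, 0.20, 0.60)
# Hypothetical cost side: $2.00 per GPU-hour, 2,000 tokens/sec sustained per GPU.
cost = cost_per_mtok(2.0, 2000)
margin = 1 - cost / rev
print(f"rev/Mtok ${rev:.2f}, cost/Mtok ${cost:.2f}, variable margin {margin:.0%}")
```

The sketch makes the two levers visible: KV cache hits dilute revenue per token (cached input is repriced far below uncached input), while throughput per GPU-second, which is where batching efficiency and utilisation enter, divides directly into the GPU-hour cost.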
The shockingly high pricing of text-to-speech inference. Comparisons of multimodal inference offer the clearest demonstration of how not all tokens are made the same. Adjusted for "typical" input-output token and KV cache hit rates, we think Z.ai generates blended revenue per million tokens (rev/Mtok) in the $1 range. Video generation yields a slightly higher rev/Mtok, assuming standard 4:3 aspect ratio output. Minimax's lighter model means text/coding tokens are priced in the $0.2/Mtok range. But we think its text-to-speech tokens imply list prices over $10/Mtok!

The supernova moment for agentic AI. For H2 2025, Z.ai reported 22.4% API gross margins, implying a 30.3% incremental margin year on year. Minimax reported 69.4% Open Platform gross margin in 9M 2025, but did not report a full-year figure. Looking forward, Z.ai's GLM-5 and GLM-5.1 launches and associated price increases should help to offset hyperscaler price hikes in H1 2026. Minimax has declined to raise prices, which we suspect relates to M2.5/2.7's positioning as a light agentic backbone. More importantly, mix shift towards text/coding (e.g. OpenClaw) workloads should imply lower blended rev/Mtok.

Qwen's house edge on compute costs. To date, the Qwen strategy appears to have centred on leading in as many modalities as possible, and capturing a broad range of compute demand. For now, compute constraints mean Alicloud has chosen to prioritise price increases and monetisation growth (e.g. see Qwen 3.6 becoming closed source)… but the importance of hyperscaler margins as an input for cost/Mtok for the independent AI labs should mean Qwen continues to enjoy a relative cost advantage.

BERNSTEIN TICKER TABLE

INVESTMENT IMPLICATIONS

While the AI discussion remains almost entirely focused on top-line traction at this point, AI inference margins, as a proxy for business unit economics, represent an important driver of medium- and long-term competitiveness in our view.
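The mix-shift point above can be illustrated with a toy calculation. The $0.2/Mtok text/coding and $10+/Mtok text-to-speech price points echo the note; the token-mix weights below are purely hypothetical assumptions, chosen only to show the direction of the effect.

```python
# Toy illustration of how blended rev/Mtok falls as workload mix tilts
# towards cheap text/coding tokens. Prices echo the note; the mix
# weights are hypothetical placeholders.

def blended_price(mix):
    """mix maps modality -> (share_of_tokens, price_per_mtok); shares sum to 1."""
    return sum(share * price for share, price in mix.values())

before = {"text/coding": (0.70, 0.2), "text-to-speech": (0.30, 10.0)}
after  = {"text/coding": (0.90, 0.2), "text-to-speech": (0.10, 10.0)}

print(f"blended rev/Mtok: ${blended_price(before):.2f} -> ${blended_price(after):.2f}")
```

Because text-to-speech tokens carry a price roughly fifty times that of light text/coding tokens under these assumptions, even a modest shift in token mix towards agentic text workloads drags blended rev/Mtok down sharply, which is the dynamic we flag for Minimax.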
The revenue per token side is driven by headline token pricing, but also by factors like geographic and modality mix, input-output sequence length ratios, KV cache rates (corresponding to different types of workloads), and commercial discounts. The cost side is more opaque, and driven by cost per GPU-hour estimates and token throughput efficiency. The top AI labs continue to work on (and have recently reported) meaningful gains in token cost efficiency, which will continue to be a factor. But the available facts support a number of interesting directional conclusions in our view.

Within our coverage, Qwen to date has pursued leadership across a wide range of modalities. The importance of hyperscaler margins as an input for cost/Mtok should mean Qwen enjoys a durable cost advantage in the medium term compared with the independent AI labs… especially as Alicloud hikes prices on external customers.

VALUATION COMPS TABLE DETAILS

DEMYSTIFYING AI INFERENCE MARGINS

AI continues to feature heavily in our discussions with investors. Within this, the unit economics of AI models, and more broadly how AI lab inference margins come about, has been a recurrent focus. This note is intended to be a low-jargon summary of our views, using reported financials from Minimax and Z.ai (both not covered) to illustrate