China Internet: A primer on AI token economics and inference margins

Robin Zhu +852 2123 2659 robin.zhu@bernsteinsg.com
Charles Gou +852 2123 2618 charles.gou@bernsteinsg.com
Min-Joo Kang +852 2123 2644 minjoo.kang@bernsteinsg.com

Not all AI inference margins are made equal. While AI continues to feature heavily in our discussions with investors, our experience has been that investor perceptions of AI inference economics remain relatively superficial. In this note we try to explain the variables that matter, using Minimax and Z.ai financials to illustrate important nuances.

The variables that matter. At the simplest level, variable inference margins boil down to revenue and costs per token. On the revenue side, the relative lengths of input versus output tokens and KV cache hit rates contribute meaningfully to the P-times-Q equation. Different types of multimodal inference (e.g. text/coding versus text-to-speech or video) can have significant impacts on token price mix. Costs per token depend on the hardware stack and hyperscaler mark-ups, but also on batching efficiency at the model layer, GPU utilisation, and therefore token throughput per second.

Measuring this stuff is hard, however… The revenue per token side is easier to quantify. Token pricing is publicly available, and "typical" input-output ratios can be ball-parked. OpenRouter data (while imperfect) gives some insight into KV cache rates. The cost side is more opaque, but we can simplify the maths by triangulating cost per GPU-hour estimates, assuming the Chinese AI labs use hardware set-ups not wildly different from each other (e.g. hyperscaler compute using a mix of H20/H100-grade chips). Putting it all together, we think it's possible to draw some useful qualitative conclusions.
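The P-times-Q arithmetic above can be made concrete with a short sketch. All inputs below are hypothetical placeholders for illustration only, not estimates from this note: the list prices, input-output split, cache hit rate, GPU-hour cost, and per-GPU throughput are assumptions.

```python
# Sketch of the per-token margin arithmetic described above.
# Every number used here is a hypothetical placeholder, not an estimate.

def blended_rev_per_mtok(input_price, output_price, cache_price,
                         input_share, output_share, cache_hit_rate):
    """Blend list prices ($/Mtok) across uncached input, cached input,
    and output tokens, weighted by their shares of total tokens served."""
    cached = input_share * cache_hit_rate
    uncached = input_share * (1 - cache_hit_rate)
    return uncached * input_price + cached * cache_price + output_share * output_price

def cost_per_mtok(gpu_hour_cost, tokens_per_second_per_gpu):
    """Cost per million tokens from GPU rental cost and sustained throughput."""
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return gpu_hour_cost / (tokens_per_hour / 1e6)

# Hypothetical worked example: $0.6/Mtok input, $2.2/Mtok output,
# $0.11/Mtok cached input, 80/20 input-output token split, 60% cache hit rate.
rev = blended_rev_per_mtok(0.6, 2.2, 0.11, 0.80, 0.20, 0.60)
# Hypothetical cost side: $2.00 per GPU-hour, 2,000 tokens/sec sustained per GPU.
cost = cost_per_mtok(2.0, 2000)
margin = 1 - cost / rev
print(f"rev/Mtok ${rev:.2f}, cost/Mtok ${cost:.2f}, variable margin {margin:.0%}")
```

The sketch makes the two levers visible: KV cache hits dilute revenue per token (cached input is repriced far below uncached input), while throughput per GPU-second, which is where batching efficiency and utilisation enter, divides directly into the GPU-hour cost.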
The shockingly high pricing of text-to-speech inference. Comparisons of multimodal inference offer the clearest demonstration of how not all tokens are made the same. Adjusted for "typical" input-output token and KV cache hit rates, we think Z.ai generates blended revenue per million tokens (rev/Mtok) in the $1 range. Video generation yields a slightly higher rev/Mtok, assuming standard 4:3 aspect ratio output. Minimax's lighter model means text/coding tokens are priced in the $0.2/Mtok range. But we think its text-to-speech tokens imply list prices over $10/Mtok!

The supernova moment for agentic AI. For H2 2025, Z.ai reported 22.4% API gross margins, implying a 30.3% incremental margin year on year. Minimax reported 69.4% Open Platform gross margin in 9M 2025, but did not report a full-year figure. Looking forward, Z.ai's GLM-5 and GLM-5.1 launches and associated price increases should help to offset hyperscaler price hikes in H1 2026. Minimax has declined to raise prices, which we suspect relates to M2.5/2.7's positioning as a light agentic backbone. More importantly, mix shift towards text/coding (e.g. OpenClaw) workloads should imply lower blended rev/Mtok.

Qwen's house edge on compute costs. To date, the Qwen strategy appears to have centred on leading in as many modalities as possible, and capturing a broad range of compute demand. For now, compute constraints mean Alicloud has chosen to prioritise price increases and monetisation growth (e.g. see Qwen 3.6 becoming closed source)… but the importance of hyperscaler margins as an input for cost/Mtok for the independent AI labs should mean Qwen continues to enjoy a relative cost advantage.

BERNSTEIN TICKER TABLE

INVESTMENT IMPLICATIONS

While the AI discussion remains almost entirely focused on top-line traction at this point, AI inference margins, as a proxy for business unit economics, represent an important driver of medium- and long-term competitiveness in our view.
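The mix-shift point above can be illustrated with a toy calculation. The $0.2/Mtok text/coding and $10+/Mtok text-to-speech price points echo the note; the token-mix weights below are purely hypothetical assumptions, chosen only to show the direction of the effect.

```python
# Toy illustration of how blended rev/Mtok falls as workload mix tilts
# towards cheap text/coding tokens. Prices echo the note; the mix
# weights are hypothetical placeholders.

def blended_price(mix):
    """mix maps modality -> (share_of_tokens, price_per_mtok); shares sum to 1."""
    return sum(share * price for share, price in mix.values())

before = {"text/coding": (0.70, 0.2), "text-to-speech": (0.30, 10.0)}
after  = {"text/coding": (0.90, 0.2), "text-to-speech": (0.10, 10.0)}

print(f"blended rev/Mtok: ${blended_price(before):.2f} -> ${blended_price(after):.2f}")
```

Because text-to-speech tokens carry a price roughly fifty times that of light text/coding tokens under these assumptions, even a modest shift in token mix towards agentic text workloads drags blended rev/Mtok down sharply, which is the dynamic we flag for Minimax.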
The revenue per token side is driven by headline token pricing, but also by factors like geographic and modality mix, input-output sequence length ratios, KV cache rates (corresponding to different types of workloads), and commercial discounts. The cost side is more opaque, and driven by cost per GPU-hour estimates and token throughput efficiency. The top AI labs continue to work on (and have recently reported) meaningful gains in token cost efficiency, which will continue to be a factor. But the available facts support a number of interesting directional conclusions in our view.

Within our coverage, Qwen to date has pursued leadership across a wide range of modalities. The importance of hyperscaler margins as an input for cost/Mtok should mean Qwen enjoys a durable cost advantage in the medium term compared with the independent AI labs… especially as Alicloud hikes prices on external customers.

VALUATION COMPS TABLE DETAILS

DEMYSTIFYING AI INFERENCE MARGINS

AI continues to feature heavily in our discussions with investors. Within this, the unit economics of AI models, and more broadly how AI lab inference margins come about, has been a recurrent focus. This note is intended to be a low-jargon summary of our views, using reported financials from Minimax and Z.ai (both not covered) to illustrate