
China Internet: AI inference margins addendum; DeepSeek-V4, hy3-preview thoughts

Information Technology | 2026-04-27 | Bernstein

Feedback on our AI token economics model. Our recent AI token economics primer (LINK) drove considerable discussion and debate with investors, as well as feedback from some of the top Chinese AI labs. While our first effort was mostly intended as a bare-bones estimate aimed at demonstrating how the math behind inference margins works, the feedback has offered useful real-world insight on where a few hard-to-measure assumptions land in practice.

Robin Zhu  +852 2123 2659  robin.zhu@bernsteinsg.com
Charles Gou  +852 2123 2618  charles.gou@bernsteinsg.com

Refreshing our tokens-per-second throughput assumptions. The biggest change in this note versus our initial H1 2026 estimates relates to higher tokens-per-second (TPS) assumptions across the board - reflecting direct company feedback on datacentre-level throughput. Qualitatively, our updated assumptions imply a smaller inference margin delta between Z.ai and Minimax than previously estimated. In contrast with the 3x ratio we’d originally assumed between M2.5/2.7 and Z.ai GLM-5/5.1 TPS, and a 4x ratio in active parameters, reconciling guidance for 40% text/coding margins for M2.7 requires a 4.5-5.0x ratio in relative datacentre-level TPS between M2.7 and GLM-5/5.1. We’ve added third-party throughput data and sensitivity analysis in this note.
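The inference-margin mechanics behind these estimates can be sketched in a few lines. All inputs below (price per Mtok, GPU cost per hour, datacentre-level TPS) are illustrative placeholders, not figures from our model:

```python
def cost_per_mtok(gpu_cost_per_hour: float, dc_tps: float) -> float:
    """Serving cost per million tokens for one unit of hardware running
    at a sustained datacentre-level throughput of `dc_tps` tokens/second."""
    mtok_per_hour = dc_tps * 3600 / 1e6
    return gpu_cost_per_hour / mtok_per_hour


def gross_margin(price_per_mtok: float, gpu_cost_per_hour: float, dc_tps: float) -> float:
    """Inference gross margin: (revenue - serving cost) / revenue, per Mtok."""
    return 1 - cost_per_mtok(gpu_cost_per_hour, dc_tps) / price_per_mtok


def tps_for_margin(target_margin: float, price_per_mtok: float, gpu_cost_per_hour: float) -> float:
    """Datacentre-level TPS needed to hit `target_margin` at a given price:
    the cost budget per Mtok is price * (1 - margin), so required throughput
    is hardware cost divided by that budget, converted to tokens/second."""
    cost_budget = price_per_mtok * (1 - target_margin)  # $/Mtok available for serving
    return gpu_cost_per_hour / cost_budget * 1e6 / 3600


# Higher throughput at the same hardware cost lifts margins directly:
# doubling TPS halves the cost per Mtok (illustrative numbers only).
m_low = gross_margin(price_per_mtok=0.5, gpu_cost_per_hour=2.0, dc_tps=5_000)
m_high = gross_margin(price_per_mtok=0.5, gpu_cost_per_hour=2.0, dc_tps=10_000)
```

This is why the relative TPS ratio between two models, rather than either absolute level, drives the relative-margin conclusion: at a fixed price and hardware cost, backing out a given margin guidance pins down the throughput it implies.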
Min-Joo Kang  +852 2123 2644  minjoo.kang@bernsteinsg.com

Commercial discounts, and the conclusions that don’t change. We intentionally ignored commercial discounts in our previous modeling, as they’re not directly tied to how AI models consume and generate tokens, and are somewhat unknowable from the outside. But it’s clear that customer mix and discounting significantly impact revenue per token. Compared with our previous effort, we think our more important takeaways survive - that the GLM family of models likely generates higher margins assuming similar discounting, and a relative TPS ratio inversely correlated to active parameters per token. Multi-modal inference generates much higher rev/Mtok and inference margins.

Godot arrives - some early thoughts on DeepSeek-V4. Our first-glance take on DeepSeek-V4 boils down to “close to global SOTA from a few months ago, at significantly lower cost”. But - as has been foreshadowed for a while - the degree of separation between DeepSeek and domestic peers (e.g. Qwen, Z.ai, Kimi) has narrowed. V4-Flash is priced at the lower end of the peer range, which reinforces our bias that price competition among lighter, low-cost models will remain fierce. V4-Pro pricing was initially adjacent to domestic peers like Z.ai and Kimi, but a 75% “time-limited” price cut immediately after launch felt noteworthy too.

Tencent starts climbing the curve. We think the best framing of hy3-preview is “a somewhat belated step in the right direction”. Our sense is the results were fine relative to modest expectations - if not necessarily delivering the kind of “wow factor” Vinces Yao’s pedigree had led some investors to anticipate. The real proof of the pudding will be whether Tencent’s rebuild of its AI data and infra organization can enable faster iteration of in-house model capabilities… and when the company might be able to deliver on its own “Muse Spark moment” in the coming months. Our bias remains that reprogramming the Internet top funnel in China will take longer.
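The discounting point above - that customer mix materially moves realised revenue per token even when list prices are unchanged - can be illustrated with a similarly hedged sketch. The discount tiers and volume shares here are invented for illustration, not disclosed figures:

```python
def realised_rev_per_mtok(list_price: float, mix: dict) -> float:
    """Blend list price across customer tiers.
    `mix` maps a discount rate to that tier's share of token volume;
    shares must sum to 1."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "volume shares must sum to 1"
    return sum(share * list_price * (1 - disc) for disc, share in mix.items())


# Hypothetical mix: 40% of tokens at list price, 40% at a 30% enterprise
# discount, 20% at a 60% strategic-customer discount.
rev = realised_rev_per_mtok(1.0, {0.0: 0.4, 0.3: 0.4, 0.6: 0.2})  # → 0.76
```

Under this made-up mix, realised revenue per Mtok lands about a quarter below list, which is why two labs with identical pricing pages can still report very different revenue per token.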
BERNSTEIN TICKER TABLE

INVESTMENT IMPLICATIONS

In this note we’ve updated our AI token economics framework to incorporate feedback from China’s AI labs. Qualitatively, the 40% gross margin assumption Minimax has guided implies a materially higher tokens-per-second throughput than we previously assumed. While this can be explained in a number of ways, centred around hardware and software optimization, our bias remains that Z.ai’s frontier models should have higher margins, assuming relative TPS ratios banded around active parameters per token scale, typical workload mix, and comparable commercial discount levels. The shift from multi-modal to text/coding workloads

Looking top down, the DeepSeek-V4 release keeps China within months of the global SOTA, while the open-source, open-weights nature of its release should help to elevate capabilities across the leading Chinese AI labs. Tencent’s hy3-preview release took place amid very different levels of ex ante expectations, and should probably be considered through the lens of whether it signals faster model and application layer iteration in the coming months.

VALUATION COMPS TABLE DETAILS

ITERATING OUR AI TOKEN ECONOMICS AND INFERENCE MARGIN MODELING

For a while now we’ve compared following frontier AI developments to the “blind men and the elephant” parable, while the speed of day-to-day news flow has remained rapid. In this context, we’ve been grateful for the feedback from contacts at China’s top AI labs on the token economics framework we recently published (LINK). In the main, the feedback has reinforced our confidence in the mechanics of the framework, while