
China Internet: AI inference margins addendum; DeepSeek-V4, hy3-preview thoughts

Information Technology | 2026-04-27 | Bernstein

Feedback on our AI token economics model. Our recent AI token economics primer (LINK) drove considerable discussion and debate with investors, as well as feedback from some of the top Chinese AI labs. While our first effort was mostly intended as a bare-bones estimate aimed at demonstrating how the math behind inference margins works, the feedback has offered useful real-world insight on where a few hard-to-measure assumptions land in practice.

Robin Zhu  +852 2123 2659  robin.zhu@bernsteinsg.com
Charles Gou  +852 2123 2618  charles.gou@bernsteinsg.com

Refreshing our tokens-per-second throughput assumptions. The biggest change in this note versus our initial H1 2026 estimates relates to higher tokens-per-second (TPS) assumptions across the board - reflecting direct company feedback on datacentre-level throughput. Qualitatively, our updated assumptions imply a smaller inference margin delta between Z.ai and Minimax than previously estimated. In contrast with the 3x ratio we’d originally assumed between M2.5/2.7 and Z.ai GLM-5/5.1 TPS, and a 4x ratio in active parameters, reconciling guidance for 40% text/coding margins for M2.7 requires a 4.5-5.0x ratio in relative datacentre-level TPS between M2.7 and GLM-5/5.1. We’ve added third-party throughput data and sensitivity analysis in this note.
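The inference-margin mechanics behind these estimates can be sketched in a few lines. All inputs below (price per Mtok, GPU cost per hour, datacentre-level TPS) are illustrative placeholders, not figures from our model:

```python
def cost_per_mtok(gpu_cost_per_hour: float, dc_tps: float) -> float:
    """Serving cost per million tokens for one unit of hardware running
    at a sustained datacentre-level throughput of `dc_tps` tokens/second."""
    mtok_per_hour = dc_tps * 3600 / 1e6
    return gpu_cost_per_hour / mtok_per_hour


def gross_margin(price_per_mtok: float, gpu_cost_per_hour: float, dc_tps: float) -> float:
    """Inference gross margin: (revenue - serving cost) / revenue, per Mtok."""
    return 1 - cost_per_mtok(gpu_cost_per_hour, dc_tps) / price_per_mtok


def tps_for_margin(target_margin: float, price_per_mtok: float, gpu_cost_per_hour: float) -> float:
    """Datacentre-level TPS needed to hit `target_margin` at a given price:
    the cost budget per Mtok is price * (1 - margin), so required throughput
    is hardware cost divided by that budget, converted to tokens/second."""
    cost_budget = price_per_mtok * (1 - target_margin)  # $/Mtok available for serving
    return gpu_cost_per_hour / cost_budget * 1e6 / 3600


# Higher throughput at the same hardware cost lifts margins directly:
# doubling TPS halves the cost per Mtok (illustrative numbers only).
m_low = gross_margin(price_per_mtok=0.5, gpu_cost_per_hour=2.0, dc_tps=5_000)
m_high = gross_margin(price_per_mtok=0.5, gpu_cost_per_hour=2.0, dc_tps=10_000)
```

This is why the relative TPS ratio between two models, rather than either absolute level, drives the relative-margin conclusion: at a fixed price and hardware cost, backing out a given margin guidance pins down the throughput it implies.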
Min-Joo Kang  +852 2123 2644  minjoo.kang@bernsteinsg.com

Commercial discounts, and the conclusions that don’t change. We intentionally ignored commercial discounts in our previous modeling, as they’re not directly tied to how AI models consume and generate tokens, and are somewhat unknowable from the outside. But it’s clear that customer mix and discounting significantly impact revenue per token. Compared with our previous effort, we think our more important takeaways survive - that the GLM family of models likely generates higher margins assuming similar discounting, and a relative TPS ratio inversely correlated to active parameters per token. Multi-modal inference generates much higher rev/Mtok and inference margins.

Godot arrives - some early thoughts on DeepSeek-V4. Our first-glance take on DeepSeek-V4 boils down to “close to global SOTA from a few months ago, at significantly lower cost”. But - as has been foreshadowed for a while - the degree of separation between DeepSeek and domestic peers (e.g. Qwen, Z.ai, Kimi) has narrowed. V4-Flash is priced at the lower end of the peer range, which reinforces our bias that price competition among lighter, low-cost models will remain fierce. V4-Pro pricing was initially adjacent to domestic peers like Z.ai and Kimi, but a 75% “time-limited” price cut immediately after launch felt noteworthy too.

Tencent starts climbing the curve. We think the best framing of hy3-preview is “a somewhat belated step in the right direction”. Our sense is the results were fine relative to modest expectations - if not necessarily delivering the kind of “wow factor” Vinces Yao’s pedigree had led some investors to anticipate. The real proof of the pudding will be whether Tencent’s rebuild of its AI data and infra organization can enable faster iteration of in-house model capabilities… and when the company might be able to deliver on its own “Muse Spark moment” in the coming months. Our bias remains that reprogramming the Internet top funnel in China will take longer.
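The discounting point above - that customer mix materially moves realised revenue per token even when list prices are unchanged - can be illustrated with a similarly hedged sketch. The discount tiers and volume shares here are invented for illustration, not disclosed figures:

```python
def realised_rev_per_mtok(list_price: float, mix: dict) -> float:
    """Blend list price across customer tiers.
    `mix` maps a discount rate to that tier's share of token volume;
    shares must sum to 1."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "volume shares must sum to 1"
    return sum(share * list_price * (1 - disc) for disc, share in mix.items())


# Hypothetical mix: 40% of tokens at list price, 40% at a 30% enterprise
# discount, 20% at a 60% strategic-customer discount.
rev = realised_rev_per_mtok(1.0, {0.0: 0.4, 0.3: 0.4, 0.6: 0.2})  # → 0.76
```

Under this made-up mix, realised revenue per Mtok lands about a quarter below list, which is why two labs with identical pricing pages can still report very different revenue per token.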
BERNSTEIN TICKER TABLE

INVESTMENT IMPLICATIONS

In this note we’ve updated our AI token economics framework to incorporate feedback from China’s AI labs. Qualitatively, the 40% gross margin assumption Minimax has guided implies a materially higher tokens-per-second throughput than we previously assumed. While this can be explained in a number of ways, centred around hardware and software optimization, our bias remains that Z.ai’s frontier models should have higher margins, assuming relative TPS ratios banded around active parameters per token scale, typical workload mix, and comparable commercial discount levels. The shift from multi-modal to text/coding workloads

Looking top down, the DeepSeek-V4 release keeps China within months of the global SOTA, while the open-source, open-weights nature of its release should help to elevate capabilities across the leading Chinese AI labs. Tencent’s hy3-preview release took place amid very different levels of ex ante expectations, and should probably be considered through the lens of whether it signals faster model and application layer iteration in the coming months.

VALUATION COMPS TABLE DETAILS

ITERATING OUR AI TOKEN ECONOMICS AND INFERENCE MARGIN MODELING

For a while now we’ve compared following frontier AI developments to the “blind men and the elephant” parable, while the speed of day-to-day news flow has remained rapid. In this context, we’ve been grateful for the feedback from contacts at China’s top AI labs on the token economics framework we recently published (LINK). In the main, the feedback has reinforced our confidence in the mechanics of the framework, while