by technicalities, 8th Dec 2025

This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.)

Epistemic status: subjective impressions plus one new graph plus 300 links.

Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis.

tl;dr

- Informed people disagree about the prospects for LLM AGI – or even just what exactly was achieved this year. But the famous ones with a book to promote at least agree that we're 2-20 years off (allowing for other paradigms arising). In this piece I stick to arguments rather than reporting who thinks what.

- My view: compared to last year, AI is much more impressive but not proportionally more useful. Models improved on some things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on anything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far.

- Pretraining (GPT-4.5, Grok 3/4, but also the counterfactual large runs which weren't done) disappointed people this year. It's probably not because it didn't or wouldn't work; it was just too hard to serve the big models, and ~30 times more efficient to do post-training instead, on the margin. This should change, yet again, soon, if RL scales even worse.
  Edit: See this amazing comment for the hardware reasons behind this, and reasons to think that pretraining will struggle for years.

- True frontier capabilities are likely obscured by systematic cost-cutting (distillation for serving to consumers, quantization, low reasoning-token modes, routing to cheap models, etc.) and by a few unreleased models/modes.

- Most benchmarks are weak predictors of even the rank order of models' capabilities. I distrust ECI, ADeLe, and HCAST the least (see graph below or this notebook). ECI and ADeLe show a linear improvement, while HCAST finds an exponential improvement on greenfield software engineering.

- The world's de facto strategy remains "iterative alignment": optimising outputs with a stack of alignment and control techniques everyone admits are individually weak.

- Early claims that reasoning models are safer turned out to be a mixed bag (see below).

- We already knew from jailbreaks that current alignment methods were brittle. The great safety discovery of the year is that bad things are correlated in current models. (And on net this is good news.) "Emergent misalignment" arises from finetuning on one malign task; and in the wild from reward hacking; and it happens by strengthening specific bad personas; and there is at least one positive generalisation too (from honesty about silly errors to honesty about hidden objectives).
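The ECI/ADeLe-versus-HCAST contrast above is really a claim about curve shape. Below is a minimal sketch of how one would distinguish a linear from an exponential trend in benchmark scores: fit ordinary least squares in raw space and in log space and compare fit quality. The numbers are entirely invented for illustration and are not real ECI, ADeLe, or HCAST data.

```python
import math

# Hypothetical yearly scores on an HCAST-style task-horizon metric.
# All numbers are invented; they roughly double each year.
years = [0, 1, 2, 3, 4]
scores = [1.0, 2.1, 3.9, 8.2, 15.8]

def ols(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1 - sse / sst

# Linear hypothesis: fit the scores directly.
_, _, r2_linear = ols(years, scores)

# Exponential hypothesis: fit log2(scores); the slope is doublings per year.
_, slope, r2_exp = ols(years, [math.log2(s) for s in scores])
doubling_time = 1 / slope

print(f"linear fit R^2      = {r2_linear:.3f}")
print(f"exponential fit R^2 = {r2_exp:.3f}")
print(f"doubling time = {doubling_time:.2f} years")
```

On data like this, the log-space fit wins decisively and yields a doubling time near one year; on a genuinely linear trend (as the ECI and ADeLe results would suggest) the comparison flips.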
- Previously I thought that "character training" was a separate and lesser matter than "alignment training". Now I am not sure.

- Welcome to the many new people in AI Safety and Security and Assurance and so on. In the Shallow Review, out soon, I added a new, sprawling top-level category for one large trend among them, which is to treat the multi-agent lens as primary in various ways.

- Overall I wish I could tell you some number, the net expected safety change (this year's improvements in dangerous capabilities and agent performance, minus the alignment-boosting portion of capabilities, minus the cumulative effect of the best actually implemented composition of alignment and control techniques). But I can't.

Capabilities in 2025

Better, but how much?

Arguments against 2025 capabilities growth being above-trend

Apparent progress is an unknown mixture of real general capability increase, hidden contamination increase, benchmaxxing (nailing a small set of static examples instead of generalisation) and usemaxxing (nailing a small set of narrow tasks with RL instead of de