
Malika Aubakirova∗†, Alex Atallah‡, Chris Clark‡, Justin Summerville‡, and Anjney Midha†
‡OpenRouter Inc.   †a16z (Andreessen Horowitz)
December 2025

Abstract

The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5, 2024, the field shifted from single-pass pattern generation to multi-step deliberative inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, an AI inference provider spanning a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of the creative-roleplay and coding-assistance categories (beyond just the productivity tasks many assume dominate), and the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than that of later cohorts. We term this phenomenon the Cinderella “Glass Slipper” effect. These findings underscore that the way developers and end-users engage with LLMs “in the wild” is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.

1 Introduction

Just a year ago, the landscape of large language models looked fundamentally different. Prior to late 2024, state-of-the-art systems were dominated by single-pass, autoregressive predictors optimized to continue text sequences. Several precursor efforts attempted to approximate reasoning through advanced instruction following and tool use. For instance, Anthropic’s Sonnet 2.1 & 3 models excelled at sophisticated tool use and Retrieval-Augmented Generation (RAG), and Cohere’s Command R models incorporated structured tool-planning tokens. Separately, open-source projects such as Reflection explored supervised chain-of-thought and self-critique loops during training. Although these advanced techniques produced reasoning-like outputs and superior instruction following, the fundamental inference procedure remained a single forward pass, emitting a surface-level trace learned from data rather than performing iterative, internal computation.

This paradigm evolved on December 5, 2024, when OpenAI released the first full version of its o1 reasoning model (codenamed Strawberry) [4]. The preview released on September 12, 2024 had already indicated a departure from conventional autoregressive inference. Unlike prior systems, o1 employed an expanded inference-time computation process involving internal multi-step deliberation, latent planning, and iterative refinement before generating a final output. Empirically, this enabled systematic improvements in mathematical reasoning, logical consistency, and multi-step decision-making, reflecting a shift from pattern completion to structured internal cognition. In retrospect, last year marked the field’s true inflection point: earlier approaches gestured toward reasoning, but o1 introduced the first generally deployed architecture that performed reasoning through deliberate multi-stage computation rather than merely describing it [6, 7].
While recent advances in LLM capabilities have been widely documented, systematic evidence about how these models are actually used in practice remains limited [3, 5]. Existing accounts tend to emphasize qualitative demonstrations or benchmark performance rather than large-scale behavioral data. To bridge this gap, we undertake an empirical study of LLM usage, leveraging a 100-trillion-token dataset from OpenRouter, a multi-model AI inference platform that serves as a hub for diverse LLM queries.

OpenRouter’s vantage point provides a unique window into fine-grained usage patterns. Because it orchestrates requests across a wide array of models (spanning both closed-source APIs and open-weight deployments), OpenRouter captures a representative cross-section of how developers and end-users actually invoke language models for various tasks. By analyzing this rich dataset, we can observe which models are chosen for which tasks, how usage varies across geographic regions and over time, and how external factors such as pricing or new model launches influence behavior.

In this paper, we draw inspiration from prior empirical studies of AI adoption, including Anthropic’s economic-impact and usage analyses [1] and OpenAI’s report How People Use ChatGPT [2], aiming for a neutral, evidence-driven discussion. We first describe our dataset and methodology, including how we categorize tasks and models. We then delve into a series of analyses that illuminate d
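To make the kind of aggregation described above concrete, the sketch below shows how monthly token share by model, and token share by task category within each region, could be computed from per-request logs. This is an illustrative example only: the column names (timestamp, model, task_category, country, total_tokens) and the toy rows are hypothetical placeholders rather than OpenRouter’s actual schema, and the paper’s real categorization methodology is described in the sections that follow.

# Illustrative sketch only; schema and values are hypothetical, not OpenRouter's.
import pandas as pd

# Hypothetical per-request log: one row per inference request.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-12-05", "2025-01-14", "2025-01-20"]),
    "model": ["openai/o1", "meta-llama/llama-3.3-70b", "openai/o1"],
    "task_category": ["coding", "roleplay", "coding"],
    "country": ["US", "DE", "US"],
    "total_tokens": [12_000, 3_500, 8_200],
})

# Monthly token share per model: which models are chosen, and how the mix
# shifts over time (e.g., around a new model launch or a price change).
monthly = (
    logs.assign(month=logs["timestamp"].dt.to_period("M"))
        .groupby(["month", "model"])["total_tokens"]
        .sum()
)
model_share = monthly / monthly.groupby(level="month").transform("sum")

# Token share per task category within each region, for the geographic cut.
task_by_region = (
    logs.groupby(["country", "task_category"])["total_tokens"]
        .sum()
        .groupby(level="country")
        .transform(lambda s: s / s.sum())
)

print(model_share)
print(task_by_region)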