
Test-Time Scaling of Large Language Models: A Survey from the Perspective of Subproblem Structures


Abstract

With this paper, we survey techniques for improving the predictive accuracy of pretrained LLMs by allocating additional compute at inference time. In categorizing test-time scaling methods, we place special emphasis on how a problem is decomposed into subproblems and on the topological organization of these subproblems, whether sequential, parallel, or tree-structured. This perspective allows […]

1 Introduction

Test-time scaling (TTS) refers to the strategy of trading more computational resources for more predictive accuracy at inference time (Brown et al., 2024; OpenAI, 2024; Wu et al., 2024; DeepSeek-AI et al., 2025). By trading additional inference-time compute for accuracy, test-time scaling improves the performance of Large Language Models (LLMs) and Vision-Language Models (VLMs).

The most classic example of test-time scaling is arguably the Chain-of-Thought (CoT) technique (Wei et al., 2022a), which sequentially generates a number of intermediate textual tokens describing the thinking process before predicting the final answer. These intermediate tokens can be shown […] Besides the sequential execution of CoT, it is possible to organize the subtasks or subproblems in parallel or tree structures.

In this paper, we survey test-time scaling techniques from the perspective of subproblem structures. A key insight we provide is that the identification and organization of the subproblems have important implications shared across problem domains.

It is necessary to define the scope of this survey. We limit ourselves to methods that do not alter the parameters of the main LLM, but we do not rule out methods that finetune a small, auxiliary LLM, such as a learned heuristic function in tree search (Section 3.3). We exclusively focus on methods that trade compute for accuracy, not methods that accelerate inference.

Further, it is beneficial to articulate the differences between TTS and a few related areas. TTS can be applied to improve LLM/VLM reasoning or agentic LLMs, but these two areas contain other topics as well: LLM reasoning covers training-time techniques like reinforcement learning (DeepSeek-AI et al., 2025), and agentic LLMs contain topics such as societal simulation (Park et al., 2023).

Our main contributions are as follows. First, we provide a detailed survey of TTS techniques from the perspective of subtask structures, highlighting subtask decomposition as a paramount consideration in the design of TTS techniques. Second, we provide a detailed discussion of the relative strengths […]

2 Task Decomposition

The primary focus of the current survey is the decomposition of the target task into smaller, more manageable subtasks. Depending on the degree of automation, we categorize task decomposition into human-only and LLM-assisted strategies, with hybrids in between.

2.1 Human-only decomposition

In this strategy, the human designers provide a sufficiently detailed list or hierarchy of subtasks that do not need further decomposition. It is well suited to tasks with known processes and clear control flows, which may be crystallized from years of experience. One example is the software engineering […]

The human-only approach has a few advantages. First, a deterministic structure eliminates the need for the model to plan its steps, thereby improving inference efficiency and reducing variance. Second, it allows explicit designs that check and correct for known LLM weaknesses, to which some LLMs may be oblivious (Gandhi et al., 2025). For example, we may enforce a self-verification step that rechecks an answer from an LLM or screens for unsafe outputs (Xie et al., 2023), or a step that removes irrelevant information that may mislead the LLM (Deng et al., 2023). Similarly, Self-Refine (Madaan et al., 2023) introduces a subtask in which the model critiques and then refines its own output.
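To make the control flow concrete, below is a minimal sketch of such a human-specified pipeline with an enforced verify-then-revise loop. The `llm` callable, the prompt strings, and the OK-based verdict check are illustrative assumptions, not an interface taken from any of the cited works.

```python
from typing import Callable

# A minimal sketch of a human-only decomposition, assuming a hypothetical
# `llm` callable (any prompt-in, text-out completion API). The subtask
# list is fixed in code: draft, verify, and (only if needed) revise.
def solve_with_fixed_pipeline(question: str, llm: Callable[[str], str]) -> str:
    # Subtask 1: draft an answer.
    draft = llm(f"Question: {question}\nAnswer:")

    # Subtask 2: an enforced self-verification step that rechecks the
    # draft, rather than trusting the model to verify of its own accord.
    verdict = llm(
        "Check the answer below for factual or logical errors.\n"
        "Reply with the single word OK if it is correct; otherwise "
        "describe the problem.\n"
        f"Question: {question}\nAnswer: {draft}"
    )

    # Subtask 3: revise only when verification flags a problem.
    if verdict.strip().upper().startswith("OK"):
        return draft
    return llm(
        "Revise the answer so that it addresses the critique.\n"
        f"Question: {question}\nAnswer: {draft}\n"
        f"Critique: {verdict}\nRevised answer:"
    )
```

Because the subtask structure is deterministic, the only stochasticity left resides in the `llm` calls themselves, which is precisely what enables the efficiency and variance benefits noted above.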
However, the rigidity of human-only decomposition limits the ability to customize the subtask hierarchy to specific inputs. This may be addressed by involving the LLM itself in the decomposition, as described next.

2.2 LLM-assisted decomposition

LLM-assisted decomposition allows the LLM to decompose some or all subtasks at inference time. This paradigm provides more flexibility and is suitable for tasks that require diverse, question-specific decomposition or have no obvious one-size-fits-all structure. A representative approach is Least-to-Most Prompting (Zhou et al., 2023), which decomposes a complex problem explicitly into simpler subproblems in a question-specific manner. In contrast, Chain-of-Thought Prompting (Wei et al., 2022a) does not separate decomposition from solving; its subproblems emerge implicitly as intermediate reasoning steps.

Though LLM decomposition enables highly flexible and expressive reasoning, it may also produce unstable or suboptimal decompositions. For instance, the LLM may skip necessary subtasks or introduce irrelevant steps, which may lead to incorrect solutions. A well-known fact is that chain-of-thought […]

A hybrid decomposition strategy is also popular. In this strategy, human designers provide a clear outline for task decomposition, and the LLM subsequently refines portions of the high-level directions into more detailed subtasks at inference time, harnessing the strengths of both approaches. Many variations of RAG follow this strategy; at a high level, there are only two tasks, Retrieval followed by Generation.

3 Reasoning Paths

In terms of the topological structure of subtasks, reasoning paths may be organized sequentially, in parallel, or as trees; a minimal sketch of the first two topologies is given below.
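The following sketch contrasts a single sequential chain with parallel chains aggregated by majority voting (the scheme commonly known as self-consistency). As before, the `llm` callable, the prompt template, and the answer parsing are hypothetical stand-ins rather than APIs from any surveyed method; sampling is assumed to use a nonzero temperature so that parallel chains can disagree.

```python
from collections import Counter
from typing import Callable

# Illustrative CoT-style prompt; the trailing marker makes the final
# answer easy to extract from the generated chain.
COT_PROMPT = "Q: {q}\nThink step by step, then finish with 'Answer: <result>'."

def parse_answer(completion: str) -> str:
    # Naive parse: keep whatever follows the last 'Answer:' marker.
    return completion.rsplit("Answer:", 1)[-1].strip()

def sequential_chain(question: str, llm: Callable[[str], str]) -> str:
    # Sequential topology: one chain of intermediate tokens, one answer.
    return parse_answer(llm(COT_PROMPT.format(q=question)))

def parallel_majority(question: str, llm: Callable[[str], str], n: int = 8) -> str:
    # Parallel topology: n independent chains; the extra compute is spent
    # on agreement, and the most frequent answer wins (majority voting).
    answers = [parse_answer(llm(COT_PROMPT.format(q=question))) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Tree-structured topologies generalize the parallel case: instead of sampling complete chains independently, partial chains are expanded, scored (possibly by a learned heuristic, cf. Section 3.3), and pruned.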