
Test-Time Scaling of Large Language Models: A Survey from the Perspective of Subproblem Structures


Abstract

With this paper, we survey techniques for improving the predictive accuracy of pretrained LLMs by allocating additional compute at inference time. In categorizing test-time scaling methods, we place special emphasis on how a problem is decomposed into subproblems and on the topological organization of these subproblems, whether sequential, parallel, or tree-structured. This perspective allows […]

1 Introduction

Test-time scaling (TTS) refers to the strategy of trading more computational resources for more predictive accuracy at inference time (Brown et al., 2024; OpenAI, 2024; Wu et al., 2024; DeepSeek-AI et al., 2025). By trading additional inference-time compute for accuracy, test-time scaling improves the performance of Large Language Models (LLMs) and Vision-Language Models (VLMs).

The most classic example of test-time scaling is arguably the Chain-of-Thought (CoT) technique (Wei et al., 2022a), which sequentially generates a number of intermediate textual tokens describing the thinking process before predicting the final answer. These intermediate tokens can be shown […] Besides the sequential execution of CoT, it is possible to organize the subtasks or subproblems in parallel or tree structures.

In this paper, we survey test-time scaling techniques from the perspective of subproblem structures. A key insight we provide is that the identification and organization of the subproblems have important implications shared across problem domains.

It is necessary to define the scope of this survey. We limit ourselves to methods that do not alter the parameters of the main LLM, but we do not rule out methods that finetune a small, auxiliary LLM, such as a learned heuristic function in tree search (Section 3.3). We exclusively focus on methods that trade compute for accuracy, not methods that accelerate inference.

Further, it is beneficial to articulate the differences between TTS and a few related areas. TTS can be applied to improve LLM/VLM reasoning or agentic LLMs, but these two areas contain other topics as well: LLM reasoning covers training-time techniques like reinforcement learning (DeepSeek-AI et al., 2025), and agentic LLMs contain topics such as societal simulation (Park et al., 2023).

Our main contributions are as follows. First, we provide a detailed survey of TTS techniques from the perspective of subtask structures, highlighting subtask decomposition as a paramount consideration in the design of TTS techniques. Second, we provide a detailed discussion of the relative strengths […]

2 Task Decomposition

The primary focus of the current survey is the decomposition of the target task into smaller, more manageable subtasks. Depending on the degree of automation, we categorize task decomposition into human-only and LLM-assisted strategies, with hybrids in between.

2.1 Human-only decomposition

In this strategy, the human designers provide a sufficiently detailed list or hierarchy of subtasks that do not need further decomposition. It is well suited to tasks with known processes and clear control flows, which may be crystallized from years of experience. One example is the software engineering […]

The human-only approach has a few advantages. First, a deterministic structure eliminates the need for the model to plan its steps, thereby improving inference efficiency and reducing variance. Second, it allows explicit designs that check and correct for known LLM weaknesses, to which some LLMs may be oblivious (Gandhi et al., 2025). For example, we may enforce a self-verification step that rechecks an answer from an LLM or screens for unsafe outputs (Xie et al., 2023), or a step that removes irrelevant information that may mislead the LLM (Deng et al., 2023). Similarly, Self-Refine (Madaan et al., 2023) introduces a subtask in which the model critiques and then refines its own output.
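To make the control flow concrete, below is a minimal sketch of such a human-specified pipeline with an enforced verify-then-revise loop. The `llm` callable, the prompt strings, and the OK-based verdict check are illustrative assumptions, not an interface taken from any of the cited works.

```python
from typing import Callable

# A minimal sketch of a human-only decomposition, assuming a hypothetical
# `llm` callable (any prompt-in, text-out completion API). The subtask
# list is fixed in code: draft, verify, and (only if needed) revise.
def solve_with_fixed_pipeline(question: str, llm: Callable[[str], str]) -> str:
    # Subtask 1: draft an answer.
    draft = llm(f"Question: {question}\nAnswer:")

    # Subtask 2: an enforced self-verification step that rechecks the
    # draft, rather than trusting the model to verify of its own accord.
    verdict = llm(
        "Check the answer below for factual or logical errors.\n"
        "Reply with the single word OK if it is correct; otherwise "
        "describe the problem.\n"
        f"Question: {question}\nAnswer: {draft}"
    )

    # Subtask 3: revise only when verification flags a problem.
    if verdict.strip().upper().startswith("OK"):
        return draft
    return llm(
        "Revise the answer so that it addresses the critique.\n"
        f"Question: {question}\nAnswer: {draft}\n"
        f"Critique: {verdict}\nRevised answer:"
    )
```

Because the subtask structure is deterministic, the only stochasticity left resides in the `llm` calls themselves, which is precisely what enables the efficiency and variance benefits noted above.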
However, the rigidity of human-only decomposition limits the ability to customize the subtask hierarchy to specific inputs. This may be addressed by involving the LLM itself in the decomposition, as described next.

2.2 LLM-assisted decomposition

LLM-assisted decomposition allows the LLM to decompose some or all subtasks at inference time. This paradigm provides more flexibility and is suitable for tasks that require diverse, question-specific decomposition or have no obvious one-size-fits-all structure. A representative approach is Least-to-Most Prompting (Zhou et al., 2023), which decomposes a complex problem explicitly into simpler subproblems in a question-specific manner. In contrast, Chain-of-Thought Prompting (Wei et al., 2022a) does not separate decomposition from solving; its subproblems emerge implicitly as intermediate reasoning steps.

Though LLM decomposition enables highly flexible and expressive reasoning, it may also produce unstable or suboptimal decompositions. For instance, the LLM may skip necessary subtasks or introduce irrelevant steps, which may lead to incorrect solutions. A well-known fact is that chain-of-thought […]

A hybrid decomposition strategy is also popular. In this strategy, human designers provide a clear outline for task decomposition, and the LLM subsequently refines portions of the high-level directions into more detailed subtasks at inference time, harnessing the strengths of both approaches. Many variations of RAG follow this strategy; at a high level, there are only two tasks, Retrieval followed by Generation.

3 Reasoning Paths

In terms of the topological structure of subtasks, reasoning paths may be organized sequentially, in parallel, or as trees; a minimal sketch of the first two topologies is given below.
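The following sketch contrasts a single sequential chain with parallel chains aggregated by majority voting (the scheme commonly known as self-consistency). As before, the `llm` callable, the prompt template, and the answer parsing are hypothetical stand-ins rather than APIs from any surveyed method; sampling is assumed to use a nonzero temperature so that parallel chains can disagree.

```python
from collections import Counter
from typing import Callable

# Illustrative CoT-style prompt; the trailing marker makes the final
# answer easy to extract from the generated chain.
COT_PROMPT = "Q: {q}\nThink step by step, then finish with 'Answer: <result>'."

def parse_answer(completion: str) -> str:
    # Naive parse: keep whatever follows the last 'Answer:' marker.
    return completion.rsplit("Answer:", 1)[-1].strip()

def sequential_chain(question: str, llm: Callable[[str], str]) -> str:
    # Sequential topology: one chain of intermediate tokens, one answer.
    return parse_answer(llm(COT_PROMPT.format(q=question)))

def parallel_majority(question: str, llm: Callable[[str], str], n: int = 8) -> str:
    # Parallel topology: n independent chains; the extra compute is spent
    # on agreement, and the most frequent answer wins (majority voting).
    answers = [parse_answer(llm(COT_PROMPT.format(q=question))) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Tree-structured topologies generalize the parallel case: instead of sampling complete chains independently, partial chains are expanded, scored (possibly by a learned heuristic, cf. Section 3.3), and pruned.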