
A Survey on the Capabilities and Limitations of Large Language Models


Andrea Matarazzo
Expedia Group, Italy
a.matarazzo@gmail.com

Riccardo Torlone
Roma Tre University, Italy
riccardo.torlone@uniroma3.it

Abstract

The rapid advancement of artificial intelligence, particularly with the development of Large Language Models (LLMs) built on the transformer architecture, has redefined the capabilities of natural language processing. These models now exhibit remarkable performance across various language-related tasks, such as text generation, question answering, translation, and summarization, often rivaling human-like comprehension. More intriguingly, LLMs have demonstrated emergent abilities extending beyond their core functions, showing proficiency in tasks like commonsense reasoning, code generation, and arithmetic. This survey paper explores the foundational components, scaling mechanisms, and architectural strategies that drive these capabilities. Emphasizing models like GPT and LLaMA, we analyze the impact of exponential data and computational growth on LLM performance, while also addressing the trade-offs associated with scaling. We also examine LLM applications across sectors such as healthcare, finance, education, and law, highlighting their adaptability and potential to solve domain-specific challenges. Central to this work are the questions of how LLMs generalize across diverse tasks, exhibit planning and reasoning abilities, and whether these emergent abilities can be systematically elicited or enhanced. In particular, we provide some insights into the Chain of Thought (CoT) and Plan of Thought (PoT) abilities within LLMs, focusing on how pre-training data influences their emergence. Additionally, we investigate LLM-modulo frameworks that integrate external systems, allowing LLMs to handle complex, dynamic tasks. By analyzing these factors, this paper aims to foster the ongoing discussion on the capabilities and limits of LLMs, promoting their responsible development and application in novel and increasingly complex environments.
Contents

1 Introduction
  1.1 Motivations
  1.2 Goals of the paper
  1.3 Content and organization
2 Large Language Models
  2.1 Definition and Overview
  2.2 Scaling Law
  2.3 Prominent Model Families
    2.3.1 BERT
    2.3.2 T5
    2.3.3 GPT Series
    2.3.4 Llama
    2.3.5 Gemma
    2.3.6 Claude
  2.4 Specialized Large Language Models
    2.4.1 LLMs in Healthcare
    2.4.2 LLMs in Finance
    2.4.3 LLMs in Education
    2.4.4 LLMs in Law
    2.4.5 LLMs in Scientific Research
3 Foundations of Large Language Models
  3.1 Pre-training
    3.1.1 Unsupervised pre-training
    3.1.2 Supervised pre-training
    3.1.3 Semi-supervised pre-training
  3.2 Data sources
    3.2.1 General Data
    3.2.2 Specialized Data
    3.2.3 Commonly-used data sources
  3.3 Data preprocessing
    3.3.1 Quality Filtering
    3.3.2 Deduplication
    3.3.3 Privacy reduction
    3.3.4 Tokenization
  3.4 LLM Adaptation
    3.4.1 Instruction Tuning
    3.4.2 Alignment Tuning