
A Survey on the Capabilities and Limitations of Large Language Models


Andrea Matarazzo
Expedia Group, Italy
a.matarazzo@gmail.com

Riccardo Torlone
Roma Tre University, Italy
riccardo.torlone@uniroma3.it

Abstract

The rapid advancement of artificial intelligence, particularly with the development of Large Language Models (LLMs) built on the transformer architecture, has redefined the capabilities of natural language processing. These models now exhibit remarkable performance across various language-related tasks, such as text generation, question answering, translation, and summarization, often rivaling human-like comprehension. More intriguingly, LLMs have demonstrated emergent abilities extending beyond their core functions, showing proficiency in tasks like commonsense reasoning, code generation, and arithmetic. This survey paper explores the foundational components, scaling mechanisms, and architectural strategies that drive these capabilities. Emphasizing models like GPT and LLaMA, we analyze the impact of exponential data and computational growth on LLM performance, while also addressing the trade-offs associated with scaling. We also examine LLM applications across sectors such as healthcare, finance, education, and law, highlighting their adaptability and potential to solve domain-specific challenges. Central to this work are the questions of how LLMs generalize across diverse tasks, exhibit planning and reasoning abilities, and whether these emergent abilities can be systematically elicited or enhanced. In particular, we provide some insights into the Chain of Thought (CoT) and Plan of Thought (PoT) abilities within LLMs, focusing on how pre-training data influences their emergence. Additionally, we investigate LLM-modulo frameworks that integrate external systems, allowing LLMs to handle complex, dynamic tasks. By analyzing these factors, this paper aims to foster the ongoing discussion on the capabilities and limits of LLMs, promoting their responsible development and application in novel and increasingly complex environments.
Contents

1 Introduction
  1.1 Motivations
  1.2 Goals of the paper
  1.3 Content and organization
2 Large Language Models
  2.1 Definition and Overview
  2.2 Scaling Law
  2.3 Prominent Model Families
    2.3.1 BERT
    2.3.2 T5
    2.3.3 GPT Series
    2.3.4 Llama
    2.3.5 Gemma
    2.3.6 Claude
  2.4 Specialized Large Language Models
    2.4.1 LLMs in Healthcare
    2.4.2 LLMs in Finance
    2.4.3 LLMs in Education
    2.4.4 LLMs in Law
    2.4.5 LLMs in Scientific Research
3 Foundations of Large Language Models
  3.1 Pre-training
    3.1.1 Unsupervised pre-training
    3.1.2 Supervised pre-training
    3.1.3 Semi-supervised pre-training
  3.2 Data sources
    3.2.1 General Data
    3.2.2 Specialized Data
    3.2.3 Commonly-used data sources
  3.3 Data preprocessing
    3.3.1 Quality Filtering
    3.3.2 Deduplication
    3.3.3 Privacy reduction
    3.3.4 Tokenization
  3.4 LLM Adaptation
    3.4.1 Instruction Tuning
    3.4.2 Alignment Tuning