[Lawrence Livermore National Laboratory & MosaicML]: Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
∗ limited financial resources like academics, students, and researchers (particularly those from emerging economies) [Ahmed and Wahed 2020]. We discuss these critical issues in more detail in Appendix A.

Given the unsustainable growth of its computational burden, progress with DL demands more compute-efficient training methods. A natural direction is to eliminate algorithmic inefficiencies in the learning process to reduce the time, cost, energy, and carbon footprint of DL training. Such Algorithmically-Efficient Deep Learning methods could change the training process in a variety of ways that include: altering the data or the order in which samples are presented to the model; tweaking the structure of the model; and changing the optimization algorithm. These algorithmic improvements are critical to moving towards estimated lower bounds on the required computational burden of effective DL training, which are greatly exceeded by the burden induced by current practices [Thompson et al. 2020]. Further, these algorithmic gains compound with software and hardware acceleration techniques [Hernandez and Brown 2020]. Thus, we believe algorithmically-efficient DL presents an enormous opportunity to increase the benefits of DL and reduce its costs.

While this view is supported by the recent surge in algorithmic-efficiency papers, these papers also suggest that research and application of algorithmic-efficiency methods are hindered by fragmentation. Disparate metrics are used to quantify efficiency, which produces inconsistent rankings of speedup methods. Evaluations are performed on narrow or poorly characterized environments, which results in incorrect or overly-broad conclusions. Algorithmic-efficiency methods are discussed in the absence of a taxonomy that reflects their breadth and relationships, which makes it hard to understand how to traverse the speedup landscape to combine different methods and develop new ones.

Accordingly, our central contributions are an organization of the algorithmic-efficiency literature (via a taxonomy and survey inspired by [Von Rueden et al. 2019]) and a technical characterization of the practical issues affecting the reporting and achievement of speedups (via guides for evaluation and practice). Throughout, our discussion emphasizes the critical intersection of these two thrusts: e.g., whether an algorithmic-efficiency method leads to an actual speedup indeed depends on the interaction of the method (understandable via our taxonomy) and the compute platform (understandable via our practitioner’s guide). Our contributions are summarized as follows:

• Formalizing Speedup: We review DNN efficiency metrics, then formalize the algorithmic speedup problem.
• Taxonomy and Survey: We classify over 200 papers via 5 speedup actions (the 5Rs) that apply to 3 training-pipeline components (see Tables 1 and 3). The taxonomy facilitates selection of methods for practitioners, digestion of the literature for readers, and identification of opportunities for researchers.
• Best Evaluation Practices: We identify evaluation pitfalls common in the literature and accordingly present best evaluation practices to enable comprehensive, fair, and reliable comparisons of various speedup techniques.
• Practitioner’s Guide: We discuss compute-platform bottlenecks that affect speedup-method effectiveness. We suggest appropriate methods and mitigations based on the location of the bottlenecks in the training pipeline.

With these contributions, we hope to improve the research and application of algorithmic efficiency, a critical piece of the compute-efficient deep learning needed to overcome the economic, environmental, and inclusion-related roadblocks faced by existing research. This paper is organized mainly into four parts: Section 2 provides an overview of DNN training and efficiency metrics along with a formalization of the algorithmic speedup problem. Section 3 uses broadly applicable building blocks of speedup methods and the training-pipeline components they affect to develop our taxonomy. Section 4 presents a comprehensive categorization of the speedup literature based on our taxonomy and discusses research opportunities and challenges. Sections 5 and 6 discuss best evaluation practices for comparing different approaches and our practical recommendations for choosing suitable speedup methods, respectively. Finally, Section 7 concludes and presents open questions in the algorithmic-efficiency area.

2 COMPUTE-EFFICIENT TRAINING: OVERVIEW, METRICS, AND DEFINITION

In this section, we first provide a brief overview of the Deep Neural Network (DNN) training process. Next, we mention various metrics that quantify training efficiency and discuss their pros and cons. Finally, we formally define algorithmic speedup for DNN training.

2.1 Overview of DNN Training Process

At a high level, the goal of DL is to learn a function that can map inputs to outputs to accomplish a certain task. This function, referred to as the model, is chosen from a parametric family called the
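Section 2 formalizes the algorithmic speedup problem. As a rough illustration of the underlying idea (not the paper's own formulation), speedup can be measured by comparing the wall-clock time two training procedures need to reach the same quality target. The function names, the `>=`-threshold stopping rule, and the step budget below are assumptions made for this sketch:

```python
import time

def time_to_quality(train_step, evaluate, target_metric, max_steps=10_000):
    """Run training steps until a quality target (e.g., validation
    accuracy) is reached; return (wall-clock seconds, steps taken),
    or None if the target is not reached within max_steps."""
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()
        if evaluate() >= target_metric:
            return time.perf_counter() - start, step
    return None

def algorithmic_speedup(baseline_seconds, method_seconds):
    """Ratio of baseline time to method time at equal final quality.
    A value > 1 means the method reaches the target faster."""
    return baseline_seconds / method_seconds
```

Comparing times only at matched quality reflects the fragmentation concern raised above: per-step throughput alone can rank methods inconsistently if a "faster" method needs more steps to reach the same accuracy.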