[Lawrence Livermore National Laboratory & MosaicML]: Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
∗ limited financial resources like academics, students, and researchers (particularly those from emerging economies) [Ahmed and Wahed 2020]. We discuss these critical issues in more detail in Appendix A.

Given the unsustainable growth of its computational burden, progress with DL demands more compute-efficient training methods. A natural direction is to eliminate algorithmic inefficiencies in the learning process to reduce the time, cost, energy, and carbon footprint of DL training. Such Algorithmically-Efficient Deep Learning methods could change the training process in a variety of ways that include: altering the data or the order in which samples are presented to the model; tweaking the structure of the model; and changing the optimization algorithm. These algorithmic improvements are critical to moving towards estimated lower bounds on the required computational burden of effective DL training, which are greatly exceeded by the burden induced by current practices [Thompson et al. 2020]. Further, these algorithmic gains compound with software and hardware acceleration techniques [Hernandez and Brown 2020]. Thus, we believe algorithmically-efficient DL presents an enormous opportunity to increase the benefits of DL and reduce its costs.

While this view is supported by the recent surge in algorithmic-efficiency papers, these papers also suggest that research and application of algorithmic-efficiency methods are hindered by fragmentation. Disparate metrics are used to quantify efficiency, which produces inconsistent rankings of speedup methods. Evaluations are performed on narrow or poorly characterized environments, which results in incorrect or overly-broad conclusions. Algorithmic-efficiency methods are discussed in the absence of a taxonomy that reflects their breadth and relationships, which makes it hard to understand how to traverse the speedup landscape to combine different methods and develop new ones.

Accordingly, our central contributions are an organization of the algorithmic-efficiency literature (via a taxonomy and survey inspired by [Von Rueden et al. 2019]) and a technical characterization of the practical issues affecting the reporting and achievement of speedups (via guides for evaluation and practice). Throughout, our discussion emphasizes the critical intersection of these two thrusts: e.g., whether an algorithmic-efficiency method leads to an actual speedup indeed depends on the interaction of the method (understandable via our taxonomy) and the compute platform (understandable via our practitioner’s guide). Our contributions are summarized as follows:

• Formalizing Speedup: We review DNN efficiency metrics, then formalize the algorithmic speedup problem.
• Taxonomy and Survey: We classify over 200 papers via 5 speedup actions (the 5Rs) that apply to 3 training-pipeline components (see Tables 1 and 3). The taxonomy facilitates selection of methods for practitioners, digestion of the literature for readers, and identification of opportunities for researchers.
• Best Evaluation Practices: We identify evaluation pitfalls common in the literature and accordingly present best evaluation practices to enable comprehensive, fair, and reliable comparisons of various speedup techniques.
• Practitioner’s Guide: We discuss compute-platform bottlenecks that affect speedup-method effectiveness. We suggest appropriate methods and mitigations based on the location of the bottlenecks in the training pipeline.

With these contributions, we hope to improve the research and application of algorithmic efficiency, a critical piece of the compute-efficient deep learning needed to overcome the economic, environmental, and inclusion-related roadblocks faced by existing research. This paper is organized mainly into four parts: Section 2 provides an overview of DNN training and efficiency metrics along with a formalization of the algorithmic speedup problem. Section 3 uses broadly applicable building blocks of speedup methods and the training-pipeline components they affect to develop our taxonomy. Section 4 presents a comprehensive categorization of the speedup literature based on our taxonomy and discusses research opportunities and challenges. Sections 5 and 6 discuss best evaluation practices for comparing different approaches and our practical recommendations for choosing suitable speedup methods, respectively. Finally, Section 7 concludes and presents open questions in the algorithmic-efficiency area.

2 COMPUTE-EFFICIENT TRAINING: OVERVIEW, METRICS, AND DEFINITION

In this section, we first provide a brief overview of the Deep Neural Network (DNN) training process. Next, we mention various metrics that quantify training efficiency and discuss their pros and cons. Finally, we formally define algorithmic speedup for DNN training.

2.1 Overview of DNN Training Process

At a high level, the goal of DL is to learn a function that can map inputs to outputs to accomplish a certain task. This function, referred to as the model, is chosen from a parametric family called the
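Section 2 formalizes the algorithmic speedup problem. As a rough illustration of the underlying idea (not the paper's own formulation), speedup can be measured by comparing the wall-clock time two training procedures need to reach the same quality target. The function names, the `>=`-threshold stopping rule, and the step budget below are assumptions made for this sketch:

```python
import time

def time_to_quality(train_step, evaluate, target_metric, max_steps=10_000):
    """Run training steps until a quality target (e.g., validation
    accuracy) is reached; return (wall-clock seconds, steps taken),
    or None if the target is not reached within max_steps."""
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()
        if evaluate() >= target_metric:
            return time.perf_counter() - start, step
    return None

def algorithmic_speedup(baseline_seconds, method_seconds):
    """Ratio of baseline time to method time at equal final quality.
    A value > 1 means the method reaches the target faster."""
    return baseline_seconds / method_seconds
```

Comparing times only at matched quality reflects the fragmentation concern raised above: per-step throughput alone can rank methods inconsistently if a "faster" method needs more steps to reach the same accuracy.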