
AI Data Centers: Scaling and Evolution

Information Technology | 2024-12-20 | AFL | 邓轶韬

Copyright © 2024 AFL. All rights reserved.

Executive Summary

This document provides in-depth commentary on the technical building blocks that enable scaling up and scaling out in modern AI data centers. By highlighting key industry developments and the continuous evolution of advanced scaling techniques, the authors aim to underscore the need for industry-wide, collaborative approaches to critical aspects of scaling, including AI hardware innovation, modular infrastructure planning, and sophisticated cooling methods.

What you will learn:

01 Introduction to AI and ML: Gain a comprehensive overview of Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs). This section explains fundamental concepts, including models, training, and inference.
02 Evolution of AI Since 2017: Explore significant AI milestones, such as the emergence of the transformative Transformer model. This section also highlights trends toward larger AI models and increased computational power.
03 Scaling AI Infrastructure: Delve into modern strategies for scaling AI data centers, focusing on both scaling up and scaling out.
04 Advances in AI Hardware: Learn about innovations in semiconductor technology, chiplets, and packaging techniques, and gain insights into high-bandwidth networking and advanced cooling systems.
05 Future Trends in AI: Explore emerging trends such as segmented models, less frequent synchronization, and extended distributed systems. This section also discusses the growing need for medium- and long-haul links in data center interconnect (DCI).

Written by:
Alan Keizer, Senior Technology Advisor, AFL
Ben Atherton, Technical Author, AFL

"Generative AI has emerged and grown at a scale and speed that has surprised us all. It is built on a platform of multiple technologies, all of which have also advanced at scale and speed. This is an extraordinary technology story."
Alan Keizer, Senior Technology Advisor, AFL

Models

In the context of ML, a model is a set of algorithms, together with the parameters they learn, trained to identify patterns and predict coherent relationships within new, unseen data. For example, models can be built to predict the weather, recognize images, or provide hyper-personalized e-commerce experiences based on user behavior.

The following are early model types (for more recent models, see the section titled "Evolution of AI Since 2017"):

Supervised Learning Models
Supervised learning models learn from labeled examples. For instance, after training on images labeled as containing a particular object, a supervised learning model can identify that object in new images.

Unsupervised Learning Models
Unsupervised learning models search for hidden patterns or groupings in unlabeled data. Techniques include clustering (grouping similar data points) and dimensionality reduction (simplifying data for more efficient analysis).

Reinforcement Learning Models
Reinforcement learning models learn by interacting with an environment, which returns feedback in the form of rewards or penalties. This type of model finds applications in areas such as gaming and robotics.

Fundamental Concepts: Models, Training, and Inference

ML processes aim to develop models that can perform accurate inference, make logical decisions, and exhibit human-like intelligence. The training phase involves preparing selected data and refining the model's responses for optimal performance. During the inference phase, trained models analyze new data, apply the patterns learned during training, and automatically generate logical responses.
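To make the split between training and inference concrete, here is a minimal sketch, assuming a toy supervised learning task: fitting a straight line y = w * x + b to labeled examples with plain gradient descent. The dataset, learning rate, epoch count, and the helper names `train`, `infer`, and `mean_squared_error` are hypothetical illustrations, not a method described by AFL. Large models have billions of parameters, but the training loop follows the same pattern: predict, measure the error, adjust the parameters.

```python
"""Toy supervised learning: training adjusts parameters to reduce error,
inference applies the trained parameters to unseen inputs.
All names and values here are illustrative assumptions."""


def train(examples, learning_rate=0.01, epochs=2000):
    """Fit y = w * x + b to labeled (x, y) pairs by gradient descent
    on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Predict each label with the current parameters and accumulate
        # the gradient of the mean squared error.
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Parameter adjustment: step against the gradient to reduce error.
        # A small learning rate keeps these updates stable for this data.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def infer(params, x):
    """Inference: apply the trained parameters to a new, unseen input."""
    w, b = params
    return w * x + b


def mean_squared_error(params, examples):
    """Average squared difference between predictions and labels."""
    return sum((infer(params, x) - y) ** 2 for x, y in examples) / len(examples)


if __name__ == "__main__":
    # Hypothetical labeled training data generated from y = 2x + 1.
    training_data = [(x, 2 * x + 1) for x in range(10)]
    params = train(training_data)
    print(f"learned w={params[0]:.3f}, b={params[1]:.3f}")
    print(f"training error: {mean_squared_error(params, training_data):.2e}")
    print(f"prediction for unseen x=12: {infer(params, 12):.2f}")
```

Running the script, the learned parameters should land very close to the w = 2 and b = 1 used to generate the data, and the prediction for x = 12 (a value never seen in training) should be close to 25. Gradient descent is used here rather than a closed-form least-squares solution because iterative parameter adjustment is the same basic idea that scales to neural networks.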
Introduction to Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs)

This section gives a foundational understanding of models, training, and inference.

Artificial Intelligence (AI) refers to machines or software designed to perform tasks that typically require human intelligence (e.g., understanding natural language, visual perception, speech recognition, language translation, learning, and problem-solving).

Machine Learning (ML) trains algorithms on data, rather than explicitly programming them, so that they can infer meaning and provide accurate, human-like responses to new prompts. Deep Learning (DL) is a subset of ML that learns relevant features directly from data with minimal human intervention. DL uses algorithms called Artificial Neural Networks (ANNs), which process input in multiple successive layers and can discern relationships within complex datasets. Large Language Models (LLMs) are specialized DL models that deal with language.

DL algorithms can process any digitized information that has relationships among its elements. For example, LLMs can generate human-language responses to queries or prompts (e.g., GPT-4), and can also work in some non-language domains such as images and code.

Training

The training phase teaches models to make accurate predictions. The model is exposed to a predetermined dataset and required to make predictions; its parameters are then adjusted to minimize errors.

Data Collection

Stakeholders such as ML engineers and data scientists must sour