
AI Data Centers: Scaling and Evolution

Information Technology 2024-12-20 AFL 邓轶韬

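Key Takeaways

Artificial intelligence (AI) technology, particularly large language models (LLMs) and deep learning, is advancing rapidly and driving a fast expansion of AI data center infrastructure.
AI models keep growing in size and complexity, placing higher demands on compute, storage, and networking.
Scaling AI data centers requires both "scale-up" and "scale-out" strategies, together with advanced networking and cooling technologies.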

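Key Data

In 2023, the global AI market was worth approximately USD 196.63 billion, and it is projected to grow at a compound annual growth rate (CAGR) of 36.6% through 2030.
AI model parameter counts have grown from millions in early models to trillions today; for example, GPT-4 is estimated to contain 1.8 trillion parameters.
The compute required to train AI models is growing exponentially; for example, training a trillion-parameter model requires roughly 70,000 NVIDIA H100-equivalent accelerators.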
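
To make the market-size figure above concrete, the following minimal sketch compounds the 2023 value forward at the stated CAGR. The assumption that compounding starts from the 2023 figure and runs for seven years is ours; the summary does not state the base year of the forecast, and the function name is purely illustrative.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward by `years` annual periods at rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


BASE_2023_USD_BN = 196.63  # 2023 market size from the text, in USD billions
CAGR = 0.366               # 36.6% compound annual growth rate from the text
YEARS = 2030 - 2023        # ASSUMPTION: growth compounds from the 2023 figure

projected = project_value(BASE_2023_USD_BN, CAGR, YEARS)
print(f"Implied 2030 market size: about USD {projected:,.0f} billion")
```

Under that assumption the implied 2030 figure is on the order of USD 1.7 trillion; a different base year would shift the result.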

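Research Conclusions

AI hardware innovation, spanning chips, systems, and packaging technology, is essential to supporting the training and inference of AI models.
High-bandwidth, low-latency networking is essential to scaling AI data centers, with technologies such as NVLink, NeuraLink, PCIe, InfiniBand, and RDMA.
Cooling is a major challenge for AI data centers and calls for advanced techniques such as direct liquid cooling and immersion cooling.
Intense competition in AI is driving technical innovation and investment; for example, large technology companies have invested billions of dollars in AI startups.
Future trends for AI data centers include model segmentation, asynchronous training algorithms, and distributed training.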

Copyright © 2024 AFL. All rights reserved.

Executive Summary

This document provides in-depth commentary on the technical building blocks enabling scaling up and scaling out in modern AI data centers. By highlighting key industry developments and the continuous evolution of advanced scaling techniques, the authors aim to highlight the need for industry-wide, collaborative approaches to critical aspects of scaling, including AI hardware innovation, modular infrastructure planning, and sophisticated cooling methods.

What you will learn:

01 Introduction to AI and ML
Gain a comprehensive overview of Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs). This section explains fundamental concepts, including models, training, and inference.

02 Evolution of AI Since 2017
Explore significant AI milestones, such as the emergence of the transformative Transformer model. This section also highlights trends towards larger AI models and increased computational power.

03 Scaling AI Infrastructure
Delve into modern strategies for scaling AI data centers, focusing on both scaling up and scaling out.

04 Advances in AI Hardware
Learn about innovations in semiconductor technology, chiplets, and packaging techniques. Gain insights into high-bandwidth networking and advanced cooling systems.

05 Future Trends in AI
Explore emerging trends such as segmented models, less frequent synchronization, and extended distributed systems. This section also discusses the growing need for medium and long-haul links in data center interconnect (DCI).

Written by
Alan Keizer, Senior Technology Advisor, AFL
Ben Atherton, Technical Author, AFL

“Generative AI has emerged and grown at a scale and speed that has surprised us all. It is built on a platform of multiple technologies, all of which have also advanced at scale and speed. This is an extraordinary technology story.”
Alan Keizer, Senior Technology Advisor, AFL
Models

In the context of ML, a model represents groups of algorithms trained to identify patterns and predict coherent relationships within new, unseen data. For example, model purposes could include predicting the weather, recognizing images, and providing hyper-personalized e-commerce experiences based on user behavior.

The following models represent early model types (for an exploration of more recent models, please see the following section, titled “Evolution of AI since 2017”):

Supervised Learning Models
Supervised learning models learn from approved examples. For instance, a supervised learning model can visually identify an object based on images containing similar representations of the same object.

Unsupervised Learning Models
Unsupervised learning models search for hidden patterns or groupings in unlabeled data. Techniques include clustering (grouping similar data points) and reducing data complexity (simplifying data for more efficient analysis).

Reinforcement Learning Models
Reinforcement learning models learn through interactions with the environment – feedback registers as rewards or penalties. This type of model finds applications in areas such as gaming and robotics.

Fundamental Concepts: Models, Training, and Inference

ML processes aim to develop models that can perform accurate inference, make logical decisions, and exhibit human-like intelligence. The training phase involves preparing selected data and refining responses for optimal performance. During the inference phase, trained models analyze new data, apply refined pattern recognition, and auto-generate logical responses.
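
The split between a training phase and an inference phase can be illustrated with a deliberately tiny supervised model. The sketch below is not taken from the white paper: the linear model, the made-up data points, the learning rate, and the function names `predict` and `train` are all assumptions chosen for illustration, and gradient descent on mean squared error is just one common way to adjust parameters.

```python
def predict(weight: float, bias: float, x: float) -> float:
    """Inference: apply the current parameters to an input."""
    return weight * x + bias


def train(samples, learning_rate=0.05, epochs=2000):
    """Training: adjust weight and bias to reduce mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = predict(weight, bias, x) - y
            grad_w += 2 * error * x / n  # d(MSE)/d(weight)
            grad_b += 2 * error / n      # d(MSE)/d(bias)
        # Step the parameters against the gradient
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical labeled examples generated from y = 2x + 1
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = train(training_data)       # training phase
prediction = predict(weight, bias, 10.0)  # inference on an unseen input
print(f"weight={weight:.3f}, bias={bias:.3f}, prediction={prediction:.3f}")
```

Production models differ mainly in scale: the same loop runs over billions of parameters and is spread across many accelerators, which is what makes the scaling questions in this document relevant.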
This section gives a foundational understanding of models, training, and inference.

Introduction to Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs)

Artificial Intelligence (AI) references machines or software designed to perform tasks typically requiring human intelligence (e.g., understanding natural language, visual perception, speech recognition, language translation, learning, and problem-solving).

Machine Learning (ML) trains algorithms to infer meaning and provide accurate human-like responses to unique prompts. Deep Learning (DL) is ML without human intervention. DL uses algorithms called Artificial Neural Networks (ANN), which process input stimuli in multiple stages and can discern relationships within complex data sets. Large Language Models (LLM) are specialized DL models dealing with language.

DL algorithms can process any digitized information that has relationships among its elements. For example, LLMs can generate human language responses to queries or prompts (e.g., GPT-4), and can also work in some non-language spaces such as images and coding.

Training

The training phase teaches models to make accurate predictions. This phase involves exposing the model to a predetermined dataset before requiring the model to make predictions. Parameter adjustments follow, helping minimize errors.

Data Collection
Stakeholders such as ML engineers and data scientists must sour
