
UPN512 Technical Architecture Whitepaper v1.0

Alibaba | Information Technology | 2025-10-11

Alibaba Cloud Network Infra Team

Table of contents
1. Terminology
2. Trends in AI Infrastructure Networking
3. Evolution and Challenges of xPU Scale-up Networks
4. Alibaba Cloud UPN512 Architecture Overview
5. UPN512 System Design and Key Components
  5.1 System architecture
    5.1.1 AI Rack: tightly coupled copper interconnect
    5.1.2 UPN512: single-tier optical, decoupled system
      5.1.2.1 All-optical interconnect
      5.1.2.2 Single-tier, 1K-scale domain
      5.1.2.3 Decoupled design
    5.2.1 Pluggable optics
    5.2.2 High-Density Bandwidth Optical Interconnect Solutions
    5.2.3 LPO vs. NPO: Use Case and Solution Selection
      Conclusion: LPO and NPO as Complementary Options for UPN
    5.2.4 LPO/NPO Cost Analysis
    5.2.5 Interconnect stability

1. Terminology

2. Trends in AI Infrastructure Networking

In recent years, as artificial intelligence (AI) has surged, the compute and memory demands of large-scale model training and inference have grown exponentially. To boost computation throughput, shorten training time, and improve inference efficiency, AI clusters scale via high-performance networks, growing from tens of thousands to hundreds of thousands of accelerators (xPUs). To achieve efficient training and inference, the industry typically employs multiple parallelization strategies that drive thousands to tens of thousands of xPUs to exchange data and collaborate, which relies on high-performance network forwarding. Looking across the evolution of AI infrastructure, several trends are imposing new requirements on the network:

From Dense to MoE architecture. After the initial phase of growth in large language models, the drive to increase parameter capacity and efficiency while lowering computational cost has made Mixture-of-Experts (MoE) architectures a growing trend over Dense models. MoE partitions a model into multiple expert subnetworks and uses a gating mechanism to dynamically route inputs to specific experts (an illustrative gating sketch follows at the end of this section). Multiple experts process different data shards in parallel, and the system then selects the most suitable expert outputs based on each input's features, improving performance while constraining compute cost. From the network perspective, MoE typically leverages Expert Parallelism (EP), which demands ultra-high bandwidth and ultra-low latency. Because larger EP domains bring higher compute efficiency, expanding the EP communication domain is becoming a key direction for network evolution.

From pre-training to unified training-inference. Workloads in AI clusters are shifting from pre-training-only toward unified training and inference: the same cluster runs offline training and reinforcement learning (RL) as well as online inference services. On the inference side, distributed efficiency optimizations are emerging, including PD separation (prefill-decode), AF separation (attention-FFN), and large-EP inference. From a network perspective, online and offline traffic co-exist; different parallel modes and the separation of workloads with different compute intensities make the communication model more complex and raise the bar for the network architecture underpinning unified training and inference.

Scaling up xPUs to increase supernode computation power. To meet the demand for ever-growing compute, interconnect technologies are advancing rapidly, and building clustered supernodes over ultra-high-bandwidth, ultra-low-latency networks has become a major trend. For example, NVIDIA's GPU scale-up domain has evolved from 8-card air-cooled to 72-card liquid-cooled systems, and Huawei has announced a 384-NPU supernode built with its UB network.
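To make the gating mechanism described under the Dense-to-MoE trend concrete, the following is a minimal, single-device PyTorch sketch written for this summary; it is not code from the whitepaper. The class names, the 4x FFN expert shape, and top-2 routing are illustrative assumptions, and a production EP deployment would dispatch the routed tokens across xPUs with all-to-all collectives rather than the local loop shown here.

    # Illustrative sketch only: top-k gating and expert routing in a MoE layer.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKGate(nn.Module):
        # Learned router: scores every expert for each token and keeps the top-k.
        def __init__(self, hidden_dim, num_experts, top_k=2):
            super().__init__()
            self.router = nn.Linear(hidden_dim, num_experts, bias=False)
            self.top_k = top_k

        def forward(self, tokens):
            # tokens: [num_tokens, hidden_dim]
            logits = self.router(tokens)                          # [num_tokens, num_experts]
            weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)                  # normalize the kept scores
            return expert_ids, weights

    class MoELayer(nn.Module):
        # Single-device reference: in real Expert Parallelism the routed tokens are
        # exchanged between xPUs with all-to-all collectives instead of this loop.
        def __init__(self, hidden_dim, num_experts, top_k=2):
            super().__init__()
            self.gate = TopKGate(hidden_dim, num_experts, top_k)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim),
                              nn.GELU(),
                              nn.Linear(4 * hidden_dim, hidden_dim))
                for _ in range(num_experts)
            )

        def forward(self, tokens):
            expert_ids, weights = self.gate(tokens)               # route each token
            output = torch.zeros_like(tokens)
            for e, expert in enumerate(self.experts):
                for slot in range(self.gate.top_k):
                    mask = expert_ids[:, slot] == e               # tokens routed to expert e
                    if mask.any():
                        output[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
            return output

    # Example: 16 tokens, hidden size 64, 8 experts, top-2 routing.
    layer = MoELayer(hidden_dim=64, num_experts=8, top_k=2)
    out = layer(torch.randn(16, 64))

The token exchange implied by this routing, where every rank sends and receives shards from many peers, is the all-to-all traffic pattern that drives the bandwidth and latency requirements on the scale-up network discussed in the next section.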
Expanding cluster supernodes via ultra-high-bandwidth, ultra-low-latency scale-up interconnects is a key direction for the evolution of AI compute infrastructure. This whitepaper explores the evolution and challenges of xPU scale-up systems and presents Alibaba Cloud's UPN (Ultra Performance Network) architecture for building future xPU scale-up systems that are large-scale, high-performance, highly reliable, cost-effective, and extensible.

3. Evolution and Challenges of xPU Scale-up Networks

As noted above, on one hand, ultra-high-bandwidth, ultra-low-latency scale-up interconnects effectively raise the clustered compute of xPU supernodes; on the other hand, the trend toward MoE architectures drives the need to expand the scale-up interconnect domain. As sparse MoE models displace dense models, the number of experts is also growing. The earliest open-sourced MoE model, Mixtral-8x7B, had 8 experts; mainstream open models released this year (2025), such as Qwen3, have 128 experts, DeepSeek-V3 has 256, and Kimi K2 has 384. In MoE, using large EP (more xPUs participating in expert parallelism) to optimize training and inference efficiency has become a key lever for compute efficiency. Consequently, mainstream compute systems are evolving toward larger scale-up interconnect domains to meet EP's high-bandwidth, large-scale all-to-all demands. NVIDIA has announced NVL72 and roadmapped NVL144 and NVL576; Huawei has announced a 384-NPU scale-up supernode (CM384); AMD has stated its next-generation scale-up interconnect domain will re