[SemiAnalysis]: xAI Colossus 2, the World's First Gigawatt-Scale Datacenter; Unique Reinforcement Learning Methodology and Financing Plan (SemiAnalysis, 2025-09-17, 23 pages)


Electronic Equipment · 2025-09-18 · SemiAnalysis · 一切如初

September 16, 2025

xAI's Colossus 2 - First Gigawatt Datacenter In The World, Unique RL Methodology, Capital Raise
On Site Turbines, Mississippi Expansion, Solaris Energy, Can xAI afford it?, Middle East Funding, Tesla, Talent Exodus, API revenue, Consumer Growth, RL Environment

By Jeremie Eliahou Ontiveros, Dylan Patel, Wei Zhou, Maya Barkin and AJ Kourabi

Much has been written about xAI's Colossus 1. The Memphis build belongs in the history books: the largest AI training cluster, erected from scratch in 122 days. With roughly … and … NVL72 units, it remains, today, the largest fully operational, single coherent cluster (setting aside Google, the master of multi-datacenter training).

However, Colossus 1's ~300 MW looks modest next to the gigawatt-scale clusters under construction by OpenAI, Meta and Anthropic. Their hyperscaler partners are happy to leverage their balance sheets and win the market by throwing dollars at it.

Was xAI's prowess a one-time wonder? Today we are publishing some data covering the last year from our industry-leading datacenter model, which is available to clients. This is the same proprietary data that called the Oracle deals many months in advance.

Source: SemiAnalysis Datacenter Industry Model. Note: there is a lag between a datacenter becoming operational and its GPUs becoming operational; Google and exact figures are available in the model.

Short answer: no.
xAI is still squarely in the frontier-AI race and is positioned to leapfrog most rivals on compute once again. By our estimates, its total datacenter capacity for a single training cluster will surpass that of Meta Superintelligence and Anthropic by Q3 2025. The datacenter capacity will be ready for GPUs to be moved in, creating the largest single datacenter in the world yet again. xAI still has to raise the capital for those GPUs, but it has the allocations from Nvidia to have the cluster fully training large-scale models early next year.

Elon came up with a new genius trick to beat rivals on time-to-market. Colossus 2 will be an even more impressive achievement than xAI's first cluster. Let's dig in.

The first half of this report digs into Colossus 2's prowess. The second half discusses Grok models, our mid-to-long-term thoughts on xAI, and the unique RL method xAI is using that may lead it to leapfrog OpenAI, Anthropic, and Google.

SemiAnalysis Is Hiring

We are seeking a highly motivated and skilled Member of Technical Staff to join our growing special projects engineering team. You will play a crucial role in developing industry-leading GPU cloud benchmarks and evaluation frameworks. Our GPU cloud evaluation framework is endorsed by many tier 1 and tier 2 frontier labs.
You may be a good fit if you have the following experience:
- Demonstrated experience with ML frameworks such as PyTorch or JAX, through professional work, personal projects, or personal Substack blogs
- 1-2 years using GPU or TPU clusters and/or running a multi-tenant GPU cluster
- Past experience working at a hyperscaler or a GPU cloud (preferred)
- Solid understanding of SLURM, Kubernetes, NCCL and the GPU cloud industry
- Strong research skills and the ability to synthesize information from various sources to draw insights

Compensation is competitive, and as part of the interview process you'll complete a paid coding challenge designed to reflect typical daily tasks at SemiAnalysis.

Colossus 2: from zero to … in six months

The Colossus 2 project kicked off on March 7th, 2025, when xAI acquired a 1 million sq ft warehouse in Memphis and two adjacent sites totaling 100 acres. By August 22nd, 2025, we count 119 air-cooled chillers on site, i.e. roughly 200 MW of cooling capacity. That's enough to power