
NVIDIA Blackwell Ultra 数据手册

2025-10-24 · NVIDIA

Built for the Age of AI Reasoning

Designed for AI Reasoning Performance

AI has evolved around three fundamental scaling dimensions: pretraining, post-training, and inference-time scaling, also known as long thinking or reasoning. This third dimension is critical for enabling agentic AI, where models must dynamically reason through complex queries during inference. Unlike traditional one-shot inference, test-time scaling can demand up to 100x more compute, as models evaluate multiple potential responses before selecting the most accurate outcome.

Key Offerings
> NVIDIA GB300 NVL72
> NVIDIA HGX B300

The NVIDIA Blackwell Ultra GPU is designed for this new era of AI reasoning. It delivers up to 20 petaFLOPS of FP4 sparse inference performance, offering exceptional efficiency for large-scale deployments. With 279 GB of HBM3E memory, it supports expansive KV caching and long-context inference without offloading. Blackwell Ultra also features 800 Gbps NVIDIA® ConnectX®-8 networking, doubling interconnect bandwidth compared to NVIDIA Blackwell to enable seamless scaling across data centers. A newly optimized attention engine delivers 2.5x faster attention performance compared to NVIDIA Hopper™, significantly accelerating throughput for reasoning.

NVIDIA GB300 NVL72: Powering the New Era of AI Reasoning

The NVIDIA GB300 NVL72 features a fully liquid-cooled, rack-scale architecture that integrates 72 NVIDIA Blackwell Ultra GPUs and 36 Arm®-based NVIDIA Grace™ CPUs into a single platform, purpose-built for test-time scaling inference, or AI reasoning, tasks. AI factories accelerated by the GB300 NVL72, leveraging NVIDIA Quantum-X800 InfiniBand or Spectrum-X™ Ethernet, ConnectX-8 SuperNIC™, and NVIDIA Mission Control™ management, deliver up to a 50x overall increase in AI factory output performance compared to Hopper-based platforms.
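To see why test-time scaling can multiply inference compute roughly N-fold, consider a minimal "best-of-N" sketch: the model samples N candidate responses and a scoring step keeps the best one. This is an illustrative sketch only, not NVIDIA code; `generate` and `score` are hypothetical stand-ins for a model's sampling pass and a verifier or reward model.

```python
# Illustrative sketch: why test-time scaling demands far more compute than
# one-shot inference. Each candidate costs one full forward pass, so
# best-of-N costs roughly N times a single response.
import random

def generate(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for one sampled model response.
    random.seed(seed)
    return f"answer-{random.randint(0, 9)}"

def score(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a verifier/reward model.
    return float(answer.split("-")[1])

def best_of_n(prompt: str, n: int) -> tuple[str, int]:
    candidates = [generate(prompt, seed=i) for i in range(n)]
    best = max(candidates, key=lambda a: score(prompt, a))
    return best, n  # n forward passes consumed, ~n x one-shot compute

answer, passes = best_of_n("What is 2+2?", n=100)  # ~100x one-shot compute
```

Reasoning models interleave this kind of exploration with long chains of thought, which is why the datasheet frames up to 100x more compute as the cost of evaluating multiple potential responses.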
End-to-End AI Acceleration at Rack Scale

With 279 GB of HBM3E memory per Blackwell Ultra chip and up to 37 TB of high-speed memory per rack, coupled with 1.44 exaFLOPS of compute and a 72-GPU unified NVIDIA NVLink™ domain, Blackwell Ultra provides unprecedented speed and scale to support larger models while giving rise to breakthroughs in AI. Combined with CUDA-X™ libraries for accelerated computing, NVIDIA accelerates the entire hardware and software computing stack.

Increase AI Factory Output Performance by 50x

The frontier curve illustrates the key parameters that determine AI factory token revenue output. The vertical axis represents GPU tokens-per-second (TPS) throughput in a one-megawatt (MW) AI factory, while the horizontal axis quantifies user interactivity and responsiveness as TPS for a single user. At the optimal intersection of throughput and responsiveness, GB300 NVL72 yields a 50x overall increase in AI factory output performance compared to the Hopper architecture for maximum token revenue.

NVIDIA GB300 NVL72 Key Features
> 36 NVIDIA Grace CPUs
> 72 NVIDIA Blackwell Ultra GPUs
> 17 TB of LPDDR5X memory with error-correction code (ECC)
> 20 TB of HBM3E
> Up to 37 TB of fast-access memory
> NVLink domain: 130 terabytes per second (TB/s) of low-latency GPU communication

Accelerating Real-Time Video Generation by 30x

GB300 NVL72 introduces cutting-edge capabilities for diffusion-based video generation models. A single five-second video generation sequence processes 4 million tokens, requiring nearly 90 seconds to generate on NVIDIA Hopper GPUs. The Blackwell Ultra platform enables real-time video generation from world foundation models, such as NVIDIA Cosmos™, providing a 30x performance improvement versus Hopper. This allows the creation of customized, photo-realistic, temporally and spatially stable video for physical AI applications.
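The per-rack figures above follow directly from the per-GPU numbers quoted in this section; a quick arithmetic sketch (not from the datasheet itself) shows how they combine, and why a 30x speedup makes five-second video generation real time.

```python
# Sanity-check arithmetic using the figures quoted in this section.
GPUS_PER_RACK = 72
HBM3E_PER_GPU_GB = 279           # per Blackwell Ultra chip
LPDDR5X_PER_RACK_TB = 17         # Grace CPU memory, with ECC

hbm_per_rack_tb = GPUS_PER_RACK * HBM3E_PER_GPU_GB / 1000  # ~20 TB of HBM3E
fast_memory_tb = hbm_per_rack_tb + LPDDR5X_PER_RACK_TB     # ~37 TB fast-access

# Video generation: ~90 s on Hopper for a 5 s clip, 30x faster on Blackwell Ultra.
hopper_seconds = 90
blackwell_ultra_seconds = hopper_seconds / 30  # 3 s, i.e. faster than real time

print(round(hbm_per_rack_tb, 1), round(fast_memory_tb, 1), blackwell_ultra_seconds)
```

Because the 3-second generation time is shorter than the 5-second clip it produces, the platform can sustain continuous, real-time video streams for physical AI applications.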
NVIDIA HGX B300: Purpose-Built for AI Reasoning

Key Features
> 8 NVIDIA Blackwell Ultra GPUs
> Over 2 TB of HBM3E memory
> 1,800 GB/s NVLink between GPUs via the NVSwitch™ chip
> 2.6x faster training performance (vs. H100)

NVIDIA HGX™ B300 is built for the age of AI reasoning with enhanced compute and increased memory. Featuring 7x more AI compute than the Hopper platform, over 2 TB of HBM3E memory, and high-performance networking integration with NVIDIA ConnectX-8 SuperNICs, HGX B300 delivers breakthrough performance on the most complex workloads, from training, agentic systems, and reasoning to real-time video generation, for every data center.

Boost Revenue With HGX B300 AI Factory Output

The frontier curve illustrates the key parameters that determine AI factory token revenue output. The vertical axis represents GPU tokens-per-second (TPS) throughput in a one-megawatt (MW) AI factory, while the horizontal axis quantifies user interactivity and responsiveness as TPS for a single user. At the optimal intersection of throughput and responsiveness, HGX B300 yields a 30x overall increase in AI factory output performance compared to the Hopper architecture for maximum token revenue.

Next-Level AI Training Performance

The HGX B300 platform delivers up to 2.6x higher training performance for large language models such as DeepSeek-R1. Wi
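The HGX B300 spec list above is consistent with the per-GPU figures quoted earlier in this datasheet; the sketch below (not NVIDIA code) checks the memory total and illustrates what a 2.6x training speedup means for wall-clock time, using a hypothetical 26-day baseline run that is purely illustrative.

```python
# Consistency check for the HGX B300 figures in this section.
GPUS = 8
HBM3E_PER_GPU_GB = 279                 # per Blackwell Ultra GPU, from this datasheet

hbm_total_gb = GPUS * HBM3E_PER_GPU_GB  # 2,232 GB -> matches "over 2 TB of HBM3E"

# "2.6x faster training performance (vs. H100)": a hypothetical 26-day
# H100 training run would finish in about 10 days.
h100_days = 26.0                        # hypothetical baseline, for illustration
hgx_b300_days = h100_days / 2.6

print(hbm_total_gb, round(hgx_b300_days, 1))
```

The same per-GPU HBM3E capacity thus explains both the rack-level GB300 NVL72 total and the 8-GPU HGX B300 total.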