The Fourth Realm of Distributed Training for Large Models

Duan Shishi (段石石), Technical Expert, Biren Technology · DataFunSummit #2023

Contents
01 Historical Background
02 Distributed Training Challenges
03 Distributed Training Techniques
04 Future Challenges

01 Historical Background
• Large Language Models
• LLM Infra

02 Distributed Training Challenges

LLMs need huge FLOPs. The Transformer FLOPs equation:

C = τT ≈ 6ND

where N is the number of model parameters, D is the number of tokens the model is trained on, τ is the sustained hardware throughput (FLOP/s), and T is the training time.

Model parameters vs. device memory: the gap has become huge, and it is this gap that pushes training onto distributed ML systems.

Compute required for PaLM (worked script below): C ≈ 6 × 540×10⁹ × 780×10⁹ ≈ 2.53×10²⁴ FLOPs. On a single A100 (312 TFLOPS peak), that is 2.53×10²⁴ / (312×10¹²) ≈ 8.1×10⁹ s ≈ 257 years. Rearranged as T ≈ 6ND / (τ · #GPU · R), a 2048-GPU cluster needs about 46/R days, where R is the achieved utilization.

Reference: https://www.bilibili.com/video/BV1iu4y1Z7bv/?vd_source=8d00c2c0cdbe325ba3b959e4aea901ea

Graph & Device: a training job is a compute graph placed onto a device cluster; each step applies the optimizer update Weight = F(Weight, Grad).

03 Distributed Training Techniques

Brief history:
• Compute graph and placement
• Early systems: DistBelief; Parameter Server [Mu Li]; Bosen; GeePS
• LLM era: large language models with few-shot learning (FSL); PaLM on Pathways @ Google; CLIP @ OpenAI, bridging images and text

Data Parallelism Family
• ZeRO-DP (sharding sketch below)
• Recompute (activation checkpointing; sketch below)
• Offload to host memory / NVMe

Pipeline Parallelism Family
• Synchronous: DAPPLE, Interleaved 1F1B, Merak, Chimera (1F1B schedule sketch below)
• Asynchronous: PipeDream

Tensor Parallelism Family (Megatron-style MLP sketch below)

04 Future Challenges
• Profilers for schedulers and for humans
• Automatic parallelization
• Other model modules?
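To make the FLOPs arithmetic in the outline concrete, here is a minimal back-of-the-envelope sketch. It assumes only the figures quoted in the talk (PaLM: N = 540B parameters, D = 780B tokens; A100 peak throughput 312 TFLOPS) and leaves the utilization R as a free parameter, as on the slide.

```python
# Back-of-the-envelope training cost from C = tau * T ~= 6 * N * D.
N = 540e9            # PaLM parameter count
D = 780e9            # training tokens
C = 6 * N * D        # total training FLOPs, ~2.53e24

A100_PEAK = 312e12   # A100 peak throughput, FLOP/s

# A single A100 at full utilization:
seconds = C / A100_PEAK
print(f"C = {C:.3e} FLOPs")
print(f"single A100: {seconds / 86400 / 365:.0f} years")   # ~257 years

# T ~= 6ND / (tau * #GPU * R) on a 2048-GPU cluster:
n_gpus = 2048
for R in (0.3, 0.5, 1.0):
    days = C / (A100_PEAK * n_gpus * R) / 86400
    print(f"2048 GPUs, R = {R:.1f}: {days:.0f} days")      # 46/R days
```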
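For the data-parallelism family, the following single-process NumPy sketch contrasts plain data parallelism, where every replica holds a full optimizer-state buffer, with ZeRO-1-style sharding, where each rank keeps optimizer state only for its 1/K slice of the parameters and an all-gather rebuilds the weights. All names and the toy momentum-SGD update are illustrative assumptions, not the ZeRO implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, P, lr, mu = 4, 16, 0.1, 0.9           # workers, params, step size, momentum

w = rng.normal(size=P)                   # weights, replicated on every worker
targets = rng.normal(size=(K, P))        # stand-in for one data shard per worker

def local_grad(w, shard_target):
    # placeholder for backprop on this worker's microbatch
    return w - shard_target

grads = np.stack([local_grad(w, t) for t in targets])

# --- plain DP: all-reduce the gradient; every replica keeps a FULL
#     momentum buffer (P floats each) and applies the same update ---
g = grads.mean(axis=0)                   # all-reduce (mean)
m_full = mu * np.zeros(P) + g            # momentum update (state starts at zero)
w_dp = w - lr * m_full

# --- ZeRO-1: optimizer state is sharded; rank r keeps momentum only for
#     its P/K slice, updates that slice, then an all-gather rebuilds w ---
w_zero = np.empty_like(w)
for rank, idx in enumerate(np.split(np.arange(P), K)):
    g_shard = grads[:, idx].mean(axis=0)           # reduce-scatter result
    m_shard = mu * np.zeros(idx.size) + g_shard    # only P/K floats of state
    w_zero[idx] = w[idx] - lr * m_shard            # local step on owned slice
# (the all-gather of the w_zero slices is implicit in this single process)

print("updates match:", np.allclose(w_dp, w_zero))  # True
```

Both paths produce the identical update; the saving is that each rank's optimizer state shrinks from P to P/K floats, and ZeRO-2/3 additionally shard gradients and parameters.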
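Recompute trades FLOPs for memory: intermediate activations are dropped during the forward pass and recomputed during backward. A minimal sketch using PyTorch's torch.utils.checkpoint; the block and its layer sizes are made up for illustration.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# A toy transformer-style MLP block; shapes are illustrative only.
block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(8, 1024, requires_grad=True)

# A plain forward would keep the 8x4096 intermediate activation alive for
# backward; checkpointing discards it and re-runs `block` when gradients
# are needed, trading one extra forward pass for memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 1024])
```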
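For the synchronous pipeline family, the 1F1B pattern that underlies schedules such as DAPPLE and interleaved 1F1B has each stage alternate one forward and one backward after a short warm-up, so live activations are bounded by the pipeline depth rather than the microbatch count. A minimal per-stage schedule generator, ignoring communication and timing (the function name is ours):

```python
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int):
    """Per-stage op order for the non-interleaved 1F1B schedule.

    Fk / Bk = forward / backward of microbatch k. Earlier stages run more
    warm-up forwards, so at most num_stages activations are ever live.
    """
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, f, b = [f"F{i}" for i in range(warmup)], warmup, 0
    while f < num_microbatches:          # steady state: one F, then one B
        ops += [f"F{f}", f"B{b}"]
        f, b = f + 1, b + 1
    while b < num_microbatches:          # cool-down: drain remaining backwards
        ops.append(f"B{b}")
        b += 1
    return ops

for s in range(4):                       # 4 stages, 8 microbatches
    print(f"stage {s}: {' '.join(one_f_one_b(s, 4, 8))}")
```

The last stage prints F0 B0 F1 B1 ... (no warm-up), while stage 0 runs three warm-up forwards before entering the alternating steady state.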
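For the tensor-parallelism family, the Megatron-LM MLP split gives each rank a column slice of the first weight matrix and a row slice of the second, so the nonlinearity stays local and a single all-reduce restores the exact output. A NumPy simulation with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
tp = 4                                   # tensor-parallel degree
B, H, F = 2, 8, 32                       # batch, hidden, FFN sizes (F % tp == 0)

gelu = lambda x: 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

X = rng.normal(size=(B, H))
A = rng.normal(size=(H, F))              # first MLP weight
Bw = rng.normal(size=(F, H))             # second MLP weight

# Unsharded reference: Y = gelu(X @ A) @ Bw
Y_ref = gelu(X @ A) @ Bw

# Column-parallel A, row-parallel Bw: each rank computes a partial output;
# summing the partials (an all-reduce in a real system) recovers Y exactly.
A_shards = np.split(A, tp, axis=1)       # columns of A per rank
B_shards = np.split(Bw, tp, axis=0)      # matching rows of Bw per rank
partials = [gelu(X @ a) @ b for a, b in zip(A_shards, B_shards)]
Y_tp = np.sum(partials, axis=0)          # all-reduce across ranks

print("tensor-parallel output matches:", np.allclose(Y_ref, Y_tp))  # True
```

The column-then-row pairing is what makes this exact: gelu is elementwise, so it commutes with the column split, and the row split of the second matmul turns concatenation into a sum that one all-reduce can perform.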