段石石, Technical Expert, 壁仞科技 (Biren Technology)
DataFunSummit#2023

Contents
• Historical Background
• Distributed Training Techniques
• Distributed Training Challenges
• Future Challenges

01 Historical Background

• Large Language Models, LLM Infra, and the challenges of distributed training.
• LLMs need huge FLOPs. Transformer FLOPs equation: C = F·T ≈ 6ND, where N is the number of parameters, D is the number of tokens the model is trained on, F is the aggregate hardware throughput in FLOP/s, and T is the wall-clock training time.
• Model parameters vs. GPU memory: the gap has become huge, which is what pushes us toward distributed ML systems.
• PaLM's required compute on a single A100 (312 TFLOP/s peak): 2.53×10^24 / (312×10^12) / 86400 / 365 ≈ 254 years.
• Solving for time: T ≈ 6ND / (F · #GPU · R), where R is the achieved utilization. On 2048 GPUs this comes to ≈ 46/R days. (Worked through in Sketch 1 below.)
• Reference: https://www.bilibili.com/video/BV1iu4y1Z7bv/?vd_source=8d00c2c0cdbe325ba3b959e4aea901ea
• Training is a compute graph placed on a device cluster; every step applies the optimizer update Weight = F(Weight, Grad). (Sketch 2 below.)

Distributed Training Technology Stack

Brief History
• Compute graph and placement.
• Large language models with FSL (few-shot learning).
• PaLM: Pathways @ Google.
• CLIP @ OpenAI: connecting images and text.
• DistBelief; Parameter Server (Mu Li et al.); Bosen; GeePS.

Data Parallelism Family
• ZeRO-DP. (Sketch 3 below.)
• Recompute. (Sketch 4 below.)
• Offload to host memory / NVMe.

Pipeline Parallelism Family
• Synchronous: DAPPLE, Interleaved 1F1B, Merak, Chimera. (The 1F1B schedule is generated in Sketch 5 below.)
• Asynchronous: PipeDream.

Tensor Parallelism Family (Sketch 6 below.)

04 Future Challenges
• Profiler for the scheduler / for humans.
• Auto-parallelism.
• Other model modules?

Thank you for watching.
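Sketch 1. A worked version of the slide's training-time arithmetic, C ≈ 6ND and T ≈ 6ND/(F·#GPU·R). The PaLM figures (N = 540B parameters, D = 780B tokens) and the A100's 312 TFLOP/s dense BF16 peak are published numbers; the utilization R = 0.5 is an arbitrary value assumed for illustration.

```python
# Worked example of the slide's arithmetic: C = 6*N*D and T = 6*N*D / (F * n_gpu * R).
N = 540e9          # PaLM parameters
D = 780e9          # PaLM training tokens
F = 312e12         # A100 peak dense BF16 throughput, FLOP/s
SEC_PER_DAY = 86400

C = 6 * N * D                           # total training compute in FLOPs
print(f"C = {C:.3e} FLOPs")             # ~2.53e24, matching the slide

# A single A100 at 100% utilization:
days_1gpu = C / F / SEC_PER_DAY
print(f"1 GPU: {days_1gpu / 365:.0f} years")   # ~257 years; the slide's 254
                                               # comes from rounding C to 2.5e24

# 2048 A100s at utilization R; the slide expresses this as 46/R days:
n_gpu, R = 2048, 0.5                    # R = 0.5 is an assumed MFU for illustration
days_cluster = C / (F * n_gpu * R) / SEC_PER_DAY
print(f"2048 GPUs at R={R}: {days_cluster:.0f} days")  # ~92 days; ~46 at R = 1
```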
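Sketch 2. A minimal single-process simulation of data-parallel training: each replica computes a gradient on its shard of the batch, an all-reduce averages the gradients, and every replica applies the same update Weight = F(Weight, Grad). The linear-regression model, learning rate, and shard layout are illustrative assumptions, not from the deck.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(64, 2))
y = X @ w_true

n_workers, lr = 4, 0.1
w = np.zeros(2)                              # replicated weights, identical everywhere
shards = np.split(np.arange(64), n_workers)  # each "GPU" owns a slice of the batch

for step in range(100):
    # each worker computes a gradient on its local shard
    local_grads = []
    for idx in shards:
        err = X[idx] @ w - y[idx]
        local_grads.append(X[idx].T @ err / len(idx))
    grad = np.mean(local_grads, axis=0)      # all-reduce: average across workers
    w = w - lr * grad                        # optimizer step: Weight = F(Weight, Grad)

print(w)  # ~ [2, -3]
```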
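Sketch 3. A conceptual, pure-NumPy simulation of ZeRO-DP stage 1: parameters and gradients stay replicated, but the optimizer state (here a momentum buffer) is partitioned across ranks, so each rank stores only 1/#ranks of it; each rank updates only its own parameter shard and an all-gather restores the full vector. The toy loss and hyperparameters are assumptions.

```python
import numpy as np

n_ranks, n_params, lr, mu = 4, 16, 0.1, 0.9
rng = np.random.default_rng(1)
params = rng.normal(size=n_params)               # replicated on every rank
shards = np.split(np.arange(n_params), n_ranks)  # rank r owns shards[r]
momentum = {r: np.zeros(len(shards[r])) for r in range(n_ranks)}  # sharded state

for step in range(3):
    grad = 2 * params                            # toy loss sum(p^2); post all-reduce
    new_shards = []
    for r in range(n_ranks):                     # each rank updates only its shard
        momentum[r] = mu * momentum[r] + grad[shards[r]]
        new_shards.append(params[shards[r]] - lr * momentum[r])
    params = np.concatenate(new_shards)          # all-gather the updated shards

print(params)  # every "rank" again holds the full, identical parameter vector
```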
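Sketch 4. Recompute (activation checkpointing) trades FLOPs for memory: intermediate activations are dropped in the forward pass and recomputed during backward. PyTorch's torch.utils.checkpoint is one concrete implementation (the deck names the technique, not a framework); the block shape here is arbitrary.

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
x = torch.randn(8, 1024, requires_grad=True)

# Activations inside `block` are not stored; they are recomputed when backward runs.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 1024])
```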
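Sketch 5. A small generator for the per-stage operation order of the non-interleaved 1F1B synchronous pipeline schedule (the DAPPLE-style early-backward schedule): each stage runs (#stages - 1 - stage) warm-up forwards, alternates one forward with one backward in steady state, then drains the remaining backwards. It prints operation order only; communication and timing are not modeled.

```python
def one_f1b(stage: int, n_stages: int, n_microbatches: int) -> list[str]:
    warmup = min(n_stages - 1 - stage, n_microbatches)
    ops, f, b = [], 0, 0
    for _ in range(warmup):               # warm-up phase: forwards only
        ops.append(f"F{f}"); f += 1
    while b < n_microbatches:             # steady state: 1F1B, then drain backwards
        if f < n_microbatches:
            ops.append(f"F{f}"); f += 1
        ops.append(f"B{b}"); b += 1
    return ops

p, m = 4, 8                               # 4 pipeline stages, 8 micro-batches
for s in range(p):
    print(f"stage {s}: {' '.join(one_f1b(s, p, m))}")
```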
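Sketch 6. Tensor (intra-layer) parallelism for a two-layer MLP, simulated in NumPy in the style popularized by Megatron-LM: the first weight matrix is split by columns so each rank computes a slice of the hidden activation, the second is split by rows so each rank produces a partial output, and one all-reduce sums the partials. Sizes and rank count are assumptions; the final check confirms the sharded result matches the unsharded computation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ranks, d_model, d_hidden = 4, 8, 32
X = rng.normal(size=(2, d_model))
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, d_model))
relu = lambda a: np.maximum(a, 0)

reference = relu(X @ W1) @ W2                    # unsharded computation

W1_cols = np.split(W1, n_ranks, axis=1)          # column-parallel first GEMM
W2_rows = np.split(W2, n_ranks, axis=0)          # row-parallel second GEMM
partials = [relu(X @ W1_cols[r]) @ W2_rows[r] for r in range(n_ranks)]
output = np.sum(partials, axis=0)                # all-reduce over ranks

print(np.allclose(reference, output))  # True
```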