Zhongzhu / Charlie
Tag: #Distributed Training
23 posts tagged with this label.
2026
05-15
EN
Zero Sum SVD: A Global, Loss-Aware Rank Budget for LLM Compression
05-15
ZH
Zero Sum SVD: An LLM Compression Method That Allocates a Global Singular-Value Budget via "Zero-Sum Loss"
05-14
EN
DisagMoE: Disaggregating Attention and FFN to Beat the MoE All-to-All Bottleneck
05-14
ZH
DisagMoE: Breaking the All-to-All Bottleneck in MoE Training by Disaggregating Attention and FFN
05-07
EN
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
04-29
EN
FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
04-24
EN
FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
04-16
EN
PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review
04-16
ZH
PipeDream: Making Pipeline Parallelism a Truly Trainable System (In-Depth Reading Notes)
04-04
EN
Switch Transformers: Scaling to Trillion-Parameter Sparse Models — In-Depth Technical Review
04-04
ZH
Switch Transformers: Scaling to Trillion-Parameter Models with Simple and Efficient Sparsity (In-Depth Reading Notes)
04-02
EN
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism — In-Depth Technical Review
04-02
ZH
GPipe: Training Large-Scale Models with Micro-Batch Pipeline Parallelism (In-Depth Reading Notes)
03-29
EN
Ring Attention: Blockwise Transformers for Near-Infinite Context — In-Depth Technical Review
03-26
EN
Alpa: Automating Inter- and Intra-Operator Parallelism — In-Depth Technical Review
03-19
EN
ZeRO: Shattering the Memory Wall — How DeepSpeed Trains Trillion-Parameter Models
03-12
EN
Megatron-LM: NVIDIA's Blueprint for Training Billion-Parameter Models at Scale
03-12
EN
PaRO: Smarter Partitioning for Distributed Training — Beyond ZeRO's One-Size-Fits-All
2020
09-25
EN
Slurm-Day5
09-09
EN
Slurm-Day4
09-05
EN
Slurm-Day2
09-05
EN
Slurm-Day3
09-04
EN
Slurm-Day1