Zhongzhu / Charlie

Zhongzhu / Charlie Zhou

200 Posts 25 Tags

© 2019 - 2026 Zhongzhu Zhou

Tag

#Transformer

9 posts tagged with this label.

2026

06-12 EN

SliceGPT: Post-Training LLM Compression via Computational Invariance
06-12 ZH

SliceGPT Reading Notes: Deleting Rows and Columns of the Transformer via Computational Invariance
06-11 EN

MegaScale: Engineering 55% MFU at 12,288 GPUs for LLM Training
06-11 ZH

MegaScale: How ByteDance Achieves 55% MFU on 12,288 GPUs for Large-Scale LLM Training
04-22 EN

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference Under Hard Uplink Budgets
03-28 EN

Mamba: Linear-Time Sequence Modeling with Selective State Spaces — In-Depth Technical Review
03-22 EN

Attention Is All You Need: The Transformer — In-Depth Technical Review
02-18 EN

DeepSeek-V2: Multi-head Latent Attention and DeepSeekMoE — Technical Review
02-18 EN

GLM-5 Technical Review: From Vibe Coding to Agentic Engineering

Zhongzhu Zhou / Charlie Zhou

Efficient machine learning, systems and research notes.

