Zhongzhu / Charlie
Tag: #Transformer
9 posts tagged with this label. Back to all tags or the main feed.
2026
06-12 (EN) SliceGPT: Post-Training LLM Compression via Computational Invariance
06-12 (ZH) SliceGPT Reading Notes: Deleting Rows and Columns of the Transformer via Computational Invariance
06-11 (EN) MegaScale: Engineering 55% MFU at 12,288 GPUs for LLM Training
06-11 (ZH) MegaScale: How ByteDance Achieves 55% MFU in Large-Scale LLM Training on 12,288 GPUs
04-22 (EN) SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference Under Hard Uplink Budgets
03-28 (EN) Mamba: Linear-Time Sequence Modeling with Selective State Spaces — In-Depth Technical Review
03-22 (EN) Attention Is All You Need: The Transformer — In-Depth Technical Review
02-18 (EN) DeepSeek-V2: Multi-head Latent Attention and DeepSeekMoE — Technical Review
02-18 (EN) GLM-5 Technical Review: From Vibe Coding to Agentic Engineering