Page 9 / 17
202 posts in total. Keep on posting.
Showing posts 97–108 of 202. Each entry opens locally on this site; legacy Hexo posts link back to their original article at the bottom for reference.
2026
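- ZH
MASPO: Joint Prompt Optimization for LLM Multi-Agent Systems
A Chinese-language reading note on MASPO, which jointly optimizes the role prompts in an LLM multi-agent system using three kinds of signals: local, lookahead, and global.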
- EN
MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
A detailed technical review of MASPO, a joint prompt optimization method for multi-agent LLM systems that balances local, downstream, and global rewards.
- ZH
Tutti: Making SSD-Based KV Cache Truly Practical for Long-Context LLM Serving
A Chinese-language reading note on Tutti, which builds on a GPU-native KV cache object store, GPU io_uring, and slack-aware scheduling to make SSD-backed KV cache better suited to long-context LLM serving.
- EN
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
A detailed technical review of Tutti, a GPU-centric SSD-backed KV cache system that makes KV cache reuse practical for long-context LLM serving.
- EN
Queueing Stability for LLM Inference with KV Cache Memory Constraints
A detailed technical review of a queueing-theoretic framework for predicting LLM inference stability under KV cache memory constraints.
- EN
Swift-SVD: Activation-Aware Low-Rank Compression for LLM Weights and KV Cache
A detailed technical review of Swift-SVD, an activation-aware low-rank compression method for LLM weights and KV cache that uses output covariance eigendecomposition to avoid expensive generalized SVD.
- EN
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
A detailed technical review of Piper, a resource-model-driven system for large-scale MoE training with pipelined hybrid parallelism, HALO hierarchical all-to-all, and topology-aware expert placement.
- EN
Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
A detailed technical review of NExt, a method that models low-rank optimization trajectories to accelerate reinforcement learning with verifiable rewards for large language models.
- EN
FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
A detailed technical review of FEPLB, a system that uses Hopper NVLink Copy Engines to perform fine-grained MoE load balancing with little interference to normal expert-parallel training.
- EN
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond — Technical Review
A technical review of agentic world modeling, covering capability levels, governing-law regimes, evaluation, and why decision-centric world models matter for LLM agents.
- EN
OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning
A detailed technical review of OGER, an offline-guided exploration reward designed to remain robust in hybrid reinforcement learning.
- EN
FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine
How FEPLB uses previously idle hardware, the NVLink Copy Engine, to reduce MoE token imbalance (18.6% GPU waste), with a reported 51-70% improvement.