Showing posts 97–108 of 202. Each entry opens locally on this site; legacy Hexo posts link back to their original article at the bottom for reference.

2026

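05-11 ZH

MASPO: Joint Prompt Optimization for LLM Multi-Agent Systems

A Chinese-language reading note on MASPO, which uses three kinds of signals (local, lookahead, and global) to jointly optimize the role prompts in LLM multi-agent systems.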
05-11 EN

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

A detailed technical review of MASPO, a joint prompt optimization method for multi-agent LLM systems that balances local, downstream, and global rewards.
05-10 ZH

Tutti: Making SSD-Based KV Cache Truly Practical for Long-Context LLM Serving

A Chinese-language reading note on Tutti, which builds on a GPU-native KV cache object store, GPU io_uring, and slack-aware scheduling to make SSD-backed KV cache better suited to long-context LLM serving.
05-10 EN

Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving

A detailed technical review of Tutti, a GPU-centric SSD-backed KV cache system that makes cache reuse practical for long-context LLM serving.
05-09 EN

Queueing Stability for LLM Inference with KV Cache Memory Constraints

A detailed technical review of a queueing-theoretic framework for predicting LLM inference stability under KV cache memory constraints.
05-08 EN

Swift-SVD: Activation-Aware Low-Rank Compression for LLM Weights and KV Cache

A detailed technical review of Swift-SVD, an activation-aware low-rank compression method for LLM weights and KV cache that uses output covariance eigendecomposition to avoid expensive generalized SVD.
05-07 EN

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

A detailed technical review of Piper, a resource-model-driven system for large-scale MoE training with pipelined hybrid parallelism, HALO hierarchical all-to-all, and topology-aware expert placement.
05-01 EN

Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt

A detailed technical review of NExt, a method that models low-rank optimization trajectories to accelerate reinforcement learning with verifiable rewards for large language models.
04-29 EN

FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine

A detailed technical review of FEPLB, a system that uses Hopper NVLink Copy Engines to perform fine-grained MoE load balancing with little interference to normal expert-parallel training.
04-27 EN

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond — Technical Review

A technical review of agentic world modeling, covering capability levels, governing-law regimes, evaluation, and why decision-centric world models matter for LLM agents.
04-26 EN

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

A technical review of OGER, a robust offline-guided exploration reward for hybrid reinforcement learning.
04-24 EN

FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine

How to reduce MoE token imbalance, a source of 18.6% GPU waste, and achieve a 51-70% improvement by using hardware that was previously idle.