Page 8 / 10
116 posts in total. Keep on posting.
Showing posts 85–96 of 116. Each entry opens locally on this site; legacy Hexo posts link back to their original article at the bottom for reference.
2026
- EN
Direct Preference Optimization: Your Language Model Is Secretly a Reward Model — Technical Review
A detailed technical review of Rafailov et al.'s paper 'Direct Preference Optimization', analyzing how DPO eliminates the need for reinforcement learning in language model alignment by deriving a closed-form mapping from reward functions to optimal policies, enabling a simple classification loss to replace the complex RLHF pipeline.
- EN
Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Technical Review
A detailed technical review of the Tree of Thoughts (ToT) framework, which generalizes chain-of-thought prompting to enable deliberate, search-based problem solving with large language models using BFS and DFS over structured reasoning trees.
- EN
ReAct Technical Review: From Reasoning Ability to Executable Reasoning
A comprehensive technical review of ReAct (Reasoning + Acting), analyzing how interleaving chain-of-thought reasoning with tool-use actions enables LLM agents to tackle complex tasks like question answering and web navigation with improved accuracy and interpretability.
2023
- EN
ComputerArchitecture-Day1
Notes on computer architecture fundamentals — covering CPU design, instruction sets, pipelining, and memory hierarchy basics.
2022
- EN
Reinforcement Learning-Principle-Day12
Reinforcement learning study notes — hierarchical RL, options framework, and goal-conditioned policies.
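- ZH
HiWiFi (极路由) S1: A Hand-Holding Flashing Tutorial Without an Official Unlock Path, and the Hard-Won Journey
A hand-holding flashing tutorial for the HiWiFi (极路由) 1S (5661A): a complete record of the process, from flashing the breed bootloader to installing OpenWrt, with no official unlock path available.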
2021
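- ZH
Modern Operating Systems: Principles and Implementation - Chen Haibo - Day 1
Study notes on Chen Haibo's "Modern Operating Systems": from system calls and file systems to kernel modules, understanding how the components of an operating system work together.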
- EN
Intel mac to M1 chip mac
A practical guide to migrating from an Intel Mac to an Apple Silicon (M1) Mac, covering compatibility issues, setup tips, and performance differences.
- EN
Reinforcement Learning-Principle-Day11
Reinforcement learning study notes — reward shaping, inverse RL, and imitation learning techniques.
- EN
Reinforcement Learning-Principle-Day10
Reinforcement learning study notes — exploration strategies: epsilon-greedy, UCB, Thompson sampling, and curiosity-driven methods.
- EN
MetaLearning-Stanford-Lecture5
Stanford CS 330 Meta-Learning lecture notes — covering Bayesian meta-learning and neural processes for uncertainty-aware few-shot prediction.
- EN
Reinforcement Learning-Principle-Day9
Reinforcement learning study notes — multi-agent reinforcement learning, cooperative and competitive settings.