Zhongzhu Zhou / Charlie Zhou
Senior Research Scientist [Incoming] / Research Consultant
Turbo Team
🏢 Office: Room 408, J12/1 Cleveland St, Darlington NSW 2008
📮 Email: zhongzhu.zhou [at] sydney.edu.au, zhouzhzh8 [at] mail2.sysu.edu.cn
"Let everything happen to you. Beauty and terror. Just keep going. No feeling is final." - Rainer Maria Rilke
I am a Senior Research Scientist [Incoming] / Research Consultant at the Turbo Team, Together AI, supervised by Ben Athiwaratkun.
I am a Ph.D. candidate at the School of Computer Science, Faculty of Engineering, The University of Sydney, supervised by Prof. Shuaiwen Song.
Prior to my current position, I was fortunate to intern at Dolby, the DeepSpeed team at Microsoft, the Weixin Group at Tencent, and Microsoft (China), contributing to projects on building machine learning systems. I was also a research assistant at the School of Computer Science and Engineering, Sun Yat-sen University from 2019 to 2022, under the supervision of Prof. Dan Huang, Yunfei Du, and Yutong Lu. I received my B.E. degree from the School of Computer Science and Engineering, Sun Yat-sen University in 2019. My research is mainly supported by Together Computer.
My research focuses on multiple aspects of the efficient machine learning system stack, including neural network algorithm and architecture design as well as training and inference system optimization. Specifically, I aim to bridge the gap between emerging machine learning algorithms and applications and efficient heterogeneous hardware (CPU/GPU) training and inference systems, in terms of both productivity and performance.
Feel free to drop me an email if you have aligned interests. Currently, I am working on the following projects: (🔥 indicates the projects I am leading)
Efficient LLM Track:
LLM's Reasoning Steering
Coding Agent Training
Multi-Latent Attention Application on General Transformer Model 🔥
RL Training Service Analysis and Development 🔥
Efficient Machine Learning System Track:
Efficient Mobile Video Compression & Streaming System Design 🔥
Pre-Expedite: Use Hierarchical Structure Space for Improving the Performance of Accessing Small Files in Parallel File System 🔥
Hybrid-Share: Universal Resource Scheduling for Hybrid Jobs on Supercomputers 🔥
EmReal: A Digital Twin Framework of Emulated and Real Components for Robots with Reinforcement Learning 🔥
Together AI Hybrid & San Francisco, United States
Research Consultant May. 2024 - Present
Dolby Sydney, Australia
Research Intern May. 2024 - Sep. 2024
DeepSpeed Team, Microsoft Sydney, Australia
Research Intern Mar. 2023 - Feb. 2024
Weixin Group, Tencent Holdings Ltd. Champaign, IL, US & Guangzhou, China
Research Intern Jul. 2018 - Jul. 2020
Microsoft (China) Co., Ltd. Guangzhou, China
Project Assistant to Senior Cloud Architect Sep. 2018 - Feb. 2019
The University of Sydney (USYD) Sydney, Australia
Doctor of Philosophy (Ph.D.) Oct. 2022 - Feb. 2026
Accumulated GPA: 4.0/4.0
Honor and Awards:
Progress Evaluation: satisfactory or excellent, USYD, 2023, 2024, 2025
APR Intern Program Scholarship (SC3600), USYD, 2024
The Jingdong Technology (JD) Co Ltd Research Scholarship in Artificial Intelligence, USYD, 2022
Sun Yat-sen University (SYSU) Guangzhou, China
Research Associate Sep. 2019 - Jun. 2022
Accumulated GPA: 3.41/4.0
Honor and Awards:
SYSU Overseas Visiting and Collaborative Research Program Funding Plan, SYSU, 2021
The Third Class Scholarship, SYSU X 3, 2020, 2021, 2022
The Second Class Scholarship (Top 15% of the major), SYSU, 2019
University of Illinois Urbana-Champaign (UIUC) Remotely & Champaign, IL, US
Summer Session Student Jun. 2018 - Sep. 2018
Honor and Awards:
Illinois Computer Science Summer Research Program, UIUC, 2018
Sun Yat-sen University (SYSU) Guangzhou, China
Bachelor of Engineering in Computer Science and Technology Sep. 2015 - Jun. 2019
Accumulated GPA: 3.9/4.0
Honor and Awards:
National Scholarship (Top 1 of the major), China, 2016
Research Honor Degree, SYSU, 2019
The First Class Scholarship (Top 5% of the major) X 2, SYSU, 2015-2016, 2017-2018
The Second Class Scholarship (Top 15% of the major), SYSU, 2016-2017
Meritorious Winner, COMAP’s Mathematical Contest in Modeling, United States, 2017
The Second Prize, The Chinese Mathematics Competitions, 2016
The Third Prize, The Chinese Mathematics Competitions, 2017
The Third Prize, ACM-ICPC, SYSU, 2017
The Second Prize, Student Innovation Software Development Competition, SYSU, 2017
The Third Prize, Microsoft Hackathon, South China, 2018
Institute of Electrical and Electronics Engineers (IEEE) Member ID: 97841404
Association for Computing Machinery (ACM) Member ID: 6708618
China Computer Federation (CCF) Member ID: B8293G
Reviewer for Conferences
Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS 2025)
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Program Committee
Computer Science Research Methods (CSRM 2023) (INFO5993/INFO4990 at the University of Sydney)
ACM International Conference on Architectural Support for Programming Languages and Operating Systems Artifact Evaluation Committee (ASPLOS’24 AEC)
Web Chair
32nd IEEE International Symposium on High-Performance Computer Architecture (HPCA 2026)
C Language Programming in Chinese
Xuemao Zhou, Wei Yi, Zhongzhu Zhou
Tianjin University Press
ISBN: 9787561847251
JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games
Qun Xia, Zhongzhu Zhou, Zhihao Li, Bin Xu, Wei Zou, Zishun Chen, Huafeng Ma, Gangqiang Liang, Haochuan Lu, Shiyu Guo, Ting Xiong, Yuetang Deng, Tao Xie
ICSE (International Conference on Software Engineering) 2020
Track: Software Engineering in Practice
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song
VLDB (International Conference on Very Large Databases) 2024
Track: Research
Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient
Zhongzhu Zhou, Yibo Yang, Ziyan Chen, Fengxiang Bie, Haojun Xia, Xiaoxia Wu, Robert Wu, Ben Athiwaratkun, Bernard Ghanem, Shuaiwen Leon Song
ICLR (The Fourteenth International Conference on Learning Representations) 2026
Track: Research
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem
NeurIPS (Annual Conference on Neural Information Processing Systems) 2024
Track: Research
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPU
Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song
ATC (USENIX Annual Technical Conference) 2024
Track: Research
Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue WANG, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
ICML (Forty-second International Conference on Machine Learning) 2025
Track: Research
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
ICLR (The Fourteenth International Conference on Learning Representations) 2026
Track: Research
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
ICLR (The Fourteenth International Conference on Learning Representations) 2026
Track: Research
Kitty: Accurate and Efficient 2-Bit KV Cache Quantization with Channel-Wise Precision Boost
MLSys (Ninth Annual Conference on Machine Learning and Systems) 2026
Track: Research
Binary Neural Network for Automated Visual Surface Defect Detection
Wenzhe Liu, Jiehua Zhang, Zhou Su, Zhongzhu Zhou, Li Liu
Sensors MDPI (Multidisciplinary Digital Publishing Institute)
Track: Special Issue Intelligent Sensing and Monitoring for Industrial Process
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song
PAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence)
Track: Survey Papers
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, et al.
A Virtual-File-System-Based Small-File Storage Optimization System for User-Space Applications in Kubernetes
Liang Du, Guixin Guo, Kangyou Zhong, Yunfei Du, Yutong Lu, Zhongzhu Zhou
Chinese Patent
Application No.: CN202010195318.5, Publication No.: CN111475469A/B
International Conference on Learning Representations (ICLR 2025) In-person Attendance; April 24 - 28, 2025
NVIDIA GPU Technology Conference (GTC 2025) In-person Attendance; March 16–21, 2025
USENIX Annual Technical Conference (ATC 2024) Online Attendance; July 10 - 12, 2024
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023) Online Attendance; March 25 - 29, 2023
International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2022) Online Attendance; November 13 - 18, 2022
China National Computer Congress (CNCC 2021) Online Attendance; December 16 - 18, 2021
International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2021) Online Attendance; November 14 - 19, 2021
ISC High Performance (ISC 2021) Online Attendance; June 24 - July 02, 2021
International Symposium on Computer Architecture (ISCA 2021) Online Attendance; June 14 - 19, 2021
International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS 2020) Online Attendance; May 23, 2020
Pearson Test of English: Listening: 61; Reading: 71; Writing: 64; Speaking: 75; Overall: 67
March 21, 2024
Programming Languages: Pascal (11 yrs), C (11 yrs), C++ (11 yrs), Python (6 yrs), HTML, CSS, JavaScript (6 yrs), Java (6 yrs), SQL (6 yrs), Bash (6 yrs), LaTeX (5 yrs), Matlab (5 yrs), CUDA (5 yrs), R (4 yrs), Go (4 yrs), Triton (2 yrs)
Systems and Infrastructure: MPI/OpenMPI (6 yrs), Linux Kernel (5 yrs), Distributed/Parallel File Systems - e.g., Lustre, HDFS (5 yrs), Kubernetes, Kubernetes Scheduler, Kubernetes SR-IOV (5 yrs), Docker (5 yrs), Hadoop (4 yrs), Spark (4 yrs), YARN (4 yrs), Mesos (4 yrs), NVLink (2 yrs), NVSHMEM (2 yrs), Tensor Core, CUDA Core Programming (2 yrs)
Machine Learning and AI: TensorFlow (5 yrs), PyTorch (4 yrs), TorchServe (4 yrs), TensorBoard (4 yrs), Ray (4 yrs), JAX (2 yrs), Triton (2 yrs), DeepSpeed (1 yr), HuggingFace (1 yr), Reinforcement Learning (4 yrs), CNN, RNN, ResNet, Attention Block, UNet, Transformer, ViT (5 yrs), Neural Architecture Search (3 yrs), Diffusion Models (1 yr), GPT-2, 3, 4 (1 yr), Reinforcement Learning from Human Feedback (1 yr), VeOmni (1 yr), TorchTitan (1 yr), FSDP (1 yr), ZeRO 1, 2, 3 (1 yr)
Databases and Storage: MySQL (6 yrs), Oracle SQL (6 yrs), MongoDB (6 yrs), PostgreSQL (6 yrs), Redis (4 yrs), Hive SQL (3 yrs)
Front-end Development: PHP (6 yrs), Vue.js (6 yrs), ReactJS (6 yrs), ASP.NET (6 yrs), jQuery (6 yrs), AngularJS (6 yrs), Apache (6 yrs), MeteorJS (6 yrs)
Back-end Development: Spring Boot (6 yrs), Django (6 yrs), Flask (6 yrs), Node.js (6 yrs), Express (6 yrs), REST API Design (6 yrs), CI/CD
Mobile Development: Android Studio (6 yrs, Java, Kotlin), Xcode (6 yrs, Swift, Objective-C), React Native (6 yrs, Cross-platform), Flutter (6 yrs, Cross-platform)
Web Crawling & Testing: Urllib (6 yrs), BeautifulSoup (6 yrs), Scrapy (6 yrs), Requests (6 yrs), JSON (6 yrs), Selenium (6 yrs), Pytest (6 yrs), JUnit (6 yrs)
Version Control & Build Systems: Git (8 yrs), Gradle (6 yrs, Android, Java), Maven (6 yrs, Java), npm (6 yrs, JavaScript), pip (6 yrs, Python)
Development Tools & Libraries: Airflow (2 yrs), Kafka (2 yrs), Elasticsearch (2 yrs), OpenCV (5 yrs), Pandas (5 yrs), NumPy (5 yrs), SciPy (5 yrs), NLTK (5 yrs), Matplotlib (5 yrs), Seaborn (5 yrs), Azure Data Factory (2 yrs), AWS (2 yrs), Google Cloud Platform (2 yrs)
Fitness: Fencing (6 yrs), Jogging (7 yrs), Bodybuilding (6 yrs) (Hongxing Fitness Club Outstanding Student), Table Tennis (11 yrs), Badminton (11 yrs)
Leisure: Web & Mobile Application Development (5 yrs), Saxophone (9 yrs), Magic (1 yr), Video Games (a collection of 500+ PS5 games)
Volunteer:
Sun Yat-sen University, School of Computer Science and Engineering, Student Union, Vice President, Jul. 2016 - Jul. 2017
Mentored incoming freshmen, helped them acclimate to the university environment, and promoted a sense of belonging through inclusive campus activities and events.
Changjun High School Volunteer, Jul. 2013 - Jul. 2014
Enhanced the nursing home experience by engaging in meaningful conversations with elderly residents, preparing and serving fresh fruit, and maintaining a clean and sanitary environment for their well-being.