About Zhihang Yuan

I obtained my bachelor’s degree from Peking University in 2017, and my PhD degree from the School of Computer Science at Peking University in 2022, under the supervision of Professor Guangyu Sun. Currently, I am conducting research related to Efficient AI at Bytedance Seed. My research focuses on the design of efficient neural networks architecture, reinforcement learning on LLM, and neural network compression algorithm.

袁之航于2017年获得北京大学学士学位，于2022年获得北京大学计算机学院博士学位，博士导师为孙广宇老师。目前他在字节跳动Seed团队从事高效人工智能（Efficient AI）的工作，方向为神经网络的压缩算法、大模型强化学习、高效神经网络设计等。喜欢走走，曾参加过北京大学自行车协会2014年暑期远征和北京大学山鹰社2022年英吉沙科考。

Publications

Please visit Google Scholar page for more detailed information.

* indicates equal contribution, + indicates communication author

Li A*, Wang Y*, Yuan Z*, et al. LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs[J]. arXiv preprint, 2025.
Guo Y*, Wang W*, Yuan Z*, et al. SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling[J]. arXiv preprint, 2025.
Zhang H*, Su R*, Yuan Z+, et al. DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers, ICCV 2025.
Yuan Z*, Xie R*, Shang Y, et al. VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate, ICCV 2025.
Hu X, Chen Z, Yang D, et al. MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance[J]. ICML 2025.
Duanmu H, Li X, Yuan Z, et al. MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design[J]. ICML 2025.
Zhou S*, Yuan Z*, Yang D, et al. PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram, CVPR 2025.
Wang K, Shi M, Zhou Y, et al. A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training, CVPR 2025.
Yuan Z*, Wang S*, Shang Y, et al. DLFR-VAE: Dynamic Latent Frame Rate VAE for Efficient Video Generation, ACM MM, 2025.
Hu X*, Cheng Y*, Yang D*…, Yuan Z+. OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting, ICLR 2025.
Xu Z*, Yue Y*, Hu X, et al. MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods, ICLR 2025.
Yang S, Han Z, Yuan Z*, GSE-MN4: Group-Shared Exponents Integer Quantization for MobileNetV4, ICIC 2025.
Yuan Z*, Shang Y*, Zhang H, et al. E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling. arXiv preprint arXiv:2412.14170, 2024.
Yuan Z*, Lu P*, Zhang H*, et al. DiTFastAttn: Attention Compression for Diffusion Transformer Models, NeurIPS 2024.
Duanmu H*, Yuan Z*, Li X, et al. SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models. COLM 2024 (Oral).
Han Y*, Liu Z*, Yuan Z*, et al. Latency-aware unified dynamic networks for efficient image recognition. TPAMI 2024.
Yuan Z*, Shang Y*, Zhou Y*, et al. LLM Inference Unveiled: Survey and Roofline Model Insights. arXiv preprint arXiv:2402.16363, 2024.
Yue Y*, Yuan Z*, Duanmu H, et al. Wkvquant: Quantizing weight and key/value cache for large language models gains more. arXiv preprint arXiv:2402.12065, 2024.
Shang Y*, Yuan Z*, et al. PB-LLM: Partially Binarized Large Language Models. ICLR 2024.
Zhang C, Yuan Z, et al. Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators. DATE 2024.
Guo A, Chen X, Dong F, et al. 34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs, IEEE International Solid-State Circuits Conference (ISSCC) 2024.
Yuan Z*, Shang Y*, et al. ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models. arXiv 2023.
Shang Y, Yuan Z, et al. MIM4DD: Mutual Information Maximization for Dataset Distillation, NeurIPS 2023.
Yuan Z*, Lin N*, Liu J, et al. RPTQ: Reorder-based Post-training Quantization for Large Language Models. arXiv preprint arXiv:2304.01089, 2023.
Niu L, Liu J, Yuan Z+, et al. Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric. arXiv preprint arXiv:2304.09785, 2023.
Yuan Z*, Liu J*, Wu J, et al. Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance. AdvML-Frontiers 2023.
Shang Y*, Yuan Z*, Xie B, et al. Post-training Quantization on Diffusion Models. CVPR 2023.
Liu J, Niu L, Yuan Z+, et al. PD-Quant: Post-Training Quantization based on Prediction Difference Metric. CVPR 2023.
Han Y*, Yuan Z*, Pu Y, et al. Latency-aware Spatial-wise Dynamic Networks, NeurIPS 2022.
Li X*, Yuan Z*, Guan Y, et al. Flatfish: a Reinforcement Learning Approach for Application-Aware Address Mapping. TCAD 2022.
Li X, Bing Z, Guang Y, et al. Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network. HPCA 2022.
Yuan Z*, Xue C*, Chen Y, et al. PTQ4ViT: Post-Training Quantization Framework for Vision Transformers. ECCV 2022.
Yuan Z, Chen Y, Xue C, et al. PTQ-SL: Exploring the Sub-layerwise Post-training Quantization. arXiv preprint arXiv:2110.07809, 2021.
Yuan Z, Jingze L, Xingchen L, et al. NAS4RRAM: Neural Network Architecture Search for Inference on RRAM-based Accelerators. SCIENCE CHINA Information Sciences (SCIS), 2021.
Ding M, Kang Y, Yuan Z, et al. Detection of facial landmarks by a convolutional neural network in patients with oral and maxillofacial disease. International Journal of Oral and Maxillofacial Surgery, 2021, 50(11): 1443-1449.
Yuan Z*, Wu B*, Sun G, et al. S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search. ECCV 2020 (oral).
Yuan Z, Liu X, Wu B, et al. ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference. arXiv preprint, 2020.
Guan Y, Sun G, Yuan Z, et al. Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs. IEEE Transactions on Computers (TC), 2020.
Guan Y, Yuan Z, Sun G, et al. FPGA-based accelerator for long short-term memory recurrent neural networks. Asia and South Pacific Design Automation Conference (ASP-DAC), 2017.
Wu B, Liu Z, Yuan Z, et al. Reducing overfitting in deep convolutional neural networks using redundancy regularizer. International Conference on Artificial Neural Networks (ICANN), 2017.