About Zhihang Yuan

Hello! I’m Zhihang Yuan. I received my Bachelor’s degree from Peking University in 2017 and my Ph.D. in Computer Science from Peking University in 2022. My current work is in Efficient AI, focusing on the quantization and inference acceleration of neural networks and on software-hardware co-optimization for deep learning.

In 2021, I joined Houmo AI, a startup that builds AI accelerators based on Computing-in-Memory (CIM) technology. There, I participated in the design of the first- and second-generation AI accelerators, where I was responsible for designing quantization schemes and specialized hardware acceleration units. I also led the development of quantization algorithms and tools.

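I received my Bachelor’s degree from the School of Pharmaceutical Sciences at Peking University in 2017 and my Ph.D. from the School of Computer Science at Peking University in 2022. I currently work on Efficient AI research, focusing on neural network quantization and inference acceleration and on software-hardware co-optimization for deep learning. In 2021, I joined Houmo AI (后摩智能), a computing-in-memory chip startup, where I took part in the design of the first- and second-generation digital computing-in-memory AI accelerators. I was responsible for developing neural network compression and quantization algorithms and quantization tools, for designing the chips’ quantization schemes and the algorithms of the specialized hardware acceleration units, and for bringing several research results into production. I enjoy getting out and about: I took part in the Peking University Cycling Association’s 2014 summer expedition and in the Peking University Mountaineering Association (山鹰社) 2022 scientific expedition to Yengisar (英吉沙).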


  • Yuan Z, Shang Y, et al. ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models. arXiv 2023. (co-first)
  • Shang Y, Yuan Z, et al. PB-LLM: Partially Binarized Large Language Models. ICLR 2024. (co-first)
  • Shang Y, Yuan Z, et al. MIM4DD: Mutual Information Maximization for Dataset Distillation. NeurIPS 2023.
  • Yuan Z, Lin N, Liu J, et al. RPTQ: Reorder-based Post-training Quantization for Large Language Models. arXiv preprint arXiv:2304.01089, 2023.
  • Niu L, Liu J, Yuan Z, et al. Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric. arXiv preprint arXiv:2304.09785, 2023.
  • Yuan Z, Liu J, Wu J, et al. Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance. AdvML-Frontiers 2023.
  • Shang Y, Yuan Z, Xie B, et al. Post-training Quantization on Diffusion Models. CVPR 2023. (co-first)
  • Liu J, Niu L, Yuan Z, et al. PD-Quant: Post-Training Quantization based on Prediction Difference Metric. CVPR 2023. (corresponding author)
  • Han Y, Yuan Z, Pu Y, et al. Latency-aware Spatial-wise Dynamic Networks. NeurIPS 2022. (co-first)
  • Li X, Yuan Z, Guan Y, et al. Flatfish: a Reinforcement Learning Approach for Application-Aware Address Mapping. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022. (co-first)
  • Li X, Bing Z, Guang Y, et al. Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network. HPCA, 2022.
  • Yuan Z, Xue C, Chen Y, et al. PTQ4ViT: Post-Training Quantization Framework for Vision Transformers. European Conference on Computer Vision (ECCV), 2022. (co-first)
  • Yuan Z, Chen Y, Xue C, et al. PTQ-SL: Exploring the Sub-layerwise Post-training Quantization. arXiv preprint arXiv:2110.07809, 2021.
  • Yuan Z, Jingze L, Xingchen L, et al. NAS4RRAM: Neural Network Architecture Search for Inference on RRAM-based Accelerators. SCIENCE CHINA Information Sciences, 2021.
  • Yuan Z, Wu B, Sun G, et al. S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search. European Conference on Computer Vision (ECCV oral), 2020.
  • Yuan Z, Liu X, Wu B, et al. ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference. arXiv preprint, 2020.
  • Guan Y, Sun G, Yuan Z, et al. Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs. IEEE Transactions on Computers (TC), 2020.
  • Guan Y, Yuan Z, Sun G, et al. FPGA-based accelerator for long short-term memory recurrent neural networks. Asia and South Pacific Design Automation Conference (ASP-DAC), 2017.
  • Wu B, Liu Z, Yuan Z, et al. Reducing overfitting in deep convolutional neural networks using redundancy regularizer. International Conference on Artificial Neural Networks (ICANN), 2017.