About Chiyue Wei
Chiyue Wei is a Ph.D. student in Electrical and Computer Engineering at Duke University, working under the supervision of Professor Yiran Chen. His research interests lie at the intersection of computer architecture and deep learning. Prior to Duke, he earned his Bachelor's degree in Electronic Engineering from Tsinghua University in 2023, where he conducted research with Professor Yuan Xie and Professor Yu Wang.
News
- [2025/08] Check out our work DPad, a training-free acceleration method for Diffusion LLMs, now available on arXiv!
- [2025/08] Wrapped up my internship at NVIDIA, where I worked on the FlashInfer project. I developed high-performance and customizable attention kernels with CuTe DSL, optimized for Blackwell GPUs.
- [2025/06] Honored to be named a DAC 2025 Young Fellow.
- [2025/06] Excited that our works Phi, Transitive Array, and Ecco were presented at ISCA 2025. Check out the slides for Phi.
- [2025/05] I'm excited to start my summer internship at NVIDIA, focusing on LLM inference framework optimization within the Deep Learning Frameworks team.
- [2025/03] Three papers accepted by ISCA 2025! Topics include acceleration for Spiking Neural Networks, General Matrix Multiplications, and Large Language Models.
- [2025/03] I presented Prosperity at HPCA 2025 in Las Vegas! Check out the presentation slides and video.
- [2024/11] Our paper "Prosperity: Accelerating Spiking Neural Networks via Product Sparsity" has been accepted to HPCA 2025.
Selected Publications
- [ISCA 2025] Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai Li, Yiran Chen.
- [ISCA 2025] Transitive Array: An Efficient GEMM Accelerator with Result Reuse. Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen.
- [ISCA 2025] Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression. Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai Li, Yiran Chen.
- [HPCA 2025] Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao Yang, Hai Li, Yiran Chen.
- [DATE 2023] CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory. Tianyu Fu, Chiyue Wei, Zhenhua Zhu, Shang Yang, Zhongming Yu, Guohao Dai, Huazhong Yang, Yu Wang.
- [ISCA 2022] DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing. Guohao Dai, Zhenhua Zhu, Tianyu Fu, Chiyue Wei, Bangyan Wang, Xiangyu Li, Yuan Xie, Huazhong Yang, Yu Wang.