About Chiyue Wei
Chiyue Wei is a Ph.D. student in Electrical and Computer Engineering at Duke University, working under the supervision of Professor Yiran Chen. His research interests lie at the intersection of computer architecture and deep learning. Prior to Duke, he earned his Bachelor's degree in Electronic Engineering from Tsinghua University in 2023, where he conducted research with Professor Yuan Xie and Professor Yu Wang.
News
- [2025/08] Check out our work DPad, a training-free acceleration method for Diffusion LLMs, now available on arXiv!
- [2025/08] Wrapped up my internship at NVIDIA, where I worked on the FlashInfer project. I developed high-performance and customizable attention kernels with CuTe DSL, optimized for Blackwell GPUs.
- [2025/06] Honored to be named a DAC 2025 Young Fellow.
- [2025/06] Excited that our works Phi, Transitive Array, and Ecco were presented at ISCA 2025. Check out the slides for Phi.
- [2025/05] I'm excited to start my summer internship at NVIDIA, focusing on LLM inference framework optimization within the Deep Learning Frameworks team.
- [2025/03] Three papers accepted by ISCA 2025! Topics include acceleration for Spiking Neural Networks, General Matrix Multiplications, and Large Language Models.
- [2025/03] I presented Prosperity at HPCA 2025 in Las Vegas! Check out the presentation slides and video.
- [2024/11] Our paper "Prosperity: Accelerating Spiking Neural Networks via Product Sparsity" has been accepted to HPCA 2025.
Selected Publications
- [ISCA 2025] Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai Li, Yiran Chen.
- [ISCA 2025] Transitive Array: An Efficient GEMM Accelerator with Result Reuse. Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen.
- [ISCA 2025] Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression. Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai Li, Yiran Chen.
- [HPCA 2025] Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao Yang, Hai Li, Yiran Chen.
- [DATE 2023] CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory. Tianyu Fu, Chiyue Wei, Zhenhua Zhu, Shang Yang, Zhongming Yu, Guohao Dai, Huazhong Yang, Yu Wang.
- [ISCA 2022] DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing. Guohao Dai, Zhenhua Zhu, Tianyu Fu, Chiyue Wei, Bangyan Wang, Xiangyu Li, Yuan Xie, Huazhong Yang, Yu Wang.