publications

2025

  1. NeurIPS
    ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
    Renze Chen, Zhuofeng Wang, Beiquan Cao, Tong Wu, Size Zheng, Xiuhong Li, Xuechao Wei, Shengen Yan, Meng Li, and Yun Liang
    In Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2025