Paper review
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
오서영
2025. 11. 17. 13:47
Original paper: https://nips.cc/virtual/2024/poster/96936
Abstract (excerpt): "LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach..."
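To see why the KV cache dominates memory at long context, here is a minimal back-of-the-envelope sketch. It assumes hypothetical LLaMA-7B-like dimensions (32 layers, 32 heads, head dim 128) and compares an fp16 cache against a 2-bit quantized cache; all numbers are illustrative assumptions, not figures from the paper.

```python
# Rough KV cache size vs. model weights for a hypothetical LLaMA-7B-like config.
# fp16 baseline vs. a 2-bit quantized cache (illustrative only).

def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128,
                   bytes_per_elem=2.0):
    # Keys and values: 2 tensors per layer, each of shape [n_heads, seq_len, head_dim].
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_elem

GIB = 1024 ** 3
weights_fp16_gib = 7e9 * 2 / GIB  # ~7B parameters at 2 bytes each

for ctx in (4_096, 128_000, 1_000_000, 10_000_000):
    fp16_gib = kv_cache_bytes(ctx) / GIB
    int2_gib = kv_cache_bytes(ctx, bytes_per_elem=0.25) / GIB  # 2 bits = 0.25 B/elem
    print(f"ctx={ctx:>10,}  KV fp16 ≈ {fp16_gib:8.1f} GiB  "
          f"KV 2-bit ≈ {int2_gib:8.1f} GiB  (weights fp16 ≈ {weights_fp16_gib:.1f} GiB)")
```

Under these assumptions the fp16 KV cache already exceeds the model weights past roughly 32K tokens and grows linearly with context length, which is the motivation for pushing the cache itself to very low bit widths.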
Presentation materials: