Original paper: https://nips.cc/virtual/2024/poster/96936
NeurIPS Poster KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Abstract (excerpt): LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach…
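To see why the KV cache becomes the dominant memory cost at long context lengths, here is a rough back-of-the-envelope calculation. This is a minimal sketch: the LLaMA-7B-like configuration (32 layers, 32 KV heads, head dimension 128) is my own assumption for illustration, and the 3-bit column simply scales the element width rather than reproducing the paper's actual quantization scheme or reported numbers.

```python
# Back-of-the-envelope KV cache size estimate.
# Assumed (illustrative) config: 32 layers, 32 KV heads, head_dim 128
# -- roughly LLaMA-7B-like, not the paper's exact setup.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bits=16):
    # 2x for keys and values; one vector per token, per layer, per head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8

for ctx in (4_096, 128_000, 1_000_000):
    fp16 = kv_cache_bytes(ctx, bits=16)
    q3 = kv_cache_bytes(ctx, bits=3)   # naive 3-bit scaling, for intuition only
    print(f"{ctx:>9} tokens: fp16 = {fp16 / 2**30:7.1f} GiB, 3-bit ~ {q3 / 2**30:7.1f} GiB")
```

Even this crude estimate shows the cache growing linearly with context length into the hundreds of GiB range, which is why sub-4-bit KV cache quantization is the focus of the paper.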
Presentation materials: