Original paper: https://nips.cc/virtual/2024/poster/96936
NeurIPS Poster KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Abstract (excerpt): LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach…
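To see why the KV cache becomes the dominant memory cost at long context lengths, here is a rough back-of-the-envelope calculation. This is a minimal sketch: the LLaMA-7B-like configuration (32 layers, 32 KV heads, head dimension 128) is my own assumption for illustration, and the 3-bit column simply scales the element width rather than reproducing the paper's actual quantization scheme or reported numbers.

```python
# Back-of-the-envelope KV cache size estimate.
# Assumed (illustrative) config: 32 layers, 32 KV heads, head_dim 128
# -- roughly LLaMA-7B-like, not the paper's exact setup.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bits=16):
    # 2x for keys and values; one vector per token, per layer, per head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8

for ctx in (4_096, 128_000, 1_000_000):
    fp16 = kv_cache_bytes(ctx, bits=16)
    q3 = kv_cache_bytes(ctx, bits=3)   # naive 3-bit scaling, for intuition only
    print(f"{ctx:>9} tokens: fp16 = {fp16 / 2**30:7.1f} GiB, 3-bit ~ {q3 / 2**30:7.1f} GiB")
```

Even this crude estimate shows the cache growing linearly with context length into the hundreds of GiB range, which is why sub-4-bit KV cache quantization is the focus of the paper.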
Presentation materials: