[Paper] Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management
The rapid increase in LLM model sizes and the growing demand for long-context inference have made memory a critical bottleneck in GPU-accelerated serving system...