[Paper] Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
Vector similarity search has become a critical component in AI-driven applications such as large language models (LLMs). To achieve high recall and low latency,...