speculative decoding | EUNO.NEWS

1 month ago · ai

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

AdaSPEC is a new method that speeds up large language models by using a small draft model for the initial generation pass, followed by verificatio...

#speculative decoding #knowledge distillation #large language models #inference acceleration #draft model #AdaSPEC #AI efficiency #model compression

1 month ago · ai

[Paper] DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing spe...

#speculative decoding #LLM serving #edge-cloud inference #distributed inference #adaptive window control