speculative decoding | EUNO.NEWS

1 month ago · ai

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

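Introduction: AdaSPEC is a new method that accelerates large language models by using a small draft model for the initial generation stage, followed by a verification step.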

#speculative decoding #knowledge distillation #large language models #inference acceleration #draft model #AdaSPEC #AI efficiency #model compression
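
The draft-then-verify loop mentioned in the teaser above can be sketched as a toy greedy decoder. This is a minimal illustration of generic speculative decoding, not AdaSPEC's distillation method. The names `speculative_decode`, `greedy_decode`, `target`, `draft`, and the window size `k` are all hypothetical, and the "models" are plain Python functions that map a token list to the next token.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function returning the greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: decode one token at a time with the target model only."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Draft proposes up to k tokens; target keeps the longest agreeing prefix."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        n = min(k, limit - len(out))
        # 1) Draft stage: the small model proposes n tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(n):
            proposal.append(draft(out + proposal))
        # 2) Verification stage: the target checks each proposed position.
        #    A real system scores all positions in one batched forward pass;
        #    this sketch calls the target sequentially for clarity.
        for i in range(n):
            expected = target(out + proposal[:i])
            if expected != proposal[i]:
                # Keep the agreeing prefix, then the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every proposed token matched the target.
            out.extend(proposal)
    return out
```

Under greedy decoding this sketch produces the same tokens as the target model alone; any speedup in practice comes from how often the draft's proposals are accepted.
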
1 month ago · ai

[Paper] DSD: A Distributed Speculative Decoding Solution for Edge‑Cloud Agile Large Model Serving

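Large language model (LLM) inference often suffers from high decoding latency and limited scalability in heterogeneous edge‑cloud environments. Existing…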

#speculative decoding #LLM serving #edge‑cloud inference #distributed inference #adaptive window control