speculative decoding | EUNO.NEWS

1 month ago · ai

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Introduction: AdaSPEC is a new method that speeds up large language models by using a small draft model for the initial generation step, followed by verification.

#speculative decoding #knowledge distillation #large language models #inference acceleration #draft model #AdaSPEC #AI efficiency #model compression
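1 month ago · ai

[Paper] DSD: A Distributed Speculative Decoding Solution for Edge‑Cloud Agile Large Model Serving

Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge‑cloud environments. Existing spe...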

#speculative decoding #LLM serving #edge‑cloud inference #distributed inference #adaptive window control