Speculative Decoding

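An inference optimization in which a small draft model proposes several candidate tokens and the larger target model verifies them all in a single parallel forward pass. Each accepted token avoids a separate sequential decoding step of the large model, so generation speeds up whenever the draft model's guesses are usually right. With the standard acceptance rule, the output distribution matches that of the large model decoding on its own, so quality is unchanged.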
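The draft-then-verify loop can be sketched in a few lines. This is a minimal illustration of the greedy variant only, not a production implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for real models, the verification step is written as a plain loop where a real system would use one batched forward pass, and the extra "bonus" token a real target model yields after a fully accepted draft is omitted for brevity.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]  # context -> greedy next token

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def toy_target(ctx: Sequence[int]) -> int:
    """Hypothetical 'large' model: a deterministic function of the context."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def toy_draft(ctx: Sequence[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except at every 5th position."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 5 else (guess + 1) % VOCAB


def greedy_decode(model: NextToken, prompt: Sequence[int], num_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: Sequence[int], num_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one after another.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model scores every proposed position. A real model
        #    does this in a single batched forward pass; here it is a loop.
        target_passes += 1
        verified = [target(tokens + proposal[:i]) for i in range(len(proposal))]

        # 3. Keep the matching prefix; at the first mismatch, take the
        #    target's token instead and discard the rest of the proposal.
        for guess, truth in zip(proposal, verified):
            tokens.append(truth)
            if guess != truth:
                break
    return tokens, target_passes
```

Because every appended token is the one the target model would have chosen, the result is identical to plain greedy decoding with the target model; only the number of sequential target passes changes.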

Related terms

Inference, Throughput, Model Distillation