Speculative Decoding

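An inference optimization in which a small, fast draft model proposes several candidate tokens that the larger target model then verifies in a single parallel forward pass. Each accepted token saves one sequential decoding step on the large model, so latency drops in proportion to how often the draft's guesses are accepted. With an appropriate acceptance rule, the output matches what the target model would have produced on its own, so the speedup does not come at the cost of quality.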
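
The propose-then-verify loop can be sketched with toy models. This is a minimal illustration of the greedy (exact-match) variant only, not a definitive implementation: `target_model`, `draft_model`, `greedy_decode`, `speculative_decode`, and the parameter `k` are hypothetical names introduced here, and the "models" are simple deterministic functions standing in for real networks.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token prefix to its greedy next token (assumption for the sketch).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large model: a deterministic function of the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Toy stand-in for the small model: agrees with the target except
    # when the prefix length is a multiple of 4.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 != 0 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft proposes k tokens sequentially (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target scores all k + 1 positions. A real system would do
        #    this in one batched forward pass; here it is a plain loop.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        target_passes += 1

        # 3. Accept the longest prefix on which draft and target agree,
        #    then append the target's own token at the first disagreement
        #    (or its extra token if every proposal was accepted).
        n_accept = 0
        while n_accept < k and proposal[n_accept] == verified[n_accept]:
            n_accept += 1
        tokens.extend(proposal[:n_accept])
        tokens.append(verified[n_accept])

    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec_tokens, passes = speculative_decode(target_model, draft_model, prompt, 20)
    assert spec_tokens == greedy_decode(target_model, prompt, 20)
    print(f"verification passes: {passes} (baseline would need 20 target calls)")
```

Every verification pass yields at least one token, so the loop never needs more target passes than plain decoding, and it produces the same sequence as greedy decoding with the target alone. Sampling-based variants replace the exact-match test in step 3 with a probabilistic accept/reject rule, which this sketch does not cover.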