Model Distillation
A technique in which a smaller 'student' model is trained to replicate the behavior of a larger 'teacher' model by learning to match the teacher's output probability distributions (soft targets) rather than only hard labels. Distillation yields faster, cheaper models that retain much of the teacher's capability.
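As a minimal sketch of what the training objective might look like in PyTorch (the function name, temperature T, and mixing weight alpha are illustrative, following the common formulation of blending a softened KL term with standard cross-entropy):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: teacher probabilities softened by temperature T
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    # Student log-probabilities at the same temperature
    soft_preds = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between student and teacher distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    kd = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha trades off imitation vs. hard-label accuracy
    return alpha * kd + (1 - alpha) * ce
```

A higher temperature spreads the teacher's probability mass over more classes, exposing the relative similarities between classes ("dark knowledge") that hard labels alone would discard.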