
Preference Optimization

A family of training techniques that align a language model with human preferences by training on pairs of preferred versus dispreferred outputs for the same prompt. RLHF and DPO are the most prominent methods. Preference optimization is a central step in turning base language models into helpful, safe assistants.
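As a concrete illustration, here is a minimal sketch of the DPO loss in PyTorch. It trains the policy to raise the log-probability of the preferred response relative to a frozen reference model. The function name, parameter names, and toy numbers below are illustrative, not part of any library API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the Direct Preference Optimization loss.

    Each argument is a tensor of shape (batch,) holding summed
    per-token log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the trainable policy
    and a frozen reference model.
    """
    # Log-ratios measure how far the policy has moved from the
    # reference model on each response.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps

    # The loss pushes the chosen log-ratio above the rejected one;
    # beta controls how strongly the policy may drift from the reference.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy example with made-up log-probabilities for a batch of two pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5]),
    policy_rejected_logps=torch.tensor([-11.0, -10.2]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8]),
    ref_rejected_logps=torch.tensor([-10.8, -10.0]),
)
print(loss)
```

In contrast to RLHF, which fits a separate reward model and then optimizes the policy with reinforcement learning, DPO folds the preference signal directly into a supervised loss like the one above.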

Related terms

RLHF (Reinforcement Learning from Human Feedback)
DPO (Direct Preference Optimization)
Alignment