
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn
2023-02-27
alignment, optimization
Abstract
This paper introduces and evaluates Direct Preference Optimization (DPO), a method that fine-tunes a language model directly on human preference data using a simple classification-style loss, bypassing explicit reward modeling and reinforcement learning. Its empirical results helped shape subsequent work in alignment and optimization.
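The core of DPO is its loss: given a prompt with a chosen and a rejected response, it pushes the policy to increase the chosen response's log-probability relative to a frozen reference model, scaled by a temperature beta. A minimal sketch of the per-example loss, assuming summed token log-probabilities are already available (the function name and inputs here are illustrative, not from the paper's code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin), where the margin
    is the policy's log-ratio advantage on the chosen response minus its
    advantage on the rejected one, both measured against the reference."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable -log(sigmoid(z))
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# The policy favors the chosen response more than the reference does,
# so the margin is positive and the loss is small.
print(round(dpo_loss(-10.0, -14.0, -11.0, -12.0, beta=0.5), 4))  # 0.2014
```

As beta grows, the loss more sharply penalizes any example where the policy's implicit reward for the rejected response exceeds that of the chosen one.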