https://arxiv.org/pdf/2406.00832
This paper concerns the problem of aligning samples from large language models to human preferences using best-of-n sampling, where we draw n samples, rank them, and return the best one.
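As a concrete illustration, here is a minimal Python sketch of best-of-n sampling; `generate` and `reward` are hypothetical stand-ins for the base LLM sampler and whatever scorer is used to rank the samples, not names from the paper or its code.

```python
# Minimal sketch of best-of-n sampling.
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],       # draws one sample from the base LLM
    reward: Callable[[str, str], float],  # scores a (prompt, response) pair
    n: int = 8,
) -> str:
    """Draw n samples from the base model and return the highest-scoring one."""
    samples: List[str] = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda y: reward(prompt, y))
```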
We consider two fundamental problems.
First: what is the relationship between best-of-n and approaches to alignment that train LLMs to output samples with a high expected reward (e.g., RLHF or DPO)?
To answer this, we embed both the best-of-n distribution and the sampling distributions learned by alignment procedures in a common class of tiltings of the base LLM distribution.
We then show that, within this class, best-of-n is essentially optimal in terms of the trade-off between win rate against the base model and KL divergence from the base model.
That is, best-of-n is the best choice of alignment distribution if the goal is to maximize win rate.
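For intuition, the following standard expressions give the best-of-n density as a tilting of the base distribution and its KL divergence from the base model. This is a sketch in our own notation (base model $\pi_0$, reward $r$, reward CDF $F$), under the assumption that reward ties have probability zero; it is not copied from the paper.

```latex
% Best-of-n as a tilting of the base distribution pi_0, assuming
% reward ties occur with probability zero.
\[
  \pi^{(n)}_{\mathrm{bon}}(y \mid x)
  \;=\; n \, F(y \mid x)^{\,n-1}\, \pi_0(y \mid x),
  \qquad
  F(y \mid x) \;=\; \Pr_{y' \sim \pi_0(\cdot \mid x)}\!\bigl[r(x, y') < r(x, y)\bigr].
\]
% Its KL divergence from the base model has a closed form:
\[
  \mathrm{KL}\!\left(\pi^{(n)}_{\mathrm{bon}} \,\middle\|\, \pi_0\right)
  \;=\; \log n \;-\; \frac{n-1}{n}.
\]
```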
However, best-of-n requires drawing n samples for each inference, a substantial cost.
To avoid this cost, the second problem we consider is how to fine-tune an LLM to mimic the best-of-n sampling distribution.
We derive BoNBoN Alignment to achieve this by exploiting the special structure of the best-of-n distribution.
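One way to exploit this structure when building training data, shown in the hypothetical sketch below, is to keep both the best and the worst of the n samples per prompt, using the best as a supervised target and the (best, worst) pair as a preference pair. The exact BoNBoN objective is specified in the paper; this sketch only illustrates the data construction, with `generate` and `reward` again as stand-ins.

```python
# Hypothetical sketch of building best-of-n / worst-of-n training data;
# not the exact BoNBoN objective from the paper.
from typing import Callable, List, Tuple

def bon_training_pairs(
    prompts: List[str],
    generate: Callable[[str], str],       # draws one sample from the base LLM
    reward: Callable[[str, str], float],  # scores a (prompt, response) pair
    n: int = 8,
) -> List[Tuple[str, str, str]]:
    """For each prompt, return (prompt, best-of-n sample, worst-of-n sample)."""
    data: List[Tuple[str, str, str]] = []
    for x in prompts:
        samples = [generate(x) for _ in range(n)]
        scored = sorted(samples, key=lambda y: reward(x, y))
        data.append((x, scored[-1], scored[0]))  # (prompt, best, worst)
    return data
```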
Experiments show that BoNBoN alignment yields substantial improvements in producing a model that is preferred to the base policy while minimally affecting off-target aspects of the generations. Code is available at https://github.com/gl-ybnbxb/BoNBoN.