https://arxiv.org/pdf/2407.18219v1
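RISE (Recursive Introspection) frames fine-tuning on a single-turn prompt as solving a multi-turn Markov decision process (MDP). At each turn the model sees its earlier attempts and tries to produce an improved response.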
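The MDP framing can be sketched as a small environment. This is a minimal, non-authoritative sketch under stated assumptions. The state is the original prompt plus all previous attempts, each followed by a generic retry message. The action is the next response. The reward is binary: 1 if the response matches a reference answer. The names `State`, `step`, `rollout`, `toy_policy`, and `RETRY_FEEDBACK`, the retry wording, and the exact-match reward are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical generic feedback appended after each failed attempt (assumption).
RETRY_FEEDBACK = "Your previous answer may be incorrect. Try again."


@dataclass
class State:
    """MDP state: the single-turn prompt plus the history of (response, feedback) pairs."""
    prompt: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        # Concatenate the prompt and every earlier attempt/feedback into one context string.
        parts = [self.prompt]
        for response, feedback in self.history:
            parts.extend([response, feedback])
        return "\n".join(parts)


def reward(response: str, reference: str) -> float:
    # Binary correctness reward (assumption: exact string match after stripping whitespace).
    return 1.0 if response.strip() == reference.strip() else 0.0


def step(state: State, action: str, reference: str) -> Tuple[State, float, bool]:
    # Transition: the next state appends the new attempt and the retry feedback.
    r = reward(action, reference)
    next_state = State(state.prompt, state.history + [(action, RETRY_FEEDBACK)])
    return next_state, r, r == 1.0


def rollout(policy: Callable[[str], str], prompt: str, reference: str,
            max_turns: int = 3) -> List[Tuple[str, str, float]]:
    """Roll out up to max_turns attempts; stop early once an attempt is correct."""
    state = State(prompt)
    trajectory = []
    for _ in range(max_turns):
        context = state.render()
        action = policy(context)
        next_state, r, done = step(state, action, reference)
        trajectory.append((context, action, r))
        if done:
            break
        state = next_state
    return trajectory


def toy_policy(context: str) -> str:
    # Hypothetical stand-in for a language model: answers correctly only after seeing feedback.
    return "4" if RETRY_FEEDBACK in context else "5"


if __name__ == "__main__":
    traj = rollout(toy_policy, "What is 2 + 2?", "4")
    print([r for _, _, r in traj])
```

With the toy policy, the first attempt ("5") earns reward 0.0. The second attempt sees that answer plus the retry message in its context, answers "4", and earns 1.0. This multi-turn trajectory is the kind of data the MDP view produces from a single-turn prompt.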