inference-time, RLHF/STaR, ReST

Recursive Introspection paper review (Teaching Language Model Agents How to Self-Improve)

jinuklee 2024. 7. 28. 13:52

https://arxiv.org/pdf/2407.18219v1

RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt.
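To make the multi-turn framing concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. The state layout (the prompt plus earlier attempts and feedback), the binary reward, the feedback strings, and every name (`State`, `step`, `rollout`, `toy_policy`, `toy_checker`) are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

RETRY = "The previous answer may be wrong. Try again."  # hypothetical feedback text


@dataclass(frozen=True)
class State:
    """Assumed MDP state: the original prompt plus all earlier (attempt, feedback) pairs."""
    prompt: str
    history: Tuple[Tuple[str, str], ...] = ()


def step(state: State, action: str,
         is_correct: Callable[[str, str], bool]) -> Tuple[State, float]:
    """One turn: the action is a full response; the reward is 1.0 if it is correct."""
    reward = 1.0 if is_correct(state.prompt, action) else 0.0
    feedback = "correct" if reward == 1.0 else RETRY
    next_state = State(state.prompt, state.history + ((action, feedback),))
    return next_state, reward


def rollout(prompt: str, policy: Callable[[State], str],
            is_correct: Callable[[str, str], bool],
            max_turns: int = 3) -> Tuple[State, List[float]]:
    """Start from the bare prompt and keep retrying until correct or out of turns."""
    state = State(prompt)  # the initial state is just the prompt
    rewards: List[float] = []
    for _ in range(max_turns):
        action = policy(state)
        state, reward = step(state, action, is_correct)
        rewards.append(reward)
        if reward == 1.0:
            break
    return state, rewards


def toy_policy(state: State) -> str:
    """Stand-in for a language model: its guess depends on how many attempts came before."""
    guesses = ["4", "5", "6"]
    return guesses[min(len(state.history), len(guesses) - 1)]


def toy_checker(prompt: str, answer: str) -> bool:
    """Stand-in for an answer checker on the single toy prompt used here."""
    return answer == "5"


final_state, rewards = rollout("2 + 3 = ?", toy_policy, toy_checker)
print(rewards)
print(final_state.history)
```

In this toy run the first attempt is wrong and the second is right, so the episode ends after two turns. The point is only that a single-turn prompt becomes the first state of an episode in which each later state carries the earlier attempts.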

SINGLE