OmegaPRM - Improve Mathematical Reasoning in Language Models by Automated Process Supervision: Paper Review

jinuklee 2024. 8. 23. 19:02

https://arxiv.org/pdf/2406.06592