inference-time, RLHF/Process reward model

MULTI-STEP PROBLEM SOLVING THROUGH A VERIFIER: AN EMPIRICAL ANALYSIS ON MODEL-INDUCED PROCESS SUPERVISION (Paper Review)

jinuklee 2024. 8. 29. 21:39

https://arxiv.org/pdf/2402.02658