V-STaR: Training Verifiers for Self-Taught Reasoners (Paper Review)
https://arxiv.org/abs/2402.06457

Common self-improvement approaches for large language models (LLMs), such as STaR, iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this s..
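As a rough illustration of the filtering step described in the abstract, the minimal Python sketch below partitions self-generated solutions into correct and incorrect sets by comparing final answers against references. The `Solution` class, the `split_by_correctness` helper, and the toy data are all hypothetical names invented for this example; they are not taken from the paper. The sketch only shows which samples a STaR-style loop would keep for fine-tuning and which it would throw away, not how the incorrect ones might be reused.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """Hypothetical container for one self-generated solution."""
    problem_id: str
    text: str
    final_answer: str


def split_by_correctness(solutions, reference_answers):
    """Partition solutions into (correct, incorrect) by exact answer match.

    Assumption: correctness is judged by comparing the final answer
    string with a known reference answer for the same problem.
    """
    correct, incorrect = [], []
    for s in solutions:
        if s.final_answer.strip() == reference_answers[s.problem_id].strip():
            correct.append(s)
        else:
            incorrect.append(s)
    return correct, incorrect


# Toy data standing in for solutions sampled from a model.
samples = [
    Solution("q1", "2 + 3 = 5", "5"),
    Solution("q1", "2 + 3 = 6", "6"),
    Solution("q2", "4 * 2 = 8", "8"),
]
refs = {"q1": "5", "q2": "8"}

correct, incorrect = split_by_correctness(samples, refs)

# A STaR-style loop fine-tunes only on the correct set;
# the incorrect set is what such approaches discard.
finetune_set = correct
discarded = incorrect
```

Keeping `incorrect` as a separate list, rather than dropping it inside the loop, makes the discarded data explicit, which is the point the abstract raises about potentially valuable information being lost.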