All categories (251)

Search algorithms better(?) than MCTS, if used at inference time

AlphaZero-Style Search: An enhancement of MCTS that combines deep neural networks with tree search. This approach, popularized by AlphaZero, uses a policy network to guide the search and a value network to evaluate positions, which can outperform standard MCTS by focusing the search on more promising branches. Best-First Search (A*): Best-first search algorithms, such as A*, prioritize expanding ..

Uncategorized 2024.08.22
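
To make the AlphaZero-style excerpt above concrete, here is a minimal sketch of how a policy prior and a value estimate can jointly steer child selection, using the PUCT-style rule commonly associated with AlphaZero. The `Node` class, the `c_puct` default of 1.5, and the function names are illustrative assumptions, not taken from the post; the priors and values would come from policy and value networks that are not modeled here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One edge/state in the search tree (hypothetical structure)."""
    prior: float              # P(s, a): assumed to come from a policy network
    visit_count: int = 0      # N(s, a)
    value_sum: float = 0.0    # sum of backed-up value estimates
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean value Q(s, a); 0.0 for an unvisited node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploration bonus grows with the prior and shrinks with child visits.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + u


def select_child(parent: Node, c_puct: float = 1.5):
    # Return the (action, child) pair with the highest PUCT score.
    return max(parent.children.items(),
               key=lambda kv: puct_score(parent, kv[1], c_puct))


root = Node(prior=1.0, visit_count=10)
root.children = {
    "a": Node(prior=0.7, visit_count=5, value_sum=2.0),
    "b": Node(prior=0.3),  # never visited
}
action, _ = select_child(root)
print(action)
```

In this toy tree the unvisited child wins the selection because its exploration bonus is not yet discounted by visits; as visit counts grow, the mean value term dominates and the search concentrates on branches the value estimates favor.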

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning (paper review)

https://arxiv.org/pdf/2407.00782 We introduce Step-Controlled DPO (SCDPO), which we empirically show improves the performance of DPO in enhancing LLMs' mathematical reasoning abilities. We also conduct qualitative analysis of credit assignment of SCDPO. • We conduct experiments on chain-of-thought and code-integrated solutions, showing that SCDPO can effectively improve mathematical problem-solvi..

Uncategorized 2024.08.20