
MCTS + o1 journey

jinuklee 2024. 12. 26. 03:00

https://arxiv.org/pdf/2412.15797

Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

 

https://arxiv.org/pdf/2501.01478

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

 

https://arxiv.org/abs/2501.07301

The Lessons of Developing Process Reward Models in Mathematical Reasoning

 

https://github.com/GAIR-NLP/O1-Journey

o1 journey