Posts from 2025/03/02

2025/03/02 1

For tree-search algorithms, the main issue is how to construct a reliable value function and reward model.

LATS: majority voting and an LLM evaluation score -> value function. Simulation stage -> objective feedback == reward, which is backpropagated according to whether the attempt actually succeeded. Example: the hotpot task.

AlphaMath: "we have a value model V and a LLM policy model π, which are the same model but with different final layers in our paper". Preliminary, method, I..
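The LATS notes above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: the `Node` class, the function names, the `weight` parameter, and the assumption that both scores lie in [0, 1] are all hypothetical choices made here. It shows a heuristic value built from an LLM evaluation score plus a majority-voting score, and an objective reward from a finished simulation being backpropagated as a running mean.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One state in the search tree (hypothetical structure for illustration)."""
    parent: Optional["Node"] = None
    visits: int = 0
    value: float = 0.0


def self_consistency_score(candidate: str, samples: List[str]) -> float:
    """Fraction of sampled answers that agree with the candidate (majority voting)."""
    if not samples:
        return 0.0
    return Counter(samples)[candidate] / len(samples)


def heuristic_value(llm_score: float, sc_score: float, weight: float = 0.5) -> float:
    """Blend an LLM evaluation score with a majority-voting score.

    Both inputs are assumed to be in [0, 1]; `weight` is an assumed hyperparameter.
    """
    return weight * llm_score + (1.0 - weight) * sc_score


def backpropagate(leaf: Node, reward: float) -> None:
    """Push the objective reward from a simulation up to the root as a running mean."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        node = node.parent


if __name__ == "__main__":
    root = Node()
    child = Node(parent=root)
    backpropagate(child, 1.0)  # simulated trajectory succeeded
    backpropagate(child, 0.0)  # simulated trajectory failed
    print(child.visits, child.value)
```

The heuristic value is used before any outcome is known, while the reward only enters once a simulation reaches a terminal state, which is why the two are kept as separate functions here.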

Uncategorized 2025.03.02

이진욱's Blog

AI research memos for reference
