A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods: Paper Review
jinuklee
2025. 5. 20. 22:19
https://arxiv.org/html/2502.01618v3#S3
Summary
sequential Monte Carlo (SMC) = particle filtering (PF)
transition model == LLM
emission model == reward model (mathshepherd, qwen-prm)
approximate the joint target distribution with a set of particles, using resampling
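To make the mapping above concrete, here is a rough sketch of the target distribution in state-space-model form. The notation is my own assumption, not copied from the paper: c is the prompt, x_t is the t-th reasoning step, o_t is a binary "this step is acceptable" observation, p_M is the LLM, and \hat{r} is the reward model's score read as a probability.

```latex
% Sketch only: the symbols c, x_t, o_t, p_M and \hat{r} are assumed notation.
\[
p(x_{1:T} \mid c,\, o_{1:T} = 1)
\;\propto\;
\underbrace{\prod_{t=1}^{T} p_M(x_t \mid c,\, x_{<t})}_{\text{transition model (LLM)}}
\;
\underbrace{\prod_{t=1}^{T} \hat{r}(c,\, x_{\le t})}_{\text{emission model (reward model)}}
\]
```

Particle filtering approximates this distribution by sampling steps from the first product and reweighting and resampling by the second.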
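Each particle's reward is passed through inverse_sigmoid to get its weight; the particles are then resampled with random.choice according to those weights (the particle filtering step), and the final result is picked based on the rewards of the final particles.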
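The loop described above can be sketched roughly as follows. This is a minimal toy version, not the paper's code. The functions `toy_propose_step`, `toy_score` and `toy_is_done` are hypothetical stand-ins for the LLM, the reward model and the stopping check. Normalizing the inverse-sigmoid weights with a softmax is my assumption, and the standard library's `random.choices` stands in for the `random.choice` call mentioned above.

```python
import math
import random


def inverse_sigmoid(r, eps=1e-7):
    """Logit: map a reward in (0, 1) to an unbounded log-weight."""
    r = min(max(r, eps), 1.0 - eps)  # clamp so log() never sees 0
    return math.log(r) - math.log(1.0 - r)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def particle_filter(prompt, propose_step, score, is_done,
                    n_particles=8, max_steps=20, rng=None):
    """Toy particle filter: propose, weight, resample, then pick the best."""
    rng = rng or random.Random()
    particles = [prompt] * n_particles
    for _ in range(max_steps):
        # Transition: extend every unfinished particle by one step.
        particles = [p if is_done(p) else propose_step(p, rng)
                     for p in particles]
        if all(is_done(p) for p in particles):
            break
        # Emission: reward -> inverse sigmoid -> normalized weights.
        rewards = [score(p) for p in particles]
        probs = softmax([inverse_sigmoid(r) for r in rewards])
        # Resample with replacement, proportionally to the weights.
        particles = rng.choices(particles, weights=probs, k=n_particles)
    # Final selection: the particle with the highest reward.
    rewards = [score(p) for p in particles]
    best = max(range(n_particles), key=rewards.__getitem__)
    return particles[best], rewards[best]


# Hypothetical stand-ins so the sketch runs without an LLM or a PRM.
def toy_propose_step(p, rng):
    return p + rng.choice("ab")  # "LLM": append one random token


def toy_score(p):
    return (p.count("a") + 1) / (len(p) + 2)  # "PRM": favors 'a', stays in (0, 1)


def toy_is_done(p):
    return len(p) >= 4  # stop after four steps


if __name__ == "__main__":
    answer, reward = particle_filter("", toy_propose_step, toy_score,
                                     toy_is_done, rng=random.Random(0))
    print(answer, reward)
```

With a real model, `propose_step` would sample one reasoning step from the LLM and `score` would call the reward model. Since softmax(logit(r)) is proportional to r / (1 - r), rewards close to 1 dominate the resampling much more than they would if the raw reward were used as the weight.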