
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods: Paper Review

jinuklee 2025. 5. 20. 22:19

https://arxiv.org/html/2502.01618v3#S3

 


We now zoom in on how PF scales with inference-time compute. Figure 2 shows the change of performance (in terms of accuracy) with an increasing computation budget (N = 1, 2, 4, 8, 16, 32, 64, 128) ...


Summary

Sequential Monte Carlo (SMC) = particle filtering (PF)

transition model == LLM

emission model == reward model (mathshepherd, qwen-prm)

approximate the joint target distribution with resampling
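One way to write the notes above down as a state-space model is sketched here. The notation is my own assumption and may differ from the paper's: c is the prompt, x_t is the t-th generated step, o_t is a binary "step is acceptable" observation, p_M is the LLM, and r is the reward model's score in [0, 1].

```latex
\begin{align}
x_t &\sim p_M(x_t \mid c, x_{1:t-1}) && \text{(transition model: LLM)} \\
o_t &\sim \mathrm{Bernoulli}\big(r(c, x_{1:t})\big) && \text{(emission model: reward model)} \\
p(x_{1:T} \mid c, o_{1:T} = 1) &\propto \prod_{t=1}^{T} p_M(x_t \mid c, x_{1:t-1})\, r(c, x_{1:t}) && \text{(target distribution)}
\end{align}
```

Under this reading, particle filtering proposes each x_t from the LLM and uses the reward as the importance weight, so resampling steers the particle set toward the target distribution.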

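Each reward is converted into a weight via inverse_sigmoid, the particles are resampled according to these weights with random.choice (this is the particle filtering step), and the final result is obtained from the rewards of the final particles.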
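The procedure above can be sketched roughly as follows. This is a minimal toy illustration, not the paper's implementation: `toy_step` and `toy_reward` are hypothetical stand-ins for the LLM and the reward model, the softmax normalization of the logits is an assumption, and the standard library's `random.choices` is used in place of a `random.choice` call with probabilities.

```python
import math
import random


def inverse_sigmoid(p, eps=1e-6):
    """Map a reward in [0, 1] to a logit; clip to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def resample(particles, rewards, rng):
    """Resample with replacement, with probability softmax(inverse_sigmoid(reward))."""
    weights = softmax([inverse_sigmoid(r) for r in rewards])
    return rng.choices(particles, weights=weights, k=len(particles))


def particle_filter(step_fn, reward_fn, n_particles, n_steps, seed=0):
    """Run the propose / score / resample loop, then pick the best final particle."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        # Transition model (LLM stand-in): extend every particle by one step.
        particles = [p + [step_fn(p, rng)] for p in particles]
        # Emission model (reward model stand-in): score every partial trajectory.
        rewards = [reward_fn(p) for p in particles]
        particles = resample(particles, rewards, rng)
    final_rewards = [reward_fn(p) for p in particles]
    best = max(range(n_particles), key=lambda i: final_rewards[i])
    return particles[best], final_rewards[best]


# Hypothetical toy models: a "step" is a random bit, the "reward" is the share of 1s.
def toy_step(prefix, rng):
    return rng.randint(0, 1)


def toy_reward(trajectory):
    return sum(trajectory) / len(trajectory)


if __name__ == "__main__":
    trajectory, reward = particle_filter(toy_step, toy_reward, n_particles=8, n_steps=5)
    print(trajectory, reward)
```

With a real setup, `step_fn` would sample the next reasoning step from the LLM and `reward_fn` would call a process reward model such as the ones named in the summary; the loop itself stays the same.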