inference-time, RLHF/STaR, ResT - LMM

TLDR: Token-Level Detective Reward Model forLarge Vision Language Models 논문리뷰

jinuklee 2024. 10. 12. 18:17

https://arxiv.org/pdf/2410.04734