
VisVM: Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

jinuklee 2025. 1. 24. 18:41

https://arxiv.org/pdf/2412.03704v2