FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback (Paper Review)
https://arxiv.org/pdf/2404.05046v1

Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i.e., object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly ut..