
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback — Paper Review

https://arxiv.org/pdf/2404.05046v1
Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i.e., object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly ut..

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models — Paper Review

https://arxiv.org/pdf/2410.06154
In this work, we propose a novel method (GLOV) enabling Large Language Models (LLMs) to act as implicit Optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. Our GLOV meta-prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g., for zero-shot classification with CLIP). These prompts are ranked accor..

DPO for LMMs: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning — Paper Review

https://arxiv.org/abs/2402.11411
Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. ..