
wildvision-arena WILDVISION: Evaluating Vision-Language Models in the Wild with Human Preference (paper review)

https://arxiv.org/pdf/2406.11069 Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WILDVISION-ARENA (WV-ARENA), an online platform that collects human preferences to evaluate VLMs. We curated WV-BENCH by selecting 500 high-quality samples from 8,000 user submissions…
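Arena-style leaderboards like WV-ARENA typically rank models by fitting Elo-style ratings to pairwise human preference votes. A minimal sketch of that update rule is below; the model names and K-factor are illustrative assumptions, not details from the paper.

```python
# Elo-style rating from pairwise human preference votes,
# as used by arena-style VLM leaderboards. Names/K are illustrative.

def expected(ra: float, rb: float) -> float:
    """Probability that the model rated `ra` beats the model rated `rb`."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32) -> None:
    """Shift ratings toward the observed vote; gains and losses balance."""
    e_win = expected(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e_win)
    ratings[loser] -= k * (1 - e_win)

# Hypothetical battle log: each tuple is (winner, loser) from one human vote.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for w, l in votes:
    update(ratings, w, l)
```

With two wins for `model_a` and one for `model_b`, `model_a` ends above 1000 and `model_b` below; the total rating mass stays constant because each update is zero-sum.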

Uncategorized 2024.10.15

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities (paper review)

https://arxiv.org/pdf/2408.00765
https://huggingface.co/spaces/whyu/MM-Vet-v2_Evaluator MM-Vet, with open-ended vision-language questions targeting the evaluation of integrated capabilities, has become one of the most popular benchmarks for large multimodal model evaluation. MM-Vet assesses six core vision-language (VL) capabilities: r…

Uncategorized 2024.10.15

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

https://arxiv.org/abs/2402.04788 Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to the absence of…

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks (paper review)

https://arxiv.org/abs/2311.01361 Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multimodal tasks, leveraging GPT-4V as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensiv…

VL-feedback: Silkie: Preference Distillation for Large Visual Language Models

https://arxiv.org/abs/2312.10665 This paper explores preference distillation for large vision language models (LVLMs), improving their ability to generate helpful and faithful responses anchoring the visual context. We first build a vision-language feedback (VLFeedback) dataset utilizing…