
MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation (Paper Review)

https://arxiv.org/pdf/2407.04842 While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, ..

Uncategorized 2024.10.23

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

https://arxiv.org/abs/2306.13394 Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. However, it is difficult for these case studies to fully reflect t..

Uncategorized 2024.10.20

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI (Paper Review)

https://arxiv.org/abs/2311.16502 We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exa..

Uncategorized 2024.10.20

LOOKING INWARD: LANGUAGE MODELS CAN LEARN ABOUT THEMSELVES BY INTROSPECTION (Paper Review)

https://arxiv.org/pdf/2410.13787 Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but ..

Uncategorized 2024.10.19

wildvision-arena WILDVISION: Evaluating Vision-Language Models in the Wild with Human Preference (Paper Review)

https://arxiv.org/pdf/2406.11069 Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WILDVISION-ARENA (WV-ARENA), an online platform that collects human preferences to evaluate VLMs. We curated WVBENCH by selecting 500 high-quality samples from 8,000 user submissions..

Uncategorized 2024.10.15