
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

https://arxiv.org/pdf/2406.01297

Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct ..

Uncategorized 2024.09.21
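The survey's notion of inference-time self-correction can be sketched as a refine-and-stop loop. This is a minimal illustration, not the survey's method: `generate` is a hypothetical stand-in for an LLM call, stubbed here with canned responses so the loop is runnable.

```python
def generate(prompt):
    # Stub standing in for an LLM API call (assumption for illustration).
    if "Review your answer" in prompt:
        return "4"   # the "refined" response
    return "5"       # the initial (incorrect) response

def self_correct(question, rounds=2):
    """Iteratively ask the model to critique and revise its own answer."""
    answer = generate(question)
    for _ in range(rounds):
        feedback_prompt = (
            f"Question: {question}\nYour answer: {answer}\n"
            "Review your answer and output a corrected final answer."
        )
        revised = generate(feedback_prompt)
        if revised == answer:   # converged: the model no longer changes its answer
            break
        answer = revised
    return answer

print(self_correct("What is 2 + 2?"))  # → 4
```

The stopping rule (stop when the revision equals the previous answer) is one simple choice; frameworks surveyed in the paper differ mainly in where the feedback comes from (self-evaluation vs. external signals).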

Training Language Models to Self-Correct via Reinforcement Learning (SCoRe), paper review

https://arxiv.org/pdf/2409.12917

Point: generate the best possible final answer from the model's own distribution, while preventing model collapse. Improve self-correction ability using entirely self-generated data.

Method: training on self-generated data avoids distribution mismatch. Training proceeds in two stages; the staging is there to avoid the model collapse that occurs when a minimal-edit strategy fails (as with STaR). LLMs' self-correction ability is ineffective on its own (e.g., the "LLMs cannot self-correct yet" paper). Existing approaches rely on multiple models, a more capable LLM, or ..

Uncategorized 2024.09.21
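The anti-collapse idea in the excerpt, rewarding a genuine first-to-second-attempt correction rather than just repeating the first answer, can be illustrated with a toy shaped reward. This is an assumption-level sketch of that intuition, not SCoRe's exact objective; the `bonus` term and its value are hypothetical.

```python
def two_attempt_reward(first_correct, second_correct, bonus=0.5):
    """Toy shaped reward over a (first attempt, second attempt) pair.

    Base reward: correctness of the final (second) attempt.
    Shaping bonus: extra credit for flipping a wrong first attempt into a
    correct one, so the policy is not rewarded for collapsing into
    'copy the first answer verbatim'.
    """
    reward = 1.0 if second_correct else 0.0
    if not first_correct and second_correct:
        reward += bonus   # hypothetical progress bonus
    return reward

print(two_attempt_reward(first_correct=False, second_correct=True))  # → 1.5
print(two_attempt_reward(first_correct=True, second_correct=True))   # → 1.0
```

Under this shaping, "wrong then corrected" scores strictly higher than "right then repeated", which is the pressure against the minimal-edit collapse mode mentioned above.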