Posts in the 'All Categories' category (Page 9)

All Categories 251

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities (paper review)

https://arxiv.org/pdf/2408.00765

Uncategorized 2024.10.12

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning (paper review)

https://arxiv.org/pdf/2410.01044

Uncategorized 2024.10.12

TLDR: Token-Level Detective Reward Model for Large Vision Language Models (paper review)

https://arxiv.org/pdf/2410.04734

inference-time, RLHF/STaR, ResT - LMM 2024.10.12

LLaVA-OneVision: Easy Visual Task Transfer (paper review)

https://arxiv.org/pdf/2408.03326

Uncategorized 2024.10.12

SELF-BOOSTING LARGE LANGUAGE MODELS WITH SYNTHETIC PREFERENCE DATA (paper review)

https://arxiv.org/pdf/2410.06961

Uncategorized 2024.10.12

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback (paper review)

https://arxiv.org/pdf/2404.05046v1

Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i.e., object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly ut..

inference-time, RLHF/STaR, ResT - LMM 2024.10.12

GLOV: GUIDED LARGE LANGUAGE MODELS AS IMPLICIT OPTIMIZERS FOR VISION LANGUAGE MODELS (paper review)

https://arxiv.org/pdf/2410.06154

In this work, we propose a novel method (GLOV) enabling Large Language Models (LLMs) to act as implicit Optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. Our GLOV meta-prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g., for zeroshot classification with CLIP). These prompts are ranked accor..

inference-time, RLHF/STaR, ResT - LMM 2024.10.12

MLLM AS RETRIEVER: INTERACTIVELY LEARNING MULTIMODAL RETRIEVAL FOR EMBODIED AGENTS (paper review)

https://arxiv.org/pdf/2410.03450

Uncategorized 2024.10.12

PVIT, PERSONALIZED VISUAL INSTRUCTION TUNING (paper review)

https://arxiv.org/pdf/2410.07113

Uncategorized 2024.10.12

DPO for LMMs: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (paper review)

https://arxiv.org/abs/2402.11411

Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. ..

inference-time, RLHF/STaR, ResT - LMM 2024.10.09

이진욱's Blog

AI research memos for reference
