
MMHAL-BENCH: ALIGNING LARGE MULTIMODAL MODELS WITH FACTUALLY AUGMENTED RLHF (Paper Review)

https://arxiv.org/pdf/2309.14525 Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in “hallucination”, generating textual outputs that are not grounded by the multimodal information in context. To address the multimodal misalignment issue, we adapt the Reinforcement Learning from Human Feedback (RLHF) from the text domain to the ta..

Dataset 2024.09.30

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

https://arxiv.org/pdf/2406.01297 Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct ..

Uncategorized 2024.09.21
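
As a rough illustration of the self-correction setup the survey examines (generate an answer, critique it with some feedback source, then refine it during inference), here is a minimal Python sketch. It is illustrative only: the `llm` callable, the prompts, and the stub model are assumed placeholders, not code or an API from the paper.

```python
# Minimal sketch of an intrinsic self-correction loop (illustrative only):
# draft an answer, ask the model to critique it, and revise using that feedback.
# `llm` is a placeholder for any text-completion callable, NOT an API from the paper.

from typing import Callable

def self_correct(question: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Generate an answer, then iteratively critique and refine it at inference time."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nProposed answer: {answer}\n"
            "List any mistakes in the proposed answer, or reply 'NONE'."
        )
        if critique.strip().upper().startswith("NONE"):
            break  # the model reports no mistakes; stop refining
        answer = llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Feedback: {critique}\nRevised answer:"
        )
    return answer

if __name__ == "__main__":
    # Stub LLM so the sketch runs without any external service.
    stub = lambda prompt: "NONE" if "mistakes" in prompt else "42"
    print(self_correct("What is 6 x 7?", stub))
```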