Brief summary: multimodal self-training


jinuklee 2025. 1. 24. 04:39

RLAIF-V assigns a trustworthiness score to the atomic claims of each candidate response using an open-source MLLM.
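A minimal sketch of how claim-level scores like these could be turned into preference pairs. The aggregation rule (negative count of claims scoring below a threshold), the `threshold` value, and the function names `response_score` and `build_preference_pairs` are assumptions for illustration, not RLAIF-V's actual code. The claim scores are taken as given; in practice they would come from the MLLM.

```python
from itertools import combinations


def response_score(claim_scores, threshold=0.5):
    """Score a response as the negative number of low-trust claims.

    Assumption: a claim counts as untrustworthy when its score is below
    `threshold`. Fewer such claims means a higher (less negative) score.
    """
    return -sum(1 for s in claim_scores if s < threshold)


def build_preference_pairs(candidates, threshold=0.5):
    """Build (chosen, rejected) pairs from candidate responses.

    `candidates` maps each response text to the list of trust scores
    of its atomic claims. Pairs with equal scores are skipped because
    they carry no preference signal.
    """
    scored = [
        (text, response_score(scores, threshold))
        for text, scores in candidates.items()
    ]
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        if score_a > score_b:
            pairs.append({"chosen": a, "rejected": b})
        elif score_b > score_a:
            pairs.append({"chosen": b, "rejected": a})
    return pairs


if __name__ == "__main__":
    # Hypothetical candidates with made-up claim scores.
    demo = {
        "A dog sits on a red sofa.": [0.9, 0.8],
        "A dog sits on a blue sofa.": [0.9, 0.2],
    }
    print(build_preference_pairs(demo))
```

The resulting pairs have the shape commonly fed to preference-optimization methods such as DPO.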

SIMA uses a critic prompt that considers several factors to obtain its preference-pair dataset.

VL-feedback uses GPT-4V to assess responses decoded from various other LMMs on three criteria: helpfulness, visual faithfulness, and ethical considerations.
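As a rough illustration, per-aspect ratings of this kind can be collapsed into a single preference pair per prompt. The sketch below assumes each response already carries numeric ratings for the three aspects, averages them with equal weight, and pairs the best response against the worst. The field names, the equal weighting, and the best-versus-worst rule are all assumptions here, not the dataset's documented procedure.

```python
ASPECTS = ("helpfulness", "visual_faithfulness", "ethical_considerations")


def mean_rating(ratings):
    """Average the three aspect ratings with equal weight (an assumption)."""
    return sum(ratings[a] for a in ASPECTS) / len(ASPECTS)


def best_worst_pair(annotated):
    """Return a chosen/rejected pair from rated responses, or None on a tie.

    `annotated` is a list of dicts with a "response" string and a
    "ratings" dict keyed by the names in ASPECTS.
    """
    ranked = sorted(annotated, key=lambda r: mean_rating(r["ratings"]))
    worst, best = ranked[0], ranked[-1]
    if mean_rating(best["ratings"]) == mean_rating(worst["ratings"]):
        return None  # no preference signal when all responses tie
    return {"chosen": best["response"], "rejected": worst["response"]}


if __name__ == "__main__":
    # Hypothetical ratings, made up for the example.
    demo = [
        {"response": "r1", "ratings": {"helpfulness": 5,
                                       "visual_faithfulness": 4,
                                       "ethical_considerations": 5}},
        {"response": "r2", "ratings": {"helpfulness": 2,
                                       "visual_faithfulness": 1,
                                       "ethical_considerations": 5}},
    ]
    print(best_worst_pair(demo))
```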

FGAIF trains a reward model to assign scores to various categories of hallucinations.