Brief summary: multimodal self-training
RLAIF-V assigns a trustworthiness score to the atomic claims of each candidate response using an open-source MLLM. SIMA uses a critic prompt that considers multiple factors to obtain a preference-pair dataset. VL-Feedback uses GPT-4V to assess responses decoded from other LMMs with respect to Helpfulness, Visual Faithfulness, and Ethical Considerations. FGAIF trains a reward model to assign scores to va..
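Below is a minimal sketch of the RLAIF-V-style scoring idea described above, assuming a hypothetical `mllm_yes_no(image, claim)` callable that returns the probability an open-source MLLM judges a claim to be supported by the image. The claim splitter and function names are illustrative placeholders, not the papers' actual APIs.

```python
from typing import Callable, List, Tuple


def split_into_atomic_claims(response: str) -> List[str]:
    # Placeholder: the papers use an LLM to decompose a response into atomic
    # claims; here we simply split on sentence boundaries for illustration.
    return [s.strip() for s in response.split(".") if s.strip()]


def trustworthiness_score(image, response: str,
                          mllm_yes_no: Callable[[object, str], float]) -> float:
    # Score a response as the mean per-claim support probability from the MLLM.
    claims = split_into_atomic_claims(response)
    if not claims:
        return 0.0
    return sum(mllm_yes_no(image, c) for c in claims) / len(claims)


def build_preference_pair(image, candidates: List[str],
                          mllm_yes_no: Callable[[object, str], float]) -> Tuple[str, str]:
    # Rank candidate responses by trustworthiness and pair the highest-scoring
    # (chosen) with the lowest-scoring (rejected) for preference training.
    scored = sorted(candidates,
                    key=lambda r: trustworthiness_score(image, r, mllm_yes_no))
    return scored[-1], scored[0]
```

A critic-prompt approach (SIMA) or a GPT-4V judge (VL-Feedback) would slot into the same pipeline by replacing the per-claim scorer with a response-level rating over the listed criteria.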