Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
https://arxiv.org/pdf/2411.10442

Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pretraining and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance. To address this, ..
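The excerpt cuts off before the method details, but the paper's Mixed Preference Optimization (MPO) is described as blending a DPO-style preference term, a per-response quality term (BCO-style), and a standard SFT generation term into one objective. Below is a minimal sketch of such a mixed loss, assuming summed per-response log-probabilities as inputs; the function name, the loss weights, and the simplified quality term (reward shift fixed at zero) are illustrative placeholders, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def mpo_loss(
    policy_chosen_logps: torch.Tensor,    # sum of token log-probs of preferred responses (policy)
    policy_rejected_logps: torch.Tensor,  # sum of token log-probs of dispreferred responses (policy)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,    # illustrative KL-strength hyperparameter
    w_pref: float = 0.8,  # illustrative weights for the three terms
    w_qual: float = 0.2,
    w_gen: float = 1.0,
) -> torch.Tensor:
    """Sketch of a mixed preference objective: preference + quality + generation."""
    # DPO-style preference term: push the policy's chosen/rejected
    # log-ratio margin above the reference model's margin.
    chosen_ratio = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_ratio = beta * (policy_rejected_logps - ref_rejected_logps)
    l_pref = -F.logsigmoid(chosen_ratio - rejected_ratio).mean()

    # Quality term: score each response independently (BCO-style),
    # with the running reward shift simplified to zero for this sketch.
    l_qual = (-F.logsigmoid(chosen_ratio) - F.logsigmoid(-rejected_ratio)).mean()

    # Generation (SFT) term: negative log-likelihood of the chosen response,
    # which anchors the policy to the preferred outputs.
    l_gen = -policy_chosen_logps.mean()

    return w_pref * l_pref + w_qual * l_qual + w_gen * l_gen
```

Combining the three terms is meant to counter the distribution shift the abstract mentions: the preference term teaches relative ranking, the quality term keeps an absolute signal per response, and the generation term prevents the policy from drifting away from fluent CoT outputs during preference training.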