Multi-step reasoning (math, coding, planning) / Multimodal CoT 5

MAVIS: Mathematical Visual Instruction Tuning paper review

https://arxiv.org/pdf/2407.08739 Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus in academia and industry. Despite their proficiency in general multi-modal scenarios, the mathematical problem-solving capabilities in visual contexts remain insufficiently explored. We identify three key areas within MLLMs that need to be improved: visual encoding of math diag..

IMPROVE VISION LANGUAGE MODEL CHAIN-OF-THOUGHT REASONING paper review

https://arxiv.org/pdf/2410.16198 https://github.com/RifleZhang/LLaVA-Reasoner-DPO Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT r..