https://arxiv.org/pdf/2304.08485
Instruction tuning large language models (LLMs) using machine-generated instruction-following data has been shown to improve zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on suc..