Uncategorized

Progressive Multimodal Reasoning via Active Retrieval

jinuklee 2024. 12. 29. 03:42

https://arxiv.org/html/2412.14835v1

 


Summary

 

Text retrieval

For text retrieval, they employ Contriever:

https://arxiv.org/pdf/2112.09118
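
A minimal sketch of this text-retrieval step, assuming the standard Contriever recipe (mean pooling over token embeddings, dot-product scoring); the corpus and query below are made-up placeholders, not from the paper:

```python
# Text retrieval with Contriever (facebook/contriever on Hugging Face).
# Corpus and query are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def mean_pool(hidden, mask):
    # Average token embeddings, ignoring padded positions.
    mask = mask.unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return mean_pool(out.last_hidden_state, inputs["attention_mask"])

corpus = [
    "The Pythagorean theorem relates the sides of a right triangle.",
    "Bayes' rule updates a prior belief with observed evidence.",
]
query_emb = embed(["How do I find the hypotenuse of a right triangle?"])
corpus_emb = embed(corpus)

# Contriever embeddings are usually scored with a plain dot product.
scores = query_emb @ corpus_emb.T
best = scores.argmax(dim=1).item()
print(corpus[best])
```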

 

Cross-modal retrieval

By utilizing the CLIP model,

->

encode text-image pairs (the query), using the formulation from https://github.com/DAMO-NLP-SG/multimodal_textbook

->

perform cross-modal retrieval between the encoding of each multimodal query and the entire retrieval database, utilizing FAISS [36] for indexing to retrieve K samples for each query.
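
A sketch of how this could look, assuming CLIP (openai/clip-vit-base-patch32 via transformers) and a FAISS inner-product index; averaging the text and image embeddings into one query vector is my assumption here, not necessarily the exact fusion rule from the multimodal_textbook repo, and the corpus embeddings are random placeholders:

```python
# Cross-modal retrieval: encode a (text, image) query with CLIP and search a
# FAISS index over the retrieval database for the top-K samples.
import faiss
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_multimodal(text, image):
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        emb = (txt + img) / 2                       # assumed fusion: simple average
        emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize so IP = cosine
    return emb.numpy().astype("float32")

# Build the index over precomputed corpus embeddings (random placeholders here).
dim = 512  # CLIP ViT-B/32 projection dimension
corpus_emb = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(corpus_emb)
index = faiss.IndexFlatIP(dim)
index.add(corpus_emb)

# Retrieve the top-K corpus samples for one multimodal query.
query = encode_multimodal("Find the area of the shaded region.",
                          Image.new("RGB", (224, 224)))
K = 5
scores, ids = index.search(query, K)
print(ids[0], scores[0])
```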

 

# Knowledge concept filtering

r is a retrieved insight from the corpus.

Compute the cosine similarity between each retrieved insight r and both the multimodal query Q^m and its knowledge concept label L_{kc}.

T denotes the filtering threshold.
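
A rough sketch of the filtering step; my reading is that an insight is kept only if it clears the threshold T against both the query embedding and the concept-label embedding, and the embeddings themselves are assumed to come from whatever encoder produced the retrieval above:

```python
# Knowledge concept filtering: keep a retrieved insight r only if its cosine
# similarity with both the multimodal query Q^m and the knowledge concept
# label L_kc exceeds the threshold T. (The "both must pass" rule and the
# data layout are assumptions, not the paper's exact specification.)
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_insights(retrieved, query_emb, concept_emb, T=0.5):
    """retrieved: list of (insight_text, insight_embedding) pairs."""
    kept = []
    for text, emb in retrieved:
        if cosine(emb, query_emb) >= T and cosine(emb, concept_emb) >= T:
            kept.append(text)
    return kept
```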