https://arxiv.org/html/2412.14835v1
Progressive Multimodal Reasoning via Active Retrieval
Large Language Models (LLMs) [72, 67, 25, 112, 97] and Multimodal Large Language Models (MLLMs) [54, 6, 104, 125, 13, 14, 42] have rapidly advanced, with broad applications in mathematics [126, 106], programming [31, 91], medicine [45], character recognition […]
Summary
Text Retrieval
They employ Contriever as the text retriever:
https://arxiv.org/pdf/2112.09118
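A minimal sketch of Contriever-style dense retrieval (mean-pooled token embeddings scored by inner product), using the public facebook/contriever checkpoint; the example texts are illustrative and this is not the paper's own code.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def mean_pooling(token_embeddings, attention_mask):
    # Contriever pools by averaging token embeddings over non-padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])

query_emb = embed(["What is the derivative of x^2?"])
doc_embs = embed(["The derivative of x^2 is 2x.", "Paris is the capital of France."])
scores = query_emb @ doc_embs.T  # inner-product relevance score per document
```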
Cross-Modal Retrieval
Using the CLIP model, they encode text-image pairs (the multimodal queries), following the formulation from https://github.com/DAMO-NLP-SG/multimodal_textbook.
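A sketch of encoding a (text, image) query pair with CLIP via Hugging Face. Fusing the two modalities by averaging their L2-normalized embeddings is an assumption on my part; see the multimodal_textbook repo above for the exact formulation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_pair(text, image_path):
    image = Image.open(image_path)
    inputs = processor(text=[text], images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        t = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
        v = model.get_image_features(pixel_values=inputs["pixel_values"])
    # L2-normalize each modality, then average into a single query vector
    # (the averaging is assumed, not taken from the paper).
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    return (t + v) / 2
```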
They then perform cross-modal retrieval between the encoding of each multimodal query and the entire retrieval database, using FAISS [36] for indexing to retrieve the top-K samples per query.
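A minimal FAISS indexing/search sketch for this step. The corpus embeddings are random placeholders, the dimension matches CLIP ViT-B/32, and the exact-search IndexFlatIP index (over L2-normalized vectors, so inner product equals cosine similarity) is an assumption; these notes don't pin down the index type the paper uses.

```python
import faiss
import numpy as np

d = 512                                               # CLIP ViT-B/32 embedding dim
corpus = np.random.randn(10000, d).astype("float32")  # placeholder corpus embeddings
faiss.normalize_L2(corpus)                            # normalize so IP == cosine

index = faiss.IndexFlatIP(d)   # exact inner-product index
index.add(corpus)

query = np.random.randn(1, d).astype("float32")       # placeholder fused query vector
faiss.normalize_L2(query)

K = 5
scores, ids = index.search(query, K)  # top-K nearest samples for each query
```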
# Knowledge Concept Filtering
r denotes an insight retrieved from the corpus. For each r, compute the cosine similarity against both the multimodal query Q^m and its knowledge concept label L_{kc}; T denotes the filtering threshold, and insights whose similarity falls below T are filtered out.
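A hedged sketch of this filtering step over precomputed embeddings: each retrieved insight r is kept only if its similarity to Q^m or to L_{kc} clears the threshold T. Combining the two scores with max (rather than, say, requiring both) is my assumption, as is the default value of T.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_insights(insight_embs, query_emb, label_emb, T=0.5):
    """Return indices of retrieved insights that pass the concept filter."""
    kept = []
    for i, r in enumerate(insight_embs):
        # Score r against both the multimodal query Q^m and the label L_kc;
        # taking the max of the two similarities is an assumption.
        score = max(cosine(r, query_emb), cosine(r, label_emb))
        if score > T:  # T is the filtering threshold
            kept.append(i)
    return kept
```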