Posts in the 'All Categories' category (Page 14)

ReST paper review: Reinforced Self-Training (ReST) for Language Modeling

https://arxiv.org/pdf/2308.08998 Summary: reward model + SFT. Key idea: in the Grow stage, a dataset is sampled from the current policy model and filtered through the reward model. Improve part: NLL loss. Figure 1: schematic. Pseudo-algorithm. How the reward is assigned: as a scalar reward after the tokens

inference-time, RLHF/STaR, ReST 2024.10.09
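
The Grow and Improve steps summarized above can be sketched on a toy categorical policy. This is a minimal illustration, not the paper's implementation: the dictionary policy, `reward_fn`, the threshold schedule, and the smoothing constant are all assumptions. Refitting to smoothed empirical frequencies stands in for NLL fine-tuning, since for a categorical distribution that refit is the NLL minimizer up to smoothing.

```python
import random
from collections import Counter


def grow(policy, num_samples, rng):
    """Grow step: sample a dataset from the current policy."""
    outputs = list(policy)
    weights = [policy[o] for o in outputs]
    return rng.choices(outputs, weights=weights, k=num_samples)


def improve(policy, dataset, reward_fn, threshold, smoothing=1e-3):
    """Improve step: keep samples whose reward clears the threshold,
    then refit the policy on the kept samples.

    The refit uses smoothed empirical frequencies, which minimize the
    NLL of the kept samples for a categorical policy (toy stand-in for
    fine-tuning a language model with an NLL loss).
    """
    kept = [y for y in dataset if reward_fn(y) >= threshold]
    if not kept:
        # Nothing passed the filter: leave the policy unchanged.
        return dict(policy)
    counts = Counter(kept)
    total = len(kept) + smoothing * len(policy)
    return {o: (counts[o] + smoothing) / total for o in policy}


def rest(policy, reward_fn, thresholds, num_samples=1000, seed=0):
    """Alternate Grow and Improve, one round per threshold (a simplification)."""
    rng = random.Random(seed)
    for threshold in thresholds:
        dataset = grow(policy, num_samples, rng)
        policy = improve(policy, dataset, reward_fn, threshold)
    return policy


# Hypothetical rewards for three candidate outputs.
rewards = {"a": 0.1, "b": 0.5, "c": 0.9}
start = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
final = rest(start, rewards.get, thresholds=[0.4, 0.8])
```

With these assumed rewards, the first round filters out `"a"` and the second filters out `"b"`, so the probability mass in `final` concentrates on `"c"`.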

LMM-as-a-judge / PROMETHEUS-VISION: Vision-Language Model as a Judge for Fine-Grained Evaluation paper review

https://arxiv.org/pdf/2401.06591 Assessing long-form responses generated by Vision-Language Models (VLMs) is challenging. It not only requires checking whether the VLM follows the given instruction but also verifying whether the text output is properly grounded on the given image. Inspired by the recent approach of evaluating LMs with LMs, in this work, we propose to evaluate VLMs with VLMs. For ..

inference-time, RLHF/STaR, ResT - LMM 2024.10.07

Calibrated Self-Rewarding Vision Language Models paper review

https://arxiv.org/pdf/2405.14622 Summary. How the reward is assigned: a self-generated instruction-following score (calculated using the language decoder of the LVLM; why this score alone is not enough: modality misalignment, potentially overlooking visual input information) plus the image-response relevance score, R^I(s) (we leverage CLIP-score [17] for this calculation). 3 Calibrated Self-Rewarding Vision Language Models To address t..

inference-time, RLHF/STaR, ResT - LMM 2024.10.07
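
The two-part reward described in the summary above can be illustrated with a small sketch. The excerpt does not say how the two scores are combined, so the weighted sum, the weight `lam`, and the example scores below are assumptions made for illustration only.

```python
def calibrated_reward(text_score, image_score, lam=0.5):
    """Combine the self-generated instruction-following score with the
    image-response relevance score.

    ASSUMPTION: a convex combination with weight `lam` on the image term;
    the excerpt only says the two scores are used together.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * image_score + (1.0 - lam) * text_score


# Hypothetical candidates: (response, text_score, image_score).
candidates = [
    ("fluent but ignores the image", 0.9, 0.1),
    ("plainer but grounded in the image", 0.7, 0.8),
]

# Ranking by the text score alone versus by the combined reward.
best_text_only = max(candidates, key=lambda c: c[1])[0]
best_calibrated = max(candidates, key=lambda c: calibrated_reward(c[1], c[2]))[0]
```

In this made-up example the text-only score prefers the response that ignores the image, while the combined reward prefers the grounded one, which is the failure mode the summary attributes to modality misalignment.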

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

https://arxiv.org/pdf/2404.01258v2 Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction following, providing informative feedback, especially for detecting hallucinations in generated responses, remains a significant challenge. Previous..

inference-time, RLHF/STaR, ResT - LMM 2024.10.05
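
For reference, the standard per-pair DPO objective mentioned in the excerpt above can be written in a few lines. This is a sketch of the generic DPO loss on precomputed sequence log-probabilities, not the video-specific pipeline of the paper; the argument names and the default `beta` are assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_init = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Raising the chosen response and lowering the rejected one reduces the loss.
loss_after = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss depends only on how far the policy has moved from the reference on each response, which is why the reference log-probabilities can be computed once and cached.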

Retrieval-Augmented Egocentric Video Captioning paper review

https://arxiv.org/pdf/2401.00789

Uncategorized 2024.10.03

llava-in-the-wild, LLaVA-Bench : Visual Instruction Tuning paper review

https://arxiv.org/pdf/2304.08485 Instruction tuning large language models (LLMs) using machine-generated instruction-following data has been shown to improve zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on suc..

Datasets 2024.10.03

Distilling System 2 into System 1 paper review

https://arxiv.org/pdf/2407.06023

Uncategorized 2024.10.03

Step-level Value Preference Optimization for Mathematical Reasoning

https://arxiv.org/pdf/2406.10858

Uncategorized 2024.10.03

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling paper review

https://arxiv.org/pdf/2408.16737 Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) ..

Uncategorized 2024.10.03
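
The fixed-budget trade-off in the excerpt above comes down to simple arithmetic: a cheaper model can draw more samples for the same FLOPs. The sketch below assumes the common approximation of about 2 FLOPs per parameter per generated token; the model sizes, token count, and budget are hypothetical values chosen for illustration.

```python
def samples_under_budget(budget_flops, num_params, tokens_per_sample):
    """Number of full samples a model can generate within a FLOPs budget.

    ASSUMPTION: generation costs roughly 2 * num_params FLOPs per token.
    """
    flops_per_sample = 2 * num_params * tokens_per_sample
    return budget_flops // flops_per_sample


# Hypothetical setup: a larger and a smaller model, 512 tokens per sample.
LARGE_PARAMS = 27_000_000_000
SMALL_PARAMS = 9_000_000_000
TOKENS = 512

# Budget chosen so that the larger model can afford exactly 10 samples.
budget = 2 * LARGE_PARAMS * TOKENS * 10

large_samples = samples_under_budget(budget, LARGE_PARAMS, TOKENS)
small_samples = samples_under_budget(budget, SMALL_PARAMS, TOKENS)
```

Under this approximation the sample counts scale inversely with parameter count, so the smaller model gets three times as many samples here. Whether those extra samples yield better training data is the empirical question the paper studies.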

HPT, Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers paper review

https://arxiv.org/abs/2409.20537

Uncategorized 2024.10.03


이진욱's blog
