
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling (paper review)

https://arxiv.org/pdf/2408.16737
Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) …

Uncategorized 2024.10.03

Archon: An Architecture Search Framework for Inference-Time Techniques (paper review)

https://arxiv.org/pdf/2409.15254
Paper dated October 1, 2024.
Challenges:
1) determining the optimal amount of compute under an inference-time compute budget
2) understanding the interactions between different inference-time techniques
3) efficiently searching the large space of model choices to produce the best answer
Tasks covered so far: instruction-following tasks (MT Bench, AlpacaEval 2.0, Arena-Hard-Auto), reasoning tasks (MixEval, …

Uncategorized 2024.10.02

MMHAL-BENCH: Aligning Large Multimodal Models with Factually Augmented RLHF (paper review)

https://arxiv.org/pdf/2309.14525
Large Multimodal Models (LMMs) are built across modalities, and the misalignment between two modalities can result in "hallucination": generating textual outputs that are not grounded in the multimodal information in context. To address the multimodal misalignment issue, we adapt Reinforcement Learning from Human Feedback (RLHF) from the text domain to the ta…

Datasets 2024.09.30