SYNTHETIC CONTINUED PRETRAINING paper review — Pretrained models trained on today's vast corpora struggle when applied to documents from small, domain-specific corpora; the EntiGraph algorithm addresses this. https://arxiv.org/pdf/2409.07431v1 Uncategorized 2024.09.13
Building Math Agents with Multi-Turn Iterative Preference Learning paper review https://arxiv.org/pdf/2409.02392 Uncategorized 2024.09.10
RAP paper review — Reasoning with Language Model is Planning with World Model https://arxiv.org/pdf/2305.14992 Uncategorized 2024.09.09
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning paper review https://arxiv.org/pdf/2311.09724 Uncategorized 2024.09.02
TinyGSM: achieving > 80% on GSM8k with small language models paper review https://arxiv.org/pdf/2312.09241 Uncategorized 2024.09.02
Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment paper review https://openreview.net/pdf?id=ewRlZPAReR Uncategorized 2024.08.29
MULTI-STEP PROBLEM SOLVING THROUGH A VERIFIER: AN EMPIRICAL ANALYSIS ON MODEL-INDUCED PROCESS SUPERVISION paper review https://arxiv.org/pdf/2402.02658 inference-time, RLHF/Process reward model 2024.08.29
Tree of Thoughts: Deliberate Problem Solving with Large Language Models paper review https://arxiv.org/pdf/2305.10601 inference-time, RLHF/search (language) 2024.08.29
Improving Reward Models with Synthetic Critiques paper review https://arxiv.org/pdf/2405.20850 inference-time, RLHF/Process reward model 2024.08.29
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena paper review https://arxiv.org/pdf/2306.05685 Uncategorized 2024.08.28