Paper: https://arxiv.org/pdf/2406.12845
Code: https://github.com/RLHFlow/RLHF-Reward-Modeling

Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) on human preference data. Conventional RMs are trained on pairwise responses to the same user request.
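To make the conventional pairwise setup concrete, the sketch below shows the standard Bradley-Terry style objective commonly used for such RMs: the model assigns a scalar score to each response, and the loss pushes the preferred response's score above the rejected one's. This is a minimal illustration under standard assumptions, not the specific training code from the linked repository; the function and variable names (`pairwise_rm_loss`, `chosen_rewards`, `rejected_rewards`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood for pairwise preference data.

    chosen_rewards / rejected_rewards: scalar RM scores, shape (batch,),
    for the preferred and dispreferred response to the same user request.
    (Illustrative sketch; names are not from the paper or repo.)
    """
    # P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected).
    # Minimizing the negative log of this probability trains the RM to
    # score the human-preferred response higher.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.5, 0.7, 1.1])
print(pairwise_rm_loss(chosen, rejected).item())
```

In practice the scalar scores would come from an LLM backbone with a reward head evaluated on (prompt, response) pairs; the loss itself is the part that depends only on the pairwise preference labels.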