All Posts (252)

ArmoRM Paper Review: Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

https://arxiv.org/pdf/2406.12845
https://github.com/RLHFlow/RLHF-Reward-Modeling
Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conventional RMs are trained on pairwise responses to the same user reque..

Uncategorized · 2024.10.29

Improve Vision Language Model Chain-of-Thought Reasoning Paper Review

https://arxiv.org/pdf/2410.16198
https://github.com/RifleZhang/LLaVA-Reasoner-DPO
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT r..

Uncategorized · 2024.10.25

UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs Paper Review

https://arxiv.org/abs/2410.17050
The key components of machine learning are data samples for training, model for learning patterns, and loss function for optimizing accuracy. Analogously, unlearning can potentially be achieved through anti-data samples (or anti-samples), unlearning method..

Uncategorized · 2024.10.23