All Posts (250)

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

https://arxiv.org/pdf/2411.10442 Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pretraining and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly in Chain-of-Thought (CoT) performance. To address this, ..
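As a rough illustration of the "mixed" objective the title suggests, the sketch below combines a DPO-style preference term with an SFT-style generation term in a single loss; the function name, weighting, and exact loss terms are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: mixing a DPO-style preference loss with an SFT-style
# generation loss. Weights and the exact terms are assumptions.
import torch.nn.functional as F

def mixed_preference_loss(policy_chosen_logp, policy_rejected_logp,
                          ref_chosen_logp, ref_rejected_logp,
                          chosen_token_logps, beta=0.1, sft_weight=1.0):
    # Preference term (DPO-style): prefer the chosen response over the
    # rejected one, measured relative to a frozen reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    preference_loss = -F.logsigmoid(margin).mean()

    # Generation term (SFT-style): keep maximizing the likelihood of the
    # chosen response so the policy does not drift from supervised behavior.
    generation_loss = -chosen_token_logps.mean()

    return preference_loss + sft_weight * generation_loss
```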

Uncategorized 2024.11.24

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search

https://arxiv.org/abs/2411.11694 Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By allocating more computational resources during the inference phase, large language ..
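As a rough, hypothetical sketch of what reward-guided tree search at inference time can look like (not the report's exact algorithm), the snippet below does a best-first expansion of partial reasoning paths; `generate_candidates` stands in for a policy LLM proposing next steps and `reward_model_score` for a reward model scoring partial solutions, both assumed interfaces.

```python
# Hedged sketch of reward-guided tree search over reasoning steps.
# `generate_candidates` and `reward_model_score` are hypothetical callables.
import heapq

def reward_guided_search(question, generate_candidates, reward_model_score,
                         beam_width=4, expand_k=4, max_depth=6):
    """Best-first expansion of partial reasoning paths, keeping the top-scoring ones."""
    frontier = [(0.0, [])]          # (negative reward, path of reasoning steps)
    best_path, best_score = [], float("-inf")

    for _ in range(max_depth):
        candidates = []
        for neg_score, path in frontier:
            # The policy LLM proposes several candidate next steps for this path.
            for step in generate_candidates(question, path, k=expand_k):
                new_path = path + [step]
                score = reward_model_score(question, new_path)
                if score > best_score:
                    best_path, best_score = new_path, score
                heapq.heappush(candidates, (-score, new_path))
        # Keep only the beam_width highest-reward partial paths for the next round.
        frontier = heapq.nsmallest(beam_width, candidates)

    return best_path, best_score
```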

Uncategorized 2024.11.21

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

https://arxiv.org/pdf/2411.06559 Language agents have demonstrated promising capabilities in automating web-based tasks, though their current reactive approaches still largely underperform compared to humans. While incorporating advanced planning algorithms, particularly tree search methods, could enhance these agents' performance, implementing tree search directly on live websites poses significant ..

Uncategorized 2024.11.21

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

https://arxiv.org/pdf/2411.11504 The evolution of machine learning has increasingly prioritized the development of powerful models and more scalable supervision signals. However, the emergence of foundation models presents significant challenges in providing effective supervision signals necessary for further enhancing their capabilities. Consequently, there is an urgent need to explore novel supervision ..

Uncategorized 2024.11.20

Self-Evolved Reward Learning for LLMs

https://arxiv.org/pdf/2411.00418 Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences, playing a pivotal role in the success of conversational models like GPT-4, ChatGPT, and Llama 2. A core challenge in employing RLHF lies in training a reliable reward model (RM), which relies on high-quality labels typically provided by human ..

Uncategorized 2024.11.16

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

https://arxiv.org/abs/2410.21728 Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive. Inspired by ..

Uncategorized 2024.11.16