Jinwook Lee's Blog


2025/02/12

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model 요약

3 IXC2.5-Reward: Data Preparation. Reward models are trained using pairwise preference annotations (e.g., prompts x with chosen responses y_c and rejected responses y_r) that reflect human preferences. While existing public preference data is primarily textual, with limited image and scarce video examples, we train IXC-2.5-Reward using both open-source data and a newly collected dataset to ensure broad…
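The excerpt stops before the training objective, but pairwise preference data of this form is typically fit with a Bradley-Terry style loss, -log sigmoid(r(x, y_c) - r(x, y_r)). A minimal sketch (the function name and plain-float scores are illustrative, not taken from the paper):

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r(x, y_c) - r(x, y_r)).

    Minimizing it pushes the reward model to score the chosen response
    y_c above the rejected response y_r for the same prompt x.
    """
    margin = score_chosen - score_rejected
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the margin between chosen and rejected scores grows, and equals log 2 when the two scores tie.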

Uncategorized 2025.02.12

MASTER: A Multi-Agent System with LLM Specialized MCTS

3.1 Preliminaries. Before introducing our framework, we first present MCTS to clarify the motivation behind our work. MCTS (Coulom, 2006) is a widely used planning algorithm, famously employed in AlphaGo (Silver et al., 2016). Taking the Game of Go as an example, the algorithm assists in selecting the best possible action in the current state of the board based on their average rewards. These r…
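The excerpt is cut off, but the "average rewards" it mentions drive the standard MCTS selection rule (UCT), which adds an exploration bonus on top of each action's mean reward. A minimal sketch (class and function names are illustrative; the MASTER paper's exact variant may differ):

```python
import math

class Node:
    """Statistics for one action edge in the search tree."""
    def __init__(self):
        self.visits = 0          # N(s, a): how often this action was tried
        self.total_reward = 0.0  # sum of rollout rewards backed up through it

    def average_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0

def uct_select(children: dict, parent_visits: int, c: float = 1.41) -> str:
    """Return the action maximizing average reward + exploration bonus (UCT)."""
    def score(action):
        node = children[action]
        if node.visits == 0:
            return float("inf")  # always try an unvisited action first
        return node.average_reward() + c * math.sqrt(math.log(parent_visits) / node.visits)
    return max(children, key=score)
```

With equal visit counts the higher-average action wins; an unvisited action is always explored first.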

Uncategorized 2025.02.12


AI research memo for reference

  • All posts (286)
    • inference-time, RLHF (41)
      • STaR, ReST (4)
      • STaR, ReST - LMM (17)
      • search (language) (10)
      • search (multimodal) (2)
      • Process reward model (6)
      • scalable oversight (1)
      • red-team (1)
    • VLM (5)
    • RLHF (2)
    • prompting (3)
    • interpretability (2)
    • agent (23)
      • on-device agent (1)
      • multi-agent (17)
      • multi-agent results (2)
    • PEFT (1)
      • LoRA (1)
    • multi-step reasoning (math, coding… (7)
      • multimodal CoT (5)
    • limitations (1)
    • datasets (3)
      • synthetic data (1)
    • 3D, real world, game, VR (2)




