All categories 251

LLM + VLM + Diffusion Model

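https://arxiv.org/html/2407.20798v1 DIFFUSION AUGMENTED AGENTS: A FRAMEWORK FOR EFFICIENT EXPLORATION AND TRANSFER LEARNING. Addresses the data scarcity problem of RL in real-world environments by transferring from previously learned knowledge and improving sample efficiency. The LLM acts as the main controller, i.e. the brain: it queries the VLM and the diffusion model (DM) and guides the agent's high-level behavior. Through the LLM, the instruction is decomposed into text descriptions; through the VLM, the observation and the text descriptions are turned into embeddings, and these are compared by cosine..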
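The excerpt above breaks off at "cosine". Assuming it refers to cosine similarity between the observation embedding and the text-description embeddings, the comparison could be sketched as follows. The vectors and the helper `best_matching_description` are hypothetical toy stand-ins, not the paper's code or a real VLM API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_matching_description(observation_embedding, description_embeddings):
    """Return (label, score) for the description closest to the observation.

    Hypothetical helper: in practice the embeddings would come from a VLM.
    """
    scores = {
        label: cosine_similarity(observation_embedding, embedding)
        for label, embedding in description_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label]


# Toy 3-dimensional embeddings, made up for illustration only.
observation = [0.9, 0.1, 0.0]
descriptions = {
    "the drawer is open": [1.0, 0.0, 0.0],
    "the drawer is closed": [0.0, 1.0, 0.0],
}
label, score = best_matching_description(observation, descriptions)
```

Here `label` is the description whose embedding points in the direction closest to the observation's.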

agent 2024.08.11

LLM Critics Help Catch LLM Bugs Paper Review

https://arxiv.org/pdf/2407.00215 Scalable oversight (June 28 and July 12, 2024): another LLM used to evaluate a model's output (mainly for RLHF) -> no human supervision; improves human evaluation. OpenAI: runs scalable oversight in a real setting (not a toy setting). DeepMind: tests scalable oversight in 6 protocol environments of [debate, consultancy] open or not. OpenAI detects errors in a code generation setting, finds hundreds of flaws in training data that had actually been rated flawless, and also finds them in out-of-distribution datasets that are not code generation (question..

RLHF 2024.08.10

ChatDev Paper Review (Communicative Agents for Software Development)

https://arxiv.org/pdf/2307.07924v5 https://github.com/OpenBMB/ChatDev Refers to a chat-powered software-development framework. Technically, to facilitate cooperative communication, ChatDev introduces chat chain to further break down each phase into smaller and manageable subtasks, which guides multi-turn communications between different roles to propose ..

agent/multi - agent 2024.08.10

AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments Paper Review

https://arxiv.org/pdf/2406.04151
Most important points:
1) diverse environments for agent exploration and learning
2) a trajectory set to equip agents with basic capabilities and prior knowledge
3) an effective and scalable evolution method

agent/multi - agent 2024.08.08

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper Review

https://arxiv.org/abs/2408.03314 (1) searching against dense, process-based verifier reward models; and (2) updating the model’s distribution over a response adaptively, given the prompt at test time. We find that in both cases, the effectiveness of different approaches to scaling test-time compute critically varies depending on the difficulty of the prompt. Between scaling up model size and spending additional computation at test time, ..
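As a rough illustration of approach (1), the sketch below does best-of-N selection with a toy step-level scorer. This is a deliberately simpler stand-in for search against a learned process-based verifier. The names `step_score`, `solution_score`, and `best_of_n`, the "a + b = c" step format, and the min-over-steps aggregation are all assumptions made for this example, not the paper's method.

```python
def step_score(step):
    """Toy step verifier: 1.0 if an 'a + b = c' step is arithmetically correct."""
    left, right = step.split("=")
    a, b = left.split("+")
    return 1.0 if int(a) + int(b) == int(right) else 0.0


def solution_score(steps):
    """Aggregate step scores; taking the minimum is one possible choice (assumed)."""
    return min(step_score(step) for step in steps)


def best_of_n(candidates, score_fn):
    """Return the candidate with the highest verifier score (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


# Two hypothetical sampled solutions to "17 + 25, then add 8".
candidate_a = ["17 + 25 = 42", "42 + 8 = 50"]
candidate_b = ["17 + 25 = 32", "32 + 8 = 40"]  # first step is wrong
chosen = best_of_n([candidate_b, candidate_a], solution_score)
```

Sampling more candidates spends more test-time compute, and the verifier decides which one to keep.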

Uncategorized 2024.08.08

ToRA (A TOOL-INTEGRATED REASONING AGENT FOR MATHEMATICAL PROBLEM SOLVING) Paper Review

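https://openreview.net/pdf?id=Ep0TtjVoap (a) is CoT, (b) is PAL, and (c) is ToRA, which uses rationales (CoT) integrated with tools (PAL). Imitation learning: model M is trained on the ToRA corpus built with a model such as GPT-4. Output space shaping: ToRA trajectories are sampled from model M, evaluated and validated by a teacher model, and the corrected trajectories are then used as the corpus.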
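The output space shaping loop described above can be sketched roughly as follows. Every callable here (`sample_trajectories`, `is_valid`, `teacher_correct`) is a hypothetical stand-in for model M, the validity check, and the teacher model. Keeping already-valid samples alongside corrected ones is an assumption of this sketch.

```python
def shape_output_space(problems, sample_trajectories, is_valid, teacher_correct):
    """Build a corpus from sampled trajectories, repairing invalid ones."""
    corpus = []
    for problem in problems:
        for trajectory in sample_trajectories(problem):
            if is_valid(problem, trajectory):
                # Assumption: valid samples are kept as they are.
                corpus.append((problem, trajectory))
                continue
            corrected = teacher_correct(problem, trajectory)
            # Only keep a correction that itself passes validation.
            if corrected is not None and is_valid(problem, corrected):
                corpus.append((problem, corrected))
    return corpus


# Toy stand-ins: a "problem" is a sum such as "10+5", a "trajectory" is "10+5=15".
canned_samples = {"2+3": ["2+3=5"], "10+5": ["10+5=16"]}


def sample_trajectories(problem):
    return canned_samples[problem]


def is_valid(problem, trajectory):
    left, right = trajectory.split("=")
    a, b = left.split("+")
    return left == problem and int(a) + int(b) == int(right)


def teacher_correct(problem, trajectory):
    a, b = problem.split("+")
    return f"{problem}={int(a) + int(b)}"


corpus = shape_output_space(
    ["2+3", "10+5"], sample_trajectories, is_valid, teacher_correct
)
```

In this toy run the wrong sample for "10+5" is replaced by the teacher's corrected trajectory before it enters the corpus.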

AgentScope Paper Review (A Flexible yet Robust Multi-Agent Platform)

Most important points: https://arxiv.org/pdf/2402.14034 https://github.com/modelscope/agentscope (Start building LLM-empowered multi-agent applications in an easier way.) The following aspects feature the challenge: 1) Agents involved in a multi-agent application can specialize at diff..

agent/multi - agent 2024.08.07

GPTSwarm Paper Review (Language Agents as Optimizable Graphs)

https://arxiv.org/pdf/2402.16823 Nodes are operations (LLM or tool use); edges are communication between agents. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. 2. GPTSwarm 2.1. Language Agents as Graphs 2.2. Graph Definition: input x and context information z from its predecessor nodes by applying a computational routine f 2.3. Edge Optimization 2..
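To make the graph view concrete, here is a minimal sketch in which each node applies a routine f to the input x and to the context z gathered from its predecessor nodes. The function `run_graph`, the node names, and the string-building routines are hypothetical illustrations, not GPTSwarm's actual API. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def run_graph(routines, predecessors, x):
    """Run every node once, predecessors first, and return all node outputs.

    routines:     node name -> f(x, z), where z is the list of predecessor outputs
    predecessors: node name -> list of predecessor node names (the edges)
    """
    outputs = {}
    for node in TopologicalSorter(predecessors).static_order():
        z = [outputs[p] for p in predecessors.get(node, ())]
        outputs[node] = routines[node](x, z)
    return outputs


# Toy operations standing in for LLM queries or tool calls.
routines = {
    "draft": lambda x, z: f"draft({x})",
    "critique": lambda x, z: f"critique({z[0]})",
    "final": lambda x, z: " + ".join(z),
}
predecessors = {
    "draft": [],
    "critique": ["draft"],
    "final": ["draft", "critique"],
}
outputs = run_graph(routines, predecessors, "q")
```

Changing which edges exist in `predecessors` changes what context each node sees, which is the quantity an edge-optimization step would adjust.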

agent/multi - agent 2024.08.06