
Tool-Augmented Reward Modeling paper review

jinuklee 2024. 9. 14. 19:47

https://arxiv.org/pdf/2310.01045


Our approach enhances reward models (RMs) with the capability to make informed and dynamic decisions about which APIs to employ, when to invoke them, what arguments to pass, and how to effectively integrate the obtained results into the broader reasoning process.
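The "which API, when, and with what arguments" decisions can be illustrated with a minimal sketch that parses a model-generated call and dispatches it to a tool registry. The `Action:` / `Action Input:` text format, the `TOOLS` registry, and the restricted `calculator` tool are assumptions made for illustration, not the paper's actual interface. Returning `None` when no call is present stands in for the model deciding not to use a tool.

```python
import ast
import operator
from typing import Callable, Dict, Optional, Tuple

# Hypothetical safe calculator: evaluates + - * / on numeric literals only.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def _eval(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def calculator(expr: str) -> str:
    return str(_eval(ast.parse(expr, mode="eval")))

# Hypothetical registry: API name -> callable taking the raw argument string.
TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": calculator}

def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Extract (api_name, argument) from 'Action:' / 'Action Input:' lines, if any."""
    name, arg = None, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Action Input:"):
            arg = line[len("Action Input:"):].strip()
        elif line.startswith("Action:"):
            name = line[len("Action:"):].strip()
    if name is None or arg is None:
        return None  # no complete tool call in the generated text
    return name, arg

def dispatch(text: str) -> Optional[str]:
    parsed = parse_action(text)
    if parsed is None:
        return None  # the model chose not to invoke a tool
    name, arg = parsed
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool {name!r}"
    return tool(arg)
```

For example, `dispatch("Action: Calculator\nAction Input: 12 * (3 + 4)")` returns `"84"`, while text containing only a `Thought:` line returns `None`. Parsing with `ast` instead of calling `eval` keeps the toy calculator from executing arbitrary code.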

• Thought: At this initial stage, the model evaluates whether it should call external APIs (this is referred to as tool reasoning).

• Action: Subsequently, the model generates the necessary API calls along with the corresponding arguments required for the interactions.

• Observation: The results produced by the external APIs are collected and stored.

• Rationale: The model aggregates and synthesizes the information acquired in the previous stages, supporting both induction and reasoning tailored specifically to reward modeling.
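The four stages above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the paper's implementation. `policy`, `rationale_fn`, and `score_fn` are hypothetical stand-ins for the reward model's own text generation and its scalar reward output. The `toy_*` components exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Step:
    thought: str                        # Thought: does the RM need a tool here?
    action: Optional[str] = None        # Action: API name (None = no tool call)
    action_input: Optional[str] = None  # Action: argument passed to the API
    observation: Optional[str] = None   # Observation: result returned by the API

# Hypothetical component signatures (stand-ins for the RM's own generation).
Policy = Callable[[str, str, List[Step]], Step]
RationaleFn = Callable[[str, str, List[Step]], str]
ScoreFn = Callable[[str], float]

def render(steps: List[Step], rationale: str) -> str:
    """Serialize the trajectory in Thought/Action/Observation/Rationale order."""
    lines = []
    for s in steps:
        lines.append(f"Thought: {s.thought}")
        if s.action is not None:
            lines.append(f"Action: {s.action}")
            lines.append(f"Action Input: {s.action_input}")
            lines.append(f"Observation: {s.observation}")
    lines.append(f"Rationale: {rationale}")
    return "\n".join(lines)

def tool_augmented_reward(question: str, answer: str, policy: Policy,
                          tools: Dict[str, Callable[[str], str]],
                          rationale_fn: RationaleFn, score_fn: ScoreFn,
                          max_steps: int = 4) -> Tuple[str, float]:
    steps: List[Step] = []
    for _ in range(max_steps):
        step = policy(question, answer, steps)          # Thought (+ optional Action)
        if step.action is not None:
            tool = tools.get(step.action)
            step.observation = (tool(step.action_input) if tool
                                else f"Error: unknown tool {step.action!r}")  # Observation
        steps.append(step)
        if step.action is None:                         # no further tool needed
            break
    rationale = rationale_fn(question, answer, steps)   # Rationale
    trace = render(steps, rationale)
    return trace, score_fn(trace)                       # scalar reward over the full trace

# ---- Toy components, purely illustrative ----
def toy_policy(question: str, answer: str, steps: List[Step]) -> Step:
    if not steps:
        return Step("The answer involves arithmetic; verify it with a calculator.",
                    action="Calculator", action_input="12 * 7")
    return Step("The tool result is sufficient to judge the answer.")

def toy_calculator(expr: str) -> str:
    left, right = expr.split("*")  # toy: only handles 'a * b'
    return str(int(left) * int(right))

def toy_rationale(question: str, answer: str, steps: List[Step]) -> str:
    result = steps[0].observation
    verdict = "correct" if result == answer.strip() else "incorrect"
    return f"Calculator returned {result}; the response says {answer}. Verdict: {verdict}"

def toy_score(trace: str) -> float:
    return 1.0 if "Verdict: correct" in trace else -1.0
```

With the toy components, `tool_augmented_reward("What is 12 * 7?", "84", toy_policy, {"Calculator": toy_calculator}, toy_rationale, toy_score)` yields a trace ending in `Verdict: correct` and a reward of `1.0`. Changing the answer to `"85"` gives `-1.0`. One design choice worth noting is that the score is computed over the whole Thought/Action/Observation/Rationale trace rather than over the response alone, so the tool results can influence the reward.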