
ReAct paper review: Synergizing Reasoning and Acting in Language Models

jinuklee 2024. 8. 3. 17:40

https://arxiv.org/pdf/2210.0362

The idea of ReAct is simple: we augment the agent’s action space to $\hat{A} = A \cup L$, where $L$ is the space of language. An action $\hat{a}_t \in L$ in the language space, which we will refer to as a thought or a reasoning trace, does not affect the external environment, thus leading to no observation feedback. Instead, a thought $\hat{a}_t$ aims to compose useful information by reasoning over the current context $c_t$, and update the context $c_{t+1} = (c_t, \hat{a}_t)$ to support future reasoning or acting. As shown in Figure 1, there could be various types of useful thoughts, e.g. decomposing task goals and create action plans (2b, Act 1; 1d, Thought 1), injecting commonsense knowledge relevant to task solving (2b, Act 1), extracting important parts from observations (1d, Thought 2, 4), track progress and transit action plans (2b, Act 8), handle exceptions and adjust action plans (1d, Thought 3), and so on.

 

 

The example here is ReAct solving an ALFWorld benchmark task.

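To solve a task, the agent receives an observation $o_t$ from the environment and produces an action $a_t$ by following some policy $\pi(a_t \mid c_t)$. The context at that point is $c_t = (o_1, a_1, \ldots, o_{t-1}, a_{t-1}, o_t)$.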
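As a rough sketch of this setup (not code from the paper), the context can be held as a flat sequence that interleaves past observations and actions and ends with the current observation, which is how the ReAct paper defines $c_t$. The helper name `build_context`, the string-valued observations and actions, and the sample episode are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_context(history: List[Tuple[str, str]], current_obs: str) -> List[str]:
    """Return c_t = (o_1, a_1, ..., o_{t-1}, a_{t-1}, o_t) as a flat list.

    `history` holds the earlier (observation, action) pairs; `current_obs`
    is o_t, the observation the agent has not acted on yet.
    """
    context: List[str] = []
    for obs, act in history:
        context.append(obs)
        context.append(act)
    context.append(current_obs)
    return context


# Hypothetical ALFWorld-style episode, two steps in.
history = [
    ("You are in the middle of a room.", "go to cabinet 1"),
    ("The cabinet 1 is closed.", "open cabinet 1"),
]
context = build_context(history, "You open the cabinet 1. It is empty.")
print(len(context))  # 5 entries: o_1, a_1, o_2, a_2, o_3
```

A policy then only ever sees this sequence, which is why the mapping from context to action gets harder as the episode grows.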

Learning a policy is challenging when the mapping $c_t \mapsto a_t$ is highly implicit and requires extensive computation.

 

ReAct augments the agent’s action space $A$ with the space of language $L$, giving $\hat{A} = A \cup L$.

 

An action $\hat{a}_t \in L$ taken in the language part of this augmented action space is called a thought, or reasoning trace.

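A thought does not affect the external environment and produces no observation. Instead, it generates useful information by reasoning over the current context $c_t$, and the context is updated to $c_{t+1} = (c_t, \hat{a}_t)$.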
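The difference between a thought and an ordinary action can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `Thought`, `Action`, `ToyEnv`, and `react_step` are hypothetical names, and the toy environment only echoes the command it receives.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Thought:
    """An action in the language space L: never sent to the environment."""
    text: str


@dataclass(frozen=True)
class Action:
    """An action in the original space A: executed by the environment."""
    command: str


class ToyEnv:
    """Hypothetical stand-in environment that counts the steps it receives."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self, command: str) -> str:
        self.steps += 1
        return f"observation after '{command}'"


def react_step(context: List[str], a_hat: Union[Thought, Action], env: ToyEnv) -> List[str]:
    """Return the next context c_{t+1} given c_t and an action from A ∪ L."""
    if isinstance(a_hat, Thought):
        # Thought: c_{t+1} = (c_t, a_hat); no environment call, no observation.
        return context + [f"Thought: {a_hat.text}"]
    # Ordinary action: the environment runs it and returns a new observation.
    observation = env.step(a_hat.command)
    return context + [f"Action: {a_hat.command}", f"Obs: {observation}"]


env = ToyEnv()
c = ["Obs: You are in the middle of a room."]
c = react_step(c, Thought("I should check the cabinets first."), env)
c = react_step(c, Action("go to cabinet 1"), env)
print(env.steps)  # 1: only the Action reached the environment
print(len(c))     # 4: initial observation, thought, action, new observation
```

In an actual agent the next `Thought` or `Action` would be generated by the language model from the context; here they are hard-coded to keep the sketch self-contained.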