
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

https://arxiv.org/pdf/2406.01297

Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct ..

Uncategorized 2024.09.21
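The survey's notion of inference-time self-correction can be sketched as a refine-and-stop loop. This is a minimal illustration, not the survey's method: `generate` is a hypothetical stand-in for an LLM call, stubbed here with canned responses so the loop is runnable.

```python
def generate(prompt):
    # Stub standing in for an LLM API call (assumption for illustration).
    if "Review your answer" in prompt:
        return "4"   # the "refined" response
    return "5"       # the initial (incorrect) response

def self_correct(question, rounds=2):
    """Iteratively ask the model to critique and revise its own answer."""
    answer = generate(question)
    for _ in range(rounds):
        feedback_prompt = (
            f"Question: {question}\nYour answer: {answer}\n"
            "Review your answer and output a corrected final answer."
        )
        revised = generate(feedback_prompt)
        if revised == answer:   # converged: the model no longer changes its answer
            break
        answer = revised
    return answer

print(self_correct("What is 2 + 2?"))  # → 4
```

The stopping rule (stop when the revision equals the previous answer) is one simple choice; frameworks surveyed in the paper differ mainly in where the feedback comes from (self-evaluation vs. external signals).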

Training Language Models to Self-Correct via Reinforcement Learning (SCoRe), paper review

https://arxiv.org/pdf/2409.12917

Point: generate the best possible final answer from the model's own distribution, while preventing model collapse. Improve self-correction ability using entirely self-generated data.

Method: training on self-generated data avoids distribution mismatch. Training proceeds in two stages; the staging is there to avoid the model collapse that occurs when a minimal-edit strategy fails (as with STaR). LLMs' self-correction ability is ineffective on its own (e.g., the "LLMs cannot self-correct yet" paper). Existing approaches rely on multiple models, a more capable LLM, or ..

Uncategorized 2024.09.21
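The anti-collapse idea in the excerpt, rewarding a genuine first-to-second-attempt correction rather than just repeating the first answer, can be illustrated with a toy shaped reward. This is an assumption-level sketch of that intuition, not SCoRe's exact objective; the `bonus` term and its value are hypothetical.

```python
def two_attempt_reward(first_correct, second_correct, bonus=0.5):
    """Toy shaped reward over a (first attempt, second attempt) pair.

    Base reward: correctness of the final (second) attempt.
    Shaping bonus: extra credit for flipping a wrong first attempt into a
    correct one, so the policy is not rewarded for collapsing into
    'copy the first answer verbatim'.
    """
    reward = 1.0 if second_correct else 0.0
    if not first_correct and second_correct:
        reward += bonus   # hypothetical progress bonus
    return reward

print(two_attempt_reward(first_correct=False, second_correct=True))  # → 1.5
print(two_attempt_reward(first_correct=True, second_correct=True))   # → 1.0
```

Under this shaping, "wrong then corrected" scores strictly higher than "right then repeated", which is the pressure against the minimal-edit collapse mode mentioned above.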