Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling — Paper Review
https://arxiv.org/pdf/2408.16737

Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) ..
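The basic compute-matching idea can be illustrated with a minimal sketch. Assuming the standard approximation that sampling cost is roughly 2 × (model parameters) FLOPs per generated token, a fixed inference budget lets a weaker-but-cheaper (WC) model draw proportionally more samples per question than a stronger-but-expensive (SE) model. The function names, model sizes, and token counts below are illustrative placeholders I introduce for the example, not values asserted by the paper.

```python
# Compute-matched sampling sketch: how many WC samples fit in the FLOPs budget
# of S_SE samples from an SE model?
# Assumption: sampling cost ~ 2 * params * tokens FLOPs per sample, and both
# models produce solutions of comparable average length.

def sampling_flops(params_b: float, avg_tokens: int, num_samples: int) -> float:
    """Approximate FLOPs to draw `num_samples` solutions of `avg_tokens` tokens each."""
    return 2 * params_b * 1e9 * avg_tokens * num_samples

def compute_matched_samples(params_se_b: float, params_wc_b: float, samples_se: int) -> int:
    """WC samples affordable under the same budget as `samples_se` SE samples."""
    return int(samples_se * params_se_b / params_wc_b)

if __name__ == "__main__":
    # Hypothetical pairing: a 27B SE model vs. a 9B WC model, 1 SE sample per question.
    se_params_b, wc_params_b = 27.0, 9.0
    samples_se = 1
    samples_wc = compute_matched_samples(se_params_b, wc_params_b, samples_se)
    budget = sampling_flops(se_params_b, avg_tokens=512, num_samples=samples_se)
    print(f"SE budget: {budget:.2e} FLOPs -> {samples_wc} WC samples per question")
```

Under this accounting, a 3× smaller model can be sampled roughly 3× more often per question at the same cost, which is the lever the paper studies when comparing SE- and WC-generated training data.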