What Makes a Reward Model a Good Teacher? An Optimization Perspective / The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models

Paper Information

  • Date: 2025-08-12
  • Reviewer: 준원 장
  • Property: Reinforcement Learning


1. Intro

2. Method

3. How It Works

4. Experiments

5. Conclusion