- formatting
- images
- links
- math
- code
- blockquotes
- external-services
-
Spurious Rewards: Rethinking Training Signals in RLVR
Paper review - Research related to RLVR
-
The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models / What Makes a Reward Model a Good Teacher? An Optimization Perspective
Paper review - Research related to Reinforcement Learning and Reward Models
-
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper review - Research related to Efficient Transformers