Aug 19, 2025  On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Aug 12, 2025  What Makes a Reward Model a Good Teacher? An Optimization Perspective
Aug 12, 2025  The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
Apr 15, 2025  Universal and Transferable Adversarial Attacks on Aligned Language Models
Apr 08, 2025  Reasoning Models Don't Always Say What They Think
Mar 25, 2025  ReFT: Reasoning with Reinforced Fine-Tuning
Feb 18, 2025  DeepSeek v3
Oct 17, 2024  Rule Based Rewards for Language Model Safety
Jul 23, 2024  Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Jul 23, 2024  Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Jun 11, 2024  Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 28, 2024  SimPO: Simple Preference Optimization with a Reference-Free Reward
May 27, 2024  Understanding the Performance Gap Between Online and Offline Alignment Algorithms
Apr 23, 2024  ORPO: Monolithic Preference Optimization without Reference Model
Apr 02, 2024  Preference-free Alignment Learning with Regularized Relevance Reward
Mar 05, 2024  Beyond Memorization: Violating Privacy Via Inferencing With LLMs
Feb 27, 2024  Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Feb 06, 2024  Self-Rewarding Language Models
Sep 19, 2023  The CRINGE Loss: Learning What Language Not to Model
Jun 29, 2023  QLoRA: Efficient Finetuning of Quantized LLMs
Jun 22, 2023  The False Promise of Imitating Proprietary LLMs
Jan 26, 2023  Task-aware Retrieval with Instructions