| Aug 19, 2025 | On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification |
| Aug 05, 2025 | Impact of Fine-Tuning Methods on Memorization in Large Language Models |
| Jul 15, 2025 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning |
| Jun 17, 2025 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models |
| Jun 10, 2025 | DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models |
| Apr 22, 2025 | Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success |
| Mar 25, 2025 | ReFT: Reasoning with Reinforced Fine-Tuning |
| Mar 11, 2025 | When Is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers |
| Mar 04, 2025 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning |
| Jan 14, 2025 | OpenVLA: An Open-Source Vision-Language-Action Model |
| Sep 23, 2024 | Training Language Models to Self-Correct via Reinforcement Learning |
| Sep 23, 2024 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories |
| Sep 02, 2024 | Many-shot jailbreaking |
| Aug 20, 2024 | Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (KARD) |
| Aug 13, 2024 | Knowledge conflict survey |
| Jul 02, 2024 | RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs |
| Jun 11, 2024 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? |
| May 27, 2024 | Understanding the performance gap between online and offline alignment algorithms |
| May 21, 2024 | LLAMA PRO: Progressive LLaMA with Block Expansion |
| May 07, 2024 | How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel |
| Apr 30, 2024 | Training Diffusion Models with Reinforcement Learning |
| Apr 30, 2024 | Many-Shot In-Context Learning |
| Apr 23, 2024 | ORPO: Monolithic Preference Optimization without Reference Model |
| Mar 19, 2024 | Unveiling the Generalization Power of Fine-Tuned Large Language Models |
| Mar 12, 2024 | A Simple and Effective Pruning Approach for Large Language Models |
| Feb 06, 2024 | Self-Rewarding Language Models |
| Jan 30, 2024 | Lion: Adversarial Distillation of Proprietary Large Language Models |
| Jan 09, 2024 | Making Large Language Models A Better Foundation For Dense Retrieval |
| Oct 31, 2023 | A Survey on Large Language Model based Autonomous Agents |
| Oct 10, 2023 | LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models |
| Sep 12, 2023 | A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training |
| Aug 29, 2023 | Code Llama: Open Foundation Models for Code |
| Jun 15, 2023 | Do Prompt-Based Models Really Understand the Meaning of Their Prompts? |
| Apr 20, 2023 | Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization |
| Apr 13, 2023 | P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks |
| Mar 30, 2023 | GPT Understands, Too |
| Feb 02, 2023 | Measuring and Improving Semantic Diversity of Dialogue Generation |
| Jan 19, 2023 | KALA: Knowledge-Augmented Language Model Adaptation |
| Jan 12, 2023 | A Survey for In-context Learning |