| Aug 19, 2025 | Spurious Rewards: Rethinking Training Signals in RLVR |
| Jul 15, 2025 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning |
| Jul 15, 2025 | Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models |
| Jul 15, 2025 | Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models |
| Jul 01, 2025 | Reasoning Models Can Be Effective Without Thinking |
| Jul 01, 2025 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs |
| Jun 24, 2025 | See What You Are Told: Visual Attention Sink in Large Multimodal Models |
| Jun 17, 2025 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models |
| Jun 10, 2025 | DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models |
| Jun 03, 2025 | TextGrad: Automatic “Differentiation” via Text |
| Apr 22, 2025 | Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success |
| Apr 08, 2025 | Reasoning Models Don’t Always Say What They Think |
| Apr 08, 2025 | On the Biology of a Large Language Model |
| Mar 25, 2025 | ReFT: Reasoning with Reinforced Fine-Tuning |
| Mar 11, 2025 | Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs |
| Mar 04, 2025 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution |
| Mar 04, 2025 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning |
| Feb 18, 2025 | DeepSeek v3 |
| Jan 02, 2025 | Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection |
| Jan 02, 2025 | DeepSeek R1 |
| Jan 02, 2025 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning |
| Oct 10, 2024 | FaithEval: Can Your Language Model Stay Faithful to Context, Even If “The Moon Is Made of Marshmallows” |
| Oct 03, 2024 | QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models |
| Sep 23, 2024 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories |
| Sep 09, 2024 | Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling |
| Aug 20, 2024 | Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (KARD) |
| Aug 13, 2024 | Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process |
| Jul 23, 2024 | Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning |
| Jul 23, 2024 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs |
| May 21, 2024 | LLaMA Pro: Progressive LLaMA with Block Expansion |
| Apr 30, 2024 | Many-Shot In-Context Learning |
| Apr 23, 2024 | ORPO: Monolithic Preference Optimization without Reference Model |
| Apr 23, 2024 | Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? |
| Mar 26, 2024 | Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks |
| Mar 05, 2024 | Beyond Memorization: Violating Privacy Via Inferencing With LLMs |
| Feb 27, 2024 | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection |
| Feb 13, 2024 | LLM Augmented LLMs: Expanding Capabilities through Composition |
| Feb 06, 2024 | Self-Rewarding Language Models |
| Jan 30, 2024 | Lion: Adversarial Distillation of Proprietary Large Language Models |
| Jan 23, 2024 | In-Context Pretraining: Language Modeling Beyond Document Boundaries |
| Jan 16, 2024 | Mistral 7B & Mixtral (Mixtral of Experts) |
| Oct 31, 2023 | A Survey on Large Language Model based Autonomous Agents |
| Sep 12, 2023 | A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training |
| Aug 29, 2023 | Code Llama: Open Foundation Models for Code |
| Jun 29, 2023 | QLoRA: Efficient Finetuning of Quantized LLMs |
| Jun 15, 2023 | Do Prompt-Based Models Really Understand the Meaning of Their Prompts? |
| May 11, 2023 | Measuring Association Between Labels and Free-Text Rationales |
| Jan 12, 2023 | A Survey for In-context Learning |