Aug 19, 2025 | Spurious Rewards: Rethinking Training Signals in RLVR
Aug 19, 2025 | On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Aug 12, 2025 | What Makes a Reward Model a Good Teacher? An Optimization Perspective
Aug 12, 2025 | The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models
Aug 12, 2025 | On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Aug 05, 2025 | Impact of Fine-Tuning Methods on Memorization in Large Language Models |
Aug 05, 2025 | Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Jul 15, 2025 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning |
Jul 15, 2025 | Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models |
Jul 15, 2025 | Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models |
Jul 01, 2025 | Reasoning Models Can Be Effective Without Thinking |
Jul 01, 2025 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
Jun 24, 2025 | See What You Are Told: Visual Attention Sink in Large Multimodal Models |
Jun 17, 2025 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models |
Jun 10, 2025 | Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction |
Jun 10, 2025 | DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models |
Jun 03, 2025 | TextGrad: Automatic “Differentiation” via Text
Jun 03, 2025 | Reinforcement Learning Finetunes Small Subnetworks in Large Language Models |
Apr 22, 2025 | Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success |
Apr 15, 2025 | Universal and Transferable Adversarial Attacks on Aligned Language Models |
Apr 15, 2025 | Model Context Protocol (MCP) - provided by Anthropic
Apr 08, 2025 | Reasoning Models Don’t Always Say What They Think |
Apr 08, 2025 | On the Biology of a Large Language Model |
Mar 25, 2025 | ReFT: Reasoning with Reinforced Fine-Tuning |
Mar 11, 2025 | When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Mar 11, 2025 | Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs |
Mar 04, 2025 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution |
Mar 04, 2025 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning |
Mar 04, 2025 | Contextual Document Embeddings |
Feb 18, 2025 | DeepSeek v3 |
Feb 04, 2025 | Titans: Learning to Memorize at Test Time |
Feb 04, 2025 | SSM → HIPPO → LSSL → S4 → Mamba → Mamba2 |
Jan 21, 2025 | Agent Laboratory: Using LLM Agents as Research Assistants |
Jan 14, 2025 | OpenVLA: An Open-Source Vision-Language-Action Model |
Jan 02, 2025 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies |
Jan 02, 2025 | Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection |
Jan 02, 2025 | Diffusion Language Models - Mathematical Foundations & Inference Optimization
Jan 02, 2025 | DeepSeek R1 |
Jan 02, 2025 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning |