Aug 19, 2025 Spurious Rewards: Rethinking Training Signals in RLVR
Jun 24, 2025 See What You Are Told: Visual Attention Sink in Large Multimodal Models
Jun 17, 2025 Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models
Jun 10, 2025 Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
Apr 22, 2025 Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success
Mar 11, 2025 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Jan 14, 2025 OpenVLA: An Open-Source Vision-Language-Action Model
Jan 02, 2025 TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Oct 03, 2024 QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
Sep 23, 2024 Training Language Models to Self-Correct via Reinforcement Learning
Sep 09, 2024 Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Jul 23, 2024 Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Apr 30, 2024 Training Diffusion Models with Reinforcement Learning
Apr 13, 2024 Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Dec 26, 2023 Are Emergent Abilities of Large Language Models a Mirage?
Oct 17, 2023 Resolving Interference When Merging Models
Jan 19, 2023 KALA: Knowledge-Augmented Language Model Adaptation