reasoning

an archive of posts with this tag

Aug 19, 2025 Spurious Rewards: Rethinking Training Signals in RLVR
Jul 15, 2025 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Jul 15, 2025 Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Jul 15, 2025 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Jul 01, 2025 Reasoning Models Can Be Effective Without Thinking
Jul 01, 2025 Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Jun 24, 2025 See What You Are Told: Visual Attention Sink in Large Multimodal Models
Jun 17, 2025 Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models
Jun 10, 2025 DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Jun 03, 2025 TextGrad: Automatic “Differentiation” via Text
Apr 22, 2025 Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success
Apr 08, 2025 Reasoning Models Don’t Always Say What They Think
Apr 08, 2025 On the Biology of a Large Language Model
Mar 25, 2025 ReFT: Reasoning with Reinforced Fine-Tuning
Mar 11, 2025 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Mar 04, 2025 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Mar 04, 2025 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Feb 18, 2025 DeepSeek v3
Jan 02, 2025 Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
Jan 02, 2025 DeepSeek R1
Jan 02, 2025 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Oct 10, 2024 FAITHEVAL: CAN YOUR LANGUAGE MODEL STAY FAITHFUL TO CONTEXT, EVEN IF “THE MOON IS MADE OF MARSHMALLOWS”
Oct 03, 2024 QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
Sep 23, 2024 SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Sep 09, 2024 Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Aug 20, 2024 Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (KARD)
Aug 13, 2024 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Jul 23, 2024 Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Jul 23, 2024 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
May 21, 2024 LLaMA Pro: Progressive LLaMA with Block Expansion
Apr 30, 2024 Many-Shot In-Context Learning
Apr 23, 2024 ORPO: Monolithic Preference Optimization without Reference Model
Apr 23, 2024 Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Mar 26, 2024 Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
Mar 05, 2024 Beyond Memorization: Violating Privacy Via Inferencing With LLMs
Feb 27, 2024 SELF-RAG: LEARNING TO RETRIEVE, GENERATE, AND CRITIQUE THROUGH SELF-REFLECTION
Feb 13, 2024 LLM AUGMENTED LLMS: EXPANDING CAPABILITIES THROUGH COMPOSITION
Feb 06, 2024 Self-Rewarding Language Models
Jan 30, 2024 Lion: Adversarial Distillation of Proprietary Large Language Models
Jan 23, 2024 IN-CONTEXT PRETRAINING: LANGUAGE MODELING BEYOND DOCUMENT BOUNDARIES
Jan 16, 2024 Mistral 7B & Mixtral (Mixtral of Experts)
Oct 31, 2023 A Survey on Large Language Model based Autonomous Agents
Sep 12, 2023 A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Aug 29, 2023 Code Llama: Open Foundation Models for Code
Jun 29, 2023 QLoRA: Efficient Finetuning of Quantized LLMs
Jun 15, 2023 Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
May 11, 2023 Measuring Association Between Labels and Free-Text Rationales
Jan 12, 2023 A Survey for In-context Learning