fine-tuning

an archive of posts with this tag

Aug 19, 2025 On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Aug 05, 2025 Impact of Fine-Tuning Methods on Memorization in Large Language Models
Jul 15, 2025 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Jun 17, 2025 Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models
Jun 10, 2025 DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Apr 22, 2025 Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success
Mar 25, 2025 ReFT: Reasoning with Reinforced Fine-Tuning
Mar 11, 2025 When Is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Mar 04, 2025 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Jan 14, 2025 OpenVLA: An Open-Source Vision-Language-Action Model
Sep 23, 2024 Training Language Models to Self-Correct via Reinforcement Learning
Sep 23, 2024 SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Sep 02, 2024 Many-shot jailbreaking
Aug 20, 2024 Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (KARD)
Aug 13, 2024 Knowledge conflict survey
Jul 02, 2024 RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
Jun 11, 2024 Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 27, 2024 Understanding the performance gap between online and offline alignment algorithms
May 21, 2024 LLAMA PRO: Progressive LLaMA with Block Expansion
May 07, 2024 How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel
Apr 30, 2024 Training Diffusion Models with Reinforcement Learning
Apr 30, 2024 Many-Shot In-Context Learning
Apr 23, 2024 ORPO: Monolithic Preference Optimization without Reference Model
Mar 19, 2024 Unveiling the Generalization Power of Fine-Tuned Large Language Models
Mar 12, 2024 A Simple and Effective Pruning Approach for Large Language Models
Feb 06, 2024 Self-Rewarding Language Models
Jan 30, 2024 Lion: Adversarial Distillation of Proprietary Large Language Models
Jan 09, 2024 Making Large Language Models A Better Foundation For Dense Retrieval
Oct 31, 2023 A Survey on Large Language Model based Autonomous Agents
Oct 10, 2023 LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models
Sep 12, 2023 A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Aug 29, 2023 Code Llama: Open Foundation Models for Code
Jun 15, 2023 Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
Apr 20, 2023 Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization
Apr 13, 2023 P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
Mar 30, 2023 GPT Understands, Too
Feb 02, 2023 Measuring and Improving Semantic Diversity of Dialogue Generation
Jan 19, 2023 KALA: Knowledge-Augmented Language Model Adaptation
Jan 12, 2023 A Survey for In-context Learning