Aug 19, 2025 | On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification |
Aug 12, 2025 | The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models / What Makes a Reward Model a Good Teacher? An Optimization Perspective |
Jun 10, 2025 | Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction |
Apr 15, 2025 | Universal and Transferable Adversarial Attacks on Aligned Language Models |
Apr 08, 2025 | Reasoning Models Don’t Always Say What They Think |
Mar 25, 2025 | ReFT: Reasoning with Reinforced Fine-Tuning |
Mar 04, 2025 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning |
Sep 23, 2024 | Training Language Models to Self-Correct via Reinforcement Learning |
Sep 09, 2024 | Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models |
Sep 02, 2024 | Many-shot jailbreaking |
Aug 13, 2024 | Knowledge conflict survey |
Jul 23, 2024 | Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs |
Jul 02, 2024 | RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs |
Jun 11, 2024 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? |
May 28, 2024 | SimPO: Simple Preference Optimization with a Reference-Free Reward |
May 27, 2024 | Understanding the performance gap between online and offline alignment algorithms |
May 21, 2024 | LLaMA Pro: Progressive LLaMA with Block Expansion |
Apr 30, 2024 | Training diffusion models with reinforcement learning |
Apr 23, 2024 | ORPO: Monolithic Preference Optimization without Reference Model |
Apr 02, 2024 | Preference-free Alignment Learning with Regularized Relevance Reward |
Mar 05, 2024 | Beyond Memorization: Violating Privacy Via Inferencing With LLMs |
Feb 06, 2024 | Self-Rewarding Language Models |
Jan 30, 2024 | Lion: Adversarial Distillation of Proprietary Large Language Models |
Jan 16, 2024 | Benchmarking Cognitive Biases in Large Language Models as Evaluators |
Oct 31, 2023 | In-Context Learning Learns Label Relationships but Is Not Conventional Learning |
Oct 31, 2023 | A Survey on Large Language Model based Autonomous Agents |
Sep 12, 2023 | A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training |