alignment | Unknown NLP Lab

Oct 17, 2025	SimPO: Simple Preference Optimization with a Reference-Free Reward
Aug 19, 2025	On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Aug 12, 2025	The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models / What Makes a Reward Model a Good Teacher? An Optimization Perspective
Jun 10, 2025	Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
Apr 15, 2025	Universal and Transferable Adversarial Attacks on Aligned Language Models
Apr 08, 2025	Reasoning Models Don’t Always Say What They Think
Mar 25, 2025	ReFT: Reasoning with Reinforced Fine-Tuning
Mar 04, 2025	Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Sep 23, 2024	Training Language Models to Self-Correct via Reinforcement Learning
Sep 09, 2024	Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Sep 02, 2024	Many-shot jailbreaking
Aug 13, 2024	Knowledge conflict survey
Jul 23, 2024	Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs
Jul 02, 2024	RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
Jun 11, 2024	Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 28, 2024	SimPO: Simple Preference Optimization with a Reference-Free Reward
May 27, 2024	Understanding the performance gap between online and offline alignment algorithms
May 21, 2024	LLAMA PRO: Progressive LLaMA with Block Expansion
Apr 30, 2024	Training diffusion models with reinforcement learning
Apr 23, 2024	ORPO: Monolithic Preference Optimization without Reference Model
Apr 02, 2024	Preference-free Alignment Learning with Regularized Relevance Reward
Mar 05, 2024	Beyond Memorization: Violating Privacy Via Inferencing With LLMs
Feb 06, 2024	Self-Rewarding Language Models
Jan 30, 2024	Lion: Adversarial Distillation of Proprietary Large Language Models
Jan 16, 2024	Benchmarking Cognitive Biases in Large Language Models as Evaluators
Oct 31, 2023	In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Oct 31, 2023	A Survey on Large Language Model based Autonomous Agents
Sep 12, 2023	A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training