Apr 08, 2025 - On the Biology of a Large Language Model
Mar 11, 2025 - When Is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Jan 02, 2025 - Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
Oct 17, 2024 - Rule Based Rewards for Language Model Safety
Sep 02, 2024 - LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
May 27, 2024 - Understanding the Performance Gap Between Online and Offline Alignment Algorithms
May 07, 2024 - How to Train LLM? From Data Parallel to Fully Sharded Data Parallel
Apr 30, 2024 - Many-Shot In-Context Learning
Apr 23, 2024 - Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Apr 13, 2024 - Scaling Laws for Data Filtering — Data Curation Cannot Be Compute Agnostic
Apr 02, 2024 - Preference-free Alignment Learning with Regularized Relevance Reward
Mar 19, 2024 - Unveiling the Generalization Power of Fine-Tuned Large Language Models
Mar 12, 2024 - A Simple and Effective Pruning Approach for Large Language Models
Jan 30, 2024 - Lion: Adversarial Distillation of Proprietary Large Language Models
Jan 23, 2024 - Overthinking the Truth: Understanding How Language Models Process False Demonstrations
Jan 23, 2024 - In-Context Pretraining: Language Modeling Beyond Document Boundaries
Jan 16, 2024 - Mistral 7B & Mixtral (Mixtral of Experts)
Jan 09, 2024 - Making Large Language Models a Better Foundation for Dense Retrieval
Dec 12, 2023 - Label Words Are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Sep 19, 2023 - The CRINGE Loss: Learning What Language Not to Model
May 25, 2023 - Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Apr 20, 2023 - Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization
Apr 13, 2023 - P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks