Aug 05, 2025 | Impact of Fine-Tuning Methods on Memorization in Large Language Models |
Aug 05, 2025 | Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models |
Jun 17, 2025 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models |
Jun 03, 2025 | Reinforcement Learning Finetunes Small Subnetworks in Large Language Models |
Apr 22, 2025 | Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success |
Mar 11, 2025 | When Is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers |
Mar 04, 2025 | Contextual Document Embeddings |
Feb 18, 2025 | DeepSeek v3 |
Feb 04, 2025 | Titans: Learning to Memorize at Test Time |
Feb 04, 2025 | SSM → HIPPO → LSSL → S4 → Mamba → Mamba2 |
Jan 21, 2025 | Agent Laboratory: Using LLM Agents as Research Assistants |
Jan 14, 2025 | OpenVLA: An Open-Source Vision-Language-Action Model |
Jan 02, 2025 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies |
Oct 17, 2024 | Knowledge Entropy Decay During Language Model Pretraining Hinders New Knowledge Acquisition |
Sep 23, 2024 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories |
Aug 13, 2024 | Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process |
Aug 13, 2024 | Knowledge conflict survey |
Jun 11, 2024 | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet |
Jun 04, 2024 | Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training |
May 21, 2024 | LLaMA Pro: Progressive LLaMA with Block Expansion |
May 07, 2024 | How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel |
May 07, 2024 | How to Inference Big LLM? - Using Accelerate Library |
Mar 11, 2024 | BitNet: Scaling 1-bit Transformers for Large Language Models |
Jan 16, 2024 | Mistral 7B & Mixtral (Mixtral of Experts) |
Jan 03, 2024 | vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention |
Dec 19, 2023 | Learning to Tokenize for Generative Retrieval |
Dec 19, 2023 | Break the Sequential Dependency of LLM Inference Using Lookahead Decoding |
Oct 31, 2023 | A Survey on Large Language Model based Autonomous Agents |
Oct 10, 2023 | LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models |
Oct 03, 2023 | DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models |
Sep 19, 2023 | The CRINGE Loss: Learning what language not to model |
Jun 29, 2023 | QLoRA: Efficient Finetuning of Quantized LLMs |
Apr 13, 2023 | AdapterDrop: On the Efficiency of Adapters in Transformers |
Mar 16, 2023 | Calibrating Factual Knowledge in Pretrained Language Models |
Feb 09, 2023 | AdapterHub: A Framework for Adapting Transformers, Parameter-Efficient Transfer Learning for NLP |
Jan 19, 2023 | KALA: Knowledge-Augmented Language Model Adaptation |
Jan 12, 2023 | A Survey for In-context Learning |