Apr 22, 2025 | Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success |
Mar 04, 2025 | Contextual Document Embeddings |
Feb 18, 2025 | DeepSeek v3 |
Feb 04, 2025 | SSM → HiPPO → LSSL → S4 → Mamba → Mamba2 |
Jan 02, 2025 | Diffusion Language Model: Mathematical Foundations & Inference Optimization |
Sep 23, 2024 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories |
Sep 09, 2024 | Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models |
Sep 02, 2024 | LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders |
Aug 20, 2024 | Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (KARD) |
Jul 30, 2024 | In-Context Retrieval-Augmented Language Models |
Jun 04, 2024 | Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training |
May 07, 2024 | How to Train an LLM? From Data Parallel to Fully Sharded Data Parallel |
Apr 30, 2024 | Training Diffusion Models with Reinforcement Learning |
Mar 26, 2024 | Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks |
Feb 20, 2024 | WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia |
Jan 09, 2024 | Making Large Language Models A Better Foundation For Dense Retrieval |
Dec 19, 2023 | Learning to Tokenize for Generative Retrieval |
Sep 12, 2023 | A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training |
Jun 29, 2023 | QLoRA: Efficient Finetuning of Quantized LLMs |
Jun 15, 2023 | Do Prompt-Based Models Really Understand the Meaning of Their Prompts? |
Apr 20, 2023 | FALSESUM: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization |
Apr 13, 2023 | P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks |
Mar 30, 2023 | GPT Understands, Too |
Jan 26, 2023 | Task-aware Retrieval with Instructions |
Jan 19, 2023 | KALA: Knowledge-Augmented Language Model Adaptation |