bert

an archive of posts with this tag

Apr 22, 2025 Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success
Mar 04, 2025 Contextual Document Embeddings
Feb 18, 2025 DeepSeek v3
Feb 04, 2025 SSM → HIPPO → LSSL → S4 → Mamba → Mamba2
Jan 02, 2025 Diffusion Language Model: Mathematical foundations & inference optimization
Sep 23, 2024 SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Sep 09, 2024 Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Sep 02, 2024 LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Aug 20, 2024 Knowledge-Augmented Reasoning distillation for Small Language Models in Knowledge-Intensive Tasks (KARD)
Jul 30, 2024 In-Context Retrieval-Augmented Language Models
Jun 04, 2024 Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
May 07, 2024 How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel
Apr 30, 2024 Training diffusion models with reinforcement learning
Mar 26, 2024 Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
Feb 20, 2024 WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Jan 09, 2024 Making Large Language Models A Better Foundation For Dense Retrieval
Dec 19, 2023 Learning to Tokenize for Generative Retrieval
Sep 12, 2023 A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Jun 29, 2023 QLoRA: Efficient Finetuning of Quantized LLMs
Jun 15, 2023 Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
Apr 20, 2023 FALSESUM: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization
Apr 13, 2023 P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
Mar 30, 2023 GPT Understands, Too
Jan 26, 2023 Task-aware Retrieval with Instructions
Jan 19, 2023 KALA: Knowledge-Augmented Language Model Adaptation