transformer

an archive of posts with this tag

Aug 05, 2025 Impact of Fine-Tuning Methods on Memorization in Large Language Models
Aug 05, 2025 Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Jun 17, 2025 Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models
Jun 03, 2025 Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Apr 22, 2025 Fine-tuning Vision-Language-Action Models: Optimizing Speed and Success
Mar 11, 2025 When Is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Mar 04, 2025 Contextual Document Embeddings
Feb 18, 2025 DeepSeek v3
Feb 04, 2025 Titans: Learning to Memorize at Test Time
Feb 04, 2025 SSM → HIPPO → LSSL → S4 → Mamba → Mamba2
Jan 21, 2025 Agent Laboratory: Using LLM Agents as Research Assistants
Jan 14, 2025 OpenVLA: An Open-Source Vision-Language-Action Model
Jan 02, 2025 TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Oct 17, 2024 Knowledge Entropy Decay During Language Model Pretraining Hinders New Knowledge Acquisition
Sep 23, 2024 SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Aug 13, 2024 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Aug 13, 2024 Knowledge conflict survey
Jun 11, 2024 Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Jun 04, 2024 Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
May 21, 2024 LLAMA PRO: Progressive LLaMA with Block Expansion
May 07, 2024 How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel
May 07, 2024 How to Inference Big LLM? - Using Accelerate Library
Mar 11, 2024 BitNet: Scaling 1-bit Transformers for Large Language Models
Jan 16, 2024 Mistral 7B & Mixtral (Mixtral of Experts)
Jan 03, 2024 vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
Dec 19, 2023 Learning to Tokenize for Generative Retrieval
Dec 19, 2023 Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Oct 31, 2023 A Survey on Large Language Model based Autonomous Agents
Oct 10, 2023 LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models
Oct 03, 2023 DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Sep 19, 2023 The CRINGE Loss: Learning what language not to model
Jun 29, 2023 QLoRA: Efficient Finetuning of Quantized LLMs
Apr 13, 2023 AdapterDrop: On the Efficiency of Adapters in Transformers
Mar 16, 2023 Calibrating Factual Knowledge in Pretrained Language Models
Feb 09, 2023 AdapterHub: A Framework for Adapting Transformers, Parameter-Efficient Transfer Learning for NLP
Jan 19, 2023 KALA: Knowledge-Augmented Language Model Adaptation
Jan 12, 2023 A Survey for In-context Learning