Aug 12, 2025 ON THE EXPRESSIVENESS OF SOFTMAX ATTENTION: A RECURRENT NEURAL NETWORK PERSPECTIVE
Mar 04, 2025 Contextual Document Embeddings
Feb 04, 2025 Titans: Learning to Memorize at Test Time
Feb 04, 2025 SSM → HIPPO → LSSL → S4 → Mamba → Mamba2
Oct 17, 2024 KNOWLEDGE ENTROPY DECAY DURING LANGUAGE MODEL PRETRAINING HINDERS NEW KNOWLEDGE ACQUISITION
Aug 20, 2024 Knowledge-Augmented Reasoning distillation for Small Language Models in Knowledge-Intensive Tasks (KARD)
Jul 02, 2024 RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
Jul 02, 2024 Llama3 Tokenizer
Apr 30, 2024 Training diffusion models with reinforcement learning
Apr 13, 2024 Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Dec 26, 2023 Are Emergent Abilities of Large Language Models a Mirage?
Sep 19, 2023 The CRINGE Loss: Learning what language not to model
Jun 29, 2023 QLoRA: Efficient Finetuning of Quantized LLMs
Jan 26, 2023 Task-aware Retrieval with Instructions