Oct 10, 2024  FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Sep 09, 2024  Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Sep 02, 2024  Many-shot Jailbreaking
Jul 02, 2024  RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs