Best-of-N Jail an algorithm that jailbreaks LLM across modalities
(jplhughes.github.io)
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities.
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities.