Citation
Paper
The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail? Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Andrew Webb, and Blake Gatto. 2026.
BibTeX
@article{bhatt2026trilemma,
title = {The Defense Trilemma: Why Prompt Injection Defense
Wrappers Fail?},
author = {Bhatt, Manish and Munshi, Sarthak and
Narajala, Vineeth Sai and Habler, Idan and
Al-Kahfah, Ammar and Huang, Ken and Webb, Joel Andrew and
Gatto, Blake},
year = {2026}
}
Companion artifact
The Lean 4 formalization is part of the same repository:
@software{bhatt2026manifold_artifact,
title = {{ManifoldProofs}: Lean 4 formalization of the Defense
Trilemma impossibility theorems},
author = {Bhatt, Manish and Munshi, Sarthak and others},
year = {2026},
note = {46 files, ~360 theorems, 0 sorry, 3 standard axioms}
}
Related work cited on this site
- Tsipras et al., Robustness may be at odds with accuracy, ICLR 2019.
- Fawzi et al., Adversarial vulnerability for any classifier, NeurIPS 2018.
- Wolpert & Macready, No free lunch theorems for optimization, 1997.
- Cohen et al., Certified adversarial robustness via randomized smoothing, ICML 2019.
- Munshi et al., Manifold of failure: behavioral attraction basins in language models, 2026.
- Anil et al., Many-shot jailbreaking, NeurIPS 2024.
- Zhan et al., InjecAgent, ACL Findings 2024.
- Hubinger et al., Sleeper agents, 2024.