Citation

Paper

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail? Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Andrew Webb, and Blake Gatto. 2026.

BibTeX

@article{bhatt2026trilemma,
  title   = {The Defense Trilemma: Why Prompt Injection Defense
             Wrappers Fail?},
  author  = {Bhatt, Manish and Munshi, Sarthak and
             Narajala, Vineeth Sai and Habler, Idan and
             Al-Kahfah, Ammar and Huang, Ken and Webb, Joel Andrew and
             Gatto, Blake},
  year    = {2026}
}
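
A minimal sketch of citing this entry from a LaTeX document. It assumes the BibTeX entry above has been saved to a file named `refs.bib` (a hypothetical file name) and that the standard `plain` bibliography style is acceptable; adjust both to your venue's template.

```latex
\documentclass{article}

\begin{document}

% The key bhatt2026trilemma comes from the BibTeX entry above.
See the Defense Trilemma paper~\cite{bhatt2026trilemma}.

% Assumption: the entry is stored in refs.bib next to this file.
\bibliographystyle{plain}
\bibliography{refs}

\end{document}
```

Compiling with the usual `pdflatex`, `bibtex`, `pdflatex`, `pdflatex` sequence resolves the citation.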

Companion artifact

The Lean 4 formalization is part of the same repository:

@software{bhatt2026manifold_artifact,
  title  = {{ManifoldProofs}: Lean 4 formalization of the Defense
            Trilemma impossibility theorems},
  author = {Bhatt, Manish and Munshi, Sarthak and others},
  year   = {2026},
  note   = {46 files, ~360 theorems, 0 sorry, 3 standard axioms}
}

  • Tsipras et al., Robustness may be at odds with accuracy, ICLR 2019.
  • Fawzi et al., Adversarial vulnerability for any classifier, NeurIPS 2018.
  • Wolpert & Macready, No free lunch theorems for optimization, 1997.
  • Cohen et al., Certified adversarial robustness via randomized smoothing, ICML 2019.
  • Munshi et al., Manifold of failure: behavioral attraction basins in language models, 2026.
  • Anil et al., Many-shot jailbreaking, NeurIPS 2024.
  • Zhan et al., InjecAgent, ACL Findings 2024.
  • Hubinger et al., Sleeper agents, 2024.

The Defense Trilemma · mechanically verified in Lean 4 (46 files, ≈360 theorems, 0 sorry).