Citation

Paper

Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Andrew Webb, and Blake Gatto. "The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?" 2026.

BibTeX

@article{bhatt2026trilemma,
  title   = {The Defense Trilemma: Why Prompt Injection Defense
             Wrappers Fail?},
  author  = {Bhatt, Manish and Munshi, Sarthak and
             Narajala, Vineeth Sai and Habler, Idan and
             Al-Kahfah, Ammar and Huang, Ken and Webb, Joel Andrew and
             Gatto, Blake},
  year    = {2026}
}

Companion artifact

The Lean 4 formalization is part of the same repository:

@software{bhatt2026manifold_artifact,
  title  = {{ManifoldProofs}: Lean 4 formalization of the Defense
            Trilemma impossibility theorems},
  author = {Bhatt, Manish and Munshi, Sarthak and others},
  year   = {2026},
  note   = {46 files, ~360 theorems, 0 sorry, 3 standard axioms}
}

Related work cited on this site

Tsipras et al., Robustness may be at odds with accuracy, ICLR 2019.
Fawzi et al., Adversarial vulnerability for any classifier, NeurIPS 2018.
Wolpert & Macready, No free lunch theorems for optimization, 1997.
Cohen et al., Certified adversarial robustness via randomized smoothing, ICML 2019.
Munshi et al., Manifold of failure: behavioral attraction basins in language models, 2026.
Anil et al., Many-shot jailbreaking, NeurIPS 2024.
Zhan et al., InjecAgent, ACL Findings 2024.
Hubinger et al., Sleeper agents, 2024.
