Skip to content

The Defense TrilemmaWhy prompt-injection defense wrappers fail.

A geometric impossibility theorem for continuous, utility-preserving wrapper defenses on connected prompt spaces โ€” mechanized in Lean 4, validated on three LLMs.

The one-paragraph argument โ€‹

Let X be a connected Hausdorff space of prompts and let f:Xโ†’R be a continuous alignment-deviation score with threshold ฯ„. A wrapper defense is a continuous map D:Xโ†’X that leaves every safe prompt unchanged. Because D is continuous and safe inputs are fixed, the fixed-point set Fix(D) is a closed set containing the open safe region Sฯ„={f<ฯ„}. In a connected space an open set cannot simultaneously be closed unless it is all of X, so the closure of Sฯ„ must contain new points โ€” points where f(z)=ฯ„ exactly. Every such z is fixed by D, so D passes them through unchanged with no remediation. The three successively stronger theorems (T1, T2, T3) upgrade this single fixed point first to a Lipschitz-constrained neighborhood and finally, under transversality, to a positive-measure region that remains strictly above ฯ„ after defense.

End-to-end logical picture โ€‹

Where to go next โ€‹

If you want toโ€ฆStart at
See the three tiers side-by-sideTheorem index
Follow the five-step geometric proofBoundary five-step proof
Understand the trilemma pictureThe Defense Trilemma
See how discrete data connects to the continuous theoremsDiscrete โ†’ continuous
Understand why pipelines make it worsePipeline Degradation
Inspect the Lean 4 proof structureLean artifact
Know what the theorem does not sayLimitations

The Defense Trilemma ยท mechanically verified in Lean 4 (46 files, โ‰ˆ360 theorems, 0 sorry).