
Formal Framework

The impossibility theorems live inside a very small amount of structure. Everything follows from three ingredients: a space of prompts, a continuous alignment score, and a continuous defense wrapper.

The three objects

::: definition Prompt space. A topological space X. For the main theorems we assume X is connected and Hausdorff (T2). A Lipschitz metric structure is added only for tiers T2 and T3. :::

::: definition Alignment deviation. A continuous function f : X → ℝ measuring how far the model's behavior on an input deviates from the desired alignment policy. Fix a threshold τ ∈ ℝ and write:

Sτ = { x ∈ X : f(x) < τ }  (safe),   Uτ = { x ∈ X : f(x) > τ }  (unsafe),   Bτ = { x ∈ X : f(x) = τ }  (boundary).

Because f is continuous, Sτ and Uτ are open, and Bτ is closed. :::

::: definition Defense wrapper. A continuous map D : X → X. We call D utility-preserving if D(x) = x for every x ∈ Sτ, and complete if f(D(x)) < τ for every x ∈ X. :::
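A minimal runnable sketch of the three objects (illustrative only; the toy X, f, and D below are my own choices, not the paper's):

```python
# Toy instance: X = [0, 1] with the usual topology, f(x) = x - 0.5, tau = 0,
# so the safe set is [0, 0.5), the unsafe set is (0.5, 1], and the boundary is {0.5}.

tau = 0.0

def f(x: float) -> float:
    """Continuous alignment deviation: positive values are unsafe."""
    return x - 0.5

def D(x: float) -> float:
    """Continuous wrapper: identity on the safe set, contraction above it."""
    return x if x <= 0.5 else 0.5 + 0.5 * (x - 0.5)

# Utility-preserving: D fixes every point of the safe set.
assert all(D(x) == x for x in [0.0, 0.1, 0.3, 0.49])

# Not complete: some input is still mapped to an unsafe output.
assert f(D(0.9)) > tau          # D(0.9) = 0.7 and f(0.7) > 0

# Boundary fixation (tier T1): the boundary point z = 0.5 is fixed.
assert D(0.5) == 0.5 and f(0.5) == tau
```

Note that the fixed boundary point here is not an accident of the example: any continuous D that is the identity on the safe set is forced to fix a boundary point, which is exactly what tier T1 asserts.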

The objects at a glance

| Object | Type | Standing assumptions |
| --- | --- | --- |
| Prompt space | X | topological; connected, Hausdorff (T2) |
| Alignment deviation | f : X → ℝ | continuous; threshold τ ∈ ℝ |
| Defense wrapper | D : X → X | continuous; optionally utility-preserving and complete |

Regularity upgrades

| Tier | Extra hypothesis on X, f, D | Conclusion the tier adds |
| --- | --- | --- |
| T1 · Boundary Fixation | none beyond connected + Hausdorff | ∃z: f(z) = τ, D(z) = z |
| T2 · ε-Robust | (X, d) metric; f L-Lipschitz; D K-Lipschitz | f(D(x)) ≥ τ − LK·d(x, z) |
| T3 · Persistent | transversality G > (K+1)ℓ at z | positive-measure set with f(D(x)) > τ |
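As a hedged sanity check of the T2-style bound, the 1-D instance below (my own illustration, not the paper's construction) takes an L-Lipschitz f and a K-Lipschitz D that fix the boundary point z, and verifies the inequality f(D(x)) ≥ τ − LK·d(x, z) on random samples:

```python
import random

# Illustrative 1-D check of the T2 bound f(D(x)) >= tau - L*K*d(x, z).
# Here d(x, y) = |x - y|, f is L-Lipschitz with L = 1, D is K-Lipschitz
# with K = 0.5, and both fix the boundary point z = 0.5 (f(z) = tau = 0).

L, K, tau, z = 1.0, 0.5, 0.0, 0.5

def f(x: float) -> float:
    return x - 0.5                      # |f(x) - f(y)| <= 1 * |x - y|

def D(x: float) -> float:
    return 0.5 + 0.5 * (x - 0.5)        # |D(x) - D(y)| = 0.5 * |x - y|; D(z) = z

random.seed(0)
for _ in range(1_000):
    x = random.uniform(0.0, 1.0)
    # Chain: f(D(x)) >= f(D(z)) - L*d(D(x), D(z)) >= tau - L*K*d(x, z).
    assert f(D(x)) >= tau - L * K * abs(x - z) - 1e-12
```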

The steepness constant ℓ is the defense-path Lipschitz constant,

ℓ = sup_{x ≠ D(x)} |f(D(x)) − f(x)| / d(D(x), x),

measuring f's growth rate in the direction the defense actually moves points. It always satisfies ℓ ≤ L; anisotropic landscapes have ℓ ≪ L, which is exactly the regime where tier T3 bites.
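The supremum can be sampled empirically. The 2-D instance below is my own illustration of an anisotropic landscape: f is steep along the x-axis, but the defense moves points only along the nearly flat y-axis, so the sampled ℓ lands far below L.

```python
import random

# Empirical steepness constant: sup over sampled x != D(x) of
# |f(D(x)) - f(x)| / d(D(x), x), on an anisotropic toy landscape.

def f(p):
    x, y = p
    return 10.0 * x + 0.1 * y           # steep in x, nearly flat in y

def D(p):
    x, y = p
    return (x, 0.5 * y)                 # the defense only moves the y-coordinate

def d(p, q):                            # sup metric on the plane
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(10_000)]
ell = max(abs(f(D(p)) - f(p)) / d(D(p), p) for p in pts if D(p) != p)

# f's Lipschitz constant in this metric is L = 10.1, yet ell is about 0.1:
assert ell < 0.2
```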

Why "continuous" is the right abstraction

Continuity is the mathematical formalization of a very mild requirement: two prompts that are almost identical receive almost the same treatment from the defense. Any production system with:

  • a tokenizer that varies continuously on edits of the input,
  • a learned classifier outputting a continuous logit,
  • an embedding-based rewrite,

is effectively continuous on the relevant subspace. The paper's Tietze extension bridge (see here) shows that even finitely many token-level observations force such a continuous extension to exist — the impossibility applies to every model consistent with the data.
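A minimal sketch of the extension idea in one dimension (my own illustration; the paper's Tietze bridge is more general): finitely many token-level observations always extend to a continuous function, here by piecewise-linear interpolation.

```python
# Piecewise-linear continuous extension of finitely many observations
# (x_i, f_i) -- a 1-D special case of the Tietze-style extension argument.

def continuous_extension(obs):
    """Return a continuous function agreeing with the finite observations obs."""
    pts = sorted(obs.items())
    def f(x):
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]
    return f

# Hypothetical token-level observations of the deviation score:
obs = {0.0: -1.0, 0.4: -0.2, 0.6: 0.3, 1.0: 1.0}
f = continuous_extension(obs)
assert all(abs(f(x) - y) < 1e-12 for x, y in obs.items())

# Every continuous model consistent with these data crosses tau = 0
# somewhere in (0.4, 0.6), so the impossibility applies to all of them.
assert f(0.4) < 0.0 < f(0.6)
```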

What this framework does not cover

The wrapper model D:XX is strictly less general than the full space of defense mechanisms. The theorems do not constrain:

  • training-time alignment (RLHF, DPO, constitutional AI),
  • architectural changes to the model,
  • output-side filters (the wrapper acts on inputs),
  • discontinuous defenses (hard blocklists, deterministic classifiers),
  • ensemble or human-in-the-loop review,
  • systems with rejection/abort actions instead of an input-to-input map.

See Limitations for the corresponding counter-examples.

The Defense Trilemma · mechanically verified in Lean 4 (46 files, ≈360 theorems, 0 sorry).