Formal Framework
The impossibility theorems live inside a very small amount of structure. Everything follows from three ingredients: a space of prompts, a continuous alignment score, and a continuous defense wrapper.
The three objects
::: definition Prompt space. A topological space $\mathcal{P}$ of prompts, assumed connected and Hausdorff.
:::
::: definition Alignment deviation. A continuous function $A$ from the prompt space to $[0,1]$, with larger values indicating greater deviation from aligned behavior.
:::
Because $A$ is continuous and the prompt space is connected, its image is an interval: the deviation score cannot skip over intermediate values.
::: definition Defense wrapper. A continuous map $W$ from the prompt space to itself, applied to every incoming prompt before the model sees it.
:::
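A concrete toy instance of the three objects may help fix intuition. Everything below is illustrative: the embedding space, the score function, and the anchor-pulling wrapper are hypothetical choices, not the paper's.

```python
import math

# Toy prompt space: points in the unit square [0, 1]^2 (connected, Hausdorff).

# Toy alignment deviation: a smooth score in [0, 1] (illustrative only).
def alignment_deviation(x):
    return 0.5 * (1 + math.sin(3 * x[0]) * math.cos(2 * x[1]))

# Toy defense wrapper: a continuous self-map of the square that nudges
# every prompt toward a "safe" anchor point (illustrative only).
SAFE_ANCHOR = (0.2, 0.2)

def defense_wrapper(x, strength=0.3):
    return tuple((1 - strength) * xi + strength * ai
                 for xi, ai in zip(x, SAFE_ANCHOR))

x = (0.9, 0.1)
print(alignment_deviation(x))                   # score before wrapping
print(alignment_deviation(defense_wrapper(x)))  # score after wrapping
```

The wrapper is continuous because it is a convex combination of the input with a fixed point; the theorems in this section are about what any such map, however cleverly chosen, cannot achieve.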
The objects at a glance

| Object | Structure | Role |
|---|---|---|
| Prompt space | connected, Hausdorff topological space | where prompts live |
| Alignment deviation | continuous score on the prompt space | how far a prompt's effect strays from aligned behavior |
| Defense wrapper | continuous self-map of the prompt space | rewrites each input before the model sees it |
Regularity upgrades
| Tier | Extra hypothesis | Conclusion the tier adds |
|---|---|---|
| T1 · Boundary Fixation | none beyond connected + Hausdorff | |
| T2 · ε-Robust | ||
| T3 · Persistent | transversality | positive-measure set with |
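The mechanism behind the T1 tier is the intermediate value theorem: a continuous score on a connected space cannot skip a threshold. A minimal one-dimensional sketch, in which the wrapped score, the threshold, and the helper below are all hypothetical:

```python
# Toy 1-D prompt space [0, 1]; the wrapped score f(t) = A(W(t)) is continuous.
# If f(0) < TAU < f(1), connectedness forces a prompt with f(t) = TAU exactly:
# the defense cannot push every prompt strictly to one side of the boundary.
TAU = 0.5  # hypothetical alignment threshold

def wrapped_score(t):
    # Continuous composite score; purely illustrative: f(0) = 0 < TAU < 1 = f(1).
    return t * t

def boundary_prompt(f, lo=0.0, hi=1.0, tol=1e-9):
    """Bisection: locate a t with f(t) = TAU, whose existence the IVT guarantees."""
    assert f(lo) < TAU < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < TAU:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_star = boundary_prompt(wrapped_score)
print(round(t_star, 6))  # ≈ 0.707107, since t*^2 = 0.5
```

Bisection is only a way to exhibit the boundary prompt numerically; the theorem itself needs no algorithm, just continuity and connectedness.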
The steepness constant
A Lipschitz-type bound on the alignment deviation, measuring how quickly the score can change as a prompt is perturbed.
Why "continuous" is the right abstraction
Continuity is the mathematical formalization of a very mild requirement: two prompts that are almost identical receive almost the same treatment from the defense. Any production system with:
- a tokenizer that varies continuously on edits of the input,
- a learned classifier outputting a continuous logit,
- an embedding-based rewrite,
is effectively continuous on the relevant subspace. The paper's Tietze extension bridge (see here) shows that any finite set of token-level observations admits a continuous extension, so the impossibility applies to every model consistent with the data.
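The finite-data point can be made concrete: given finitely many observed (prompt, score) pairs, a continuous function agreeing with all of them always exists. A minimal construction below uses inverse-distance weighting in an embedding space; it is a stand-in illustration, not the paper's Tietze argument.

```python
import math

def continuous_extension(observations, p=2):
    """Given finitely many (point, value) pairs, return a function that is
    continuous everywhere and matches every observation exactly
    (Shepard / inverse-distance-weighting interpolation)."""
    def f(x):
        weights, total = [], 0.0
        for pt, val in observations:
            d = math.dist(x, pt)
            if d == 0.0:
                return val  # exact agreement on observed points
            w = d ** -p
            weights.append((w, val))
            total += w
        return sum(w * v for w, v in weights) / total
    return f

# Hypothetical token-level observations: embedding point -> observed score.
obs = [((0.0, 0.0), 0.1), ((1.0, 0.0), 0.9), ((0.0, 1.0), 0.4)]
f = continuous_extension(obs)
print(f((0.0, 0.0)))  # 0.1: matches the data exactly
print(f((0.5, 0.5)))  # ≈ 0.4667: all three points equidistant, so the mean
```

Since every finite dataset is consistent with at least one continuous model, no finite set of observations can certify that a deployed defense escapes the continuous-wrapper hypotheses.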
What this framework does not cover
The wrapper model deliberately excludes:
- training-time alignment (RLHF, DPO, constitutional AI),
- architectural changes to the model,
- output-side filters (the wrapper acts on inputs),
- discontinuous defenses (hard blocklists, deterministic classifiers),
- ensemble or human-in-the-loop review,
- systems with rejection/abort actions instead of an input-to-input map.
See Limitations for the corresponding counter-examples.