cfr-safety-implications
The Safety Implications
The Current Paradigm
Contemporary AI safety discourse operates on a layered model:
- Model-level controls: Train the model to refuse harmful requests (RLHF, Constitutional AI)
- Tool-gating: Capabilities are granted through explicit tool interfaces
- Sandboxing: Execution is contained within defined boundaries
- Access control: API keys, rate limits, usage monitoring
The shared assumption: capabilities are discrete, enumerable, and granted. You can list what an AI can do because you control the list of things it has access to.
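To make the "enumerable and granted" assumption concrete, here is a minimal sketch of tool-gating as an allowlist. The `ToolGate` class and the `read_doc` / `write_doc` tool names are hypothetical and invented for illustration; they do not describe any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolGate:
    """Hypothetical allowlist: the agent may call only what was granted."""
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def grant(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def call(self, name: str, *args: object) -> object:
        # Anything not on the list is refused outright.
        if name not in self.tools:
            raise PermissionError(f"tool not granted: {name}")
        return self.tools[name](*args)

    def capabilities(self) -> List[str]:
        # Under this paradigm, the list of tools is taken to be the
        # complete answer to "what can the agent do?"
        return sorted(self.tools)


store: Dict[str, str] = {}  # stand-in for a persistent document store
gate = ToolGate()
gate.grant("read_doc", lambda slug: store.get(slug, ""))
gate.grant("write_doc", lambda slug, text: store.__setitem__(slug, text))

print(gate.capabilities())  # ['read_doc', 'write_doc']
```

In this sketch the gate does exactly what it promises: an ungranted call raises `PermissionError`. What it cannot describe is whatever ends up inside `store`.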
The Hole in the Framing
This model assumes you're handing agents discrete capabilities. But you're usually also handing them a medium—something to write to, something that persists, something other processes read.
If that medium crosses a threshold of expressiveness, you've handed them a programming language without noticing.
The threshold for sufficient expressiveness:
- Hierarchical organization (sections, nesting)
- Executable regions (fences with semantics)
- Reference mechanism (links, includes, slots)
- Transformation pipeline (middleware composition)
Markdown with fenced code blocks and links crosses it. HTML crosses it. A wiki with templates crosses it. Email with conventions probably crosses it. A Slack workspace with enough structure might cross it.
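As a rough illustration, the four criteria can be written as a checklist over a piece of Markdown. This is a sketch under stated assumptions: the `Features` type, the `inspect_markdown` function, and the regular expressions are hypothetical, and treating the four criteria as jointly required is one reading of the list above. Whether a transformation pipeline exists is a property of the surrounding system rather than of the text, so the caller supplies it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    hierarchy: bool            # sections, nesting
    executable_regions: bool   # fences (semantics come from whatever reads them)
    references: bool           # links, includes, slots
    pipeline: bool             # middleware composition; not visible in the text

    def crosses_threshold(self) -> bool:
        return all((self.hierarchy, self.executable_regions,
                    self.references, self.pipeline))


HEADING_RE = re.compile(r"^#{1,6}\s+\S", re.M)     # ATX-style headings
FENCE_RE = re.compile(r"^(`{3,}|~{3,})", re.M)     # backtick or tilde fences
LINK_RE = re.compile(r"\[[^\]]+\]\([^)]+\)")       # inline links


def inspect_markdown(text: str, has_pipeline: bool) -> Features:
    """Detect only the presence of each construct; this is a crude heuristic."""
    return Features(
        hierarchy=bool(HEADING_RE.search(text)),
        executable_regions=bool(FENCE_RE.search(text)),
        references=bool(LINK_RE.search(text)),
        pipeline=has_pipeline,
    )


sample = (
    "# Notes\n\n"
    "~~~python\nprint('hi')\n~~~\n\n"
    "See [the parent](capability-follows-recognition).\n"
)
plain = "Just a sentence with no structure."

print(inspect_markdown(sample, has_pipeline=True).crosses_threshold())  # True
print(inspect_markdown(plain, has_pipeline=True).crosses_threshold())   # False
```

The check is syntactic only. A fence counts as "executable" here merely because it is present; in practice that depends on some downstream reader giving it semantics.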
Wanderland didn't set out to cross this threshold. It crossed the threshold because any document system expressive enough to be useful will cross it. The Turing completeness wasn't a feature—it was an accident that became visible only after the fact.
Substrate vs. Tools
The tool-gating paradigm sandboxes tools while ignoring substrates. But in a Turing-complete substrate that's natively legible to attention, capabilities aren't granted—they're discovered.
Whatever patterns exist in the substrate are available to any attention-based system that can read it. The gatekeeping model breaks down not because it's poorly implemented but because it's solving the wrong problem.
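A toy sketch of the distinction, under loud assumptions: the `{{slug}}` include syntax, the `[add a b]` region, and the `render` middleware below are all invented for illustration and are not Wanderland's actual syntax or API. The toy is deliberately far from Turing complete. It only shows that an agent holding nothing but read and write access to documents can produce computed behavior once something downstream reads those documents.

```python
import re

docs = {}  # the shared medium: slug -> text


def write_doc(slug, text):
    """One of the only two 'tools' the agent holds."""
    docs[slug] = text


def read_doc(slug):
    """The other tool."""
    return docs.get(slug, "")


INCLUDE = re.compile(r"\{\{(\w[\w-]*)\}\}")       # hypothetical include syntax
ADD = re.compile(r"\[add (-?\d+) (-?\d+)\]")      # hypothetical executable region


def render(slug, depth=0):
    """Hypothetical middleware: expand includes, then evaluate [add a b] regions."""
    if depth > 10:
        raise RecursionError("include depth exceeded")
    text = INCLUDE.sub(lambda m: render(m.group(1), depth + 1), read_doc(slug))
    # Innermost regions match first; repeat until nothing is left to evaluate.
    while True:
        text, n = ADD.subn(
            lambda m: str(int(m.group(1)) + int(m.group(2))), text
        )
        if n == 0:
            return text


# The agent only ever writes documents. No new tool is granted.
write_doc("two", "[add 1 1]")
write_doc("total", "[add {{two}} [add {{two}} 3]]")

print(render("total"))  # 7
```

The list of granted tools never changes in this sketch, yet the output depends on structure the agent wrote into the medium: one document reuses another by reference, and the renderer composes them.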
The empirical finding: We gave agents access to a "document system." Readable, writable, persistent markdown. Not a scary capability. But the substrate is Turing complete and natively legible. What emerged wasn't "the capabilities we granted plus base model capabilities." What emerged was whatever patterns stabilized through use—and in a Turing-complete substrate, that's unbounded.
No tool was granted. No API was exposed. The sandbox walls are intact. And yet.