rag-as-native-attention
RAG as Native Attention
Wanderland is RAG where the retriever is the substrate itself, not a bolt-on step.
Core Claim
| Conventional RAG | Wanderland |
|---|---|
| Retrieval is an external stage | Retrieval is the attention mechanism |
| Fetches documents, feeds them to an LLM | Fences, pages, and graph queries ARE Q over persistent K/V |
| Attention only runs inside the model | Attention runs over the entire corpus |
Attention Mapping
K (Keys)
- Fence identities (slug:fence)
- Tags, links, metadata
- Schema descriptors
- These define where patterns live and how they can be matched
Q (Queries)
| Query Type | Attention Equivalent |
|---|---|
| peek(slug:fence) | Single-head local attention |
| peek(slug) (all fences on page) | Multi-head attention within local region |
| query(pattern/tags/graph-walk) | Global attention over entire DAG |
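The three query granularities above can be sketched as attention scopes over a toy corpus. This is a minimal illustration, assuming hypothetical data structures and function names (peek/query are taken from the table; the dict layout is invented here, not Wanderland's actual API):

```python
# Toy corpus: pages (slugs) holding named fences, plus a tag index.
# Illustrative only -- not Wanderland's real storage layout.
corpus = {
    "ideas": {"intro": "fence body A", "claim": "fence body B"},
    "notes": {"log": "fence body C"},
}
tags = {"ideas:claim": {"pitch"}, "notes:log": {"pitch"}}

def peek(address):
    """Local attention: one fence (slug:fence) or one page (slug)."""
    if ":" in address:
        slug, fence = address.split(":")
        return {address: corpus[slug][fence]}          # single-head
    return {f"{address}:{f}": body                     # multi-head over page
            for f, body in corpus[address].items()}

def query(tag):
    """Global attention: match every fence in the DAG by tag."""
    return {addr: corpus[s][f]
            for addr in tags if tag in tags[addr]
            for s, f in [addr.split(":")]}
```

The scope of the address determines the scope of the attention: a fully qualified address is a single head, a page address fans out over its fences, and a tag query sweeps the whole graph.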
V (Values)
| Level | Content | Space |
|---|---|---|
| L3 | Code/fence definitions | Tool body |
| L4 | Computed data results | Materialized outputs |
| L5 | Rendered documents | Middleware views |
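The V table can be read as one fence carrying cached values at three layers. A minimal sketch, with invented field contents (the L3/L4/L5 labels come from the table; everything else is illustrative):

```python
# One fence, three cache levels -- each level is a different V space.
fence = {
    "L3": "def price(x): return x * 1.2",   # code definition (tool body)
    "L4": {"price(10)": 12.0},              # computed result (materialized output)
    "L5": "<p>Price of 10 is 12.0</p>",     # rendered document (middleware view)
}

def value(fence, level):
    """Retrieval at a level returns that layer's cached V directly."""
    return fence[level]
```

A query addressed at L4 gets the materialized number without re-running the L3 code; one addressed at L5 gets the rendered view without re-materializing L4.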
The Operation
Every time you:
- Address a fence, page, or subgraph → you're emitting Q
- Match against graph structure, indices, tags → that's K
- Retrieve cached results at various levels → that's V
Already stored. Provenance-tagged. Reusable.
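The whole Q-over-K-yielding-V operation reduces to a predicate match against stored entries. A hedged sketch, assuming a hypothetical entry format (the k/v split and the provenance field are illustrative, not a real schema):

```python
# Each entry pairs K (identity + metadata) with V (cached, provenance-tagged result).
store = [
    {"k": {"id": "ideas:claim", "tags": {"pitch"}},
     "v": {"L4": "computed result", "provenance": "run-42"}},
    {"k": {"id": "notes:log", "tags": {"journal"}},
     "v": {"L4": "other result", "provenance": "run-17"}},
]

def attend(q):
    """Q is a predicate over K; every match yields its stored V, no recomputation."""
    return [entry["v"] for entry in store if q(entry["k"])]

hits = attend(lambda k: "pitch" in k["tags"])
```

Retrieval and lookup are the same move: the query never touches V directly, it only scores K, and V comes back with its provenance attached.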
Layered RAG as Layered Attention
GraphRAG papers argue that RAG over knowledge graphs gives better structure and retrieval granularity.
Wanderland goes further: the graph is the primary medium of thought.
- L3 = "tool code space" (functions)
- L4 = "fact/data space" (results)
- L5 = "narrative space" (documents/views)
Queries can:
- Target any combination of levels
- Compose arbitrary context windows
- Feed into other fences (closed-loop computation)
- Feed into external LLMs (classical RAG)
- Feed into human-facing documents/dashboards
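Composing an arbitrary context window is then just a graph traversal that concatenates fence bodies, and the result can feed any of the consumers above. A minimal sketch over an invented DAG (node names and bodies are placeholders):

```python
# Toy DAG of fences and their bodies -- illustrative data only.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
bodies = {"a": "root", "b": "branch", "c": "leaf-c", "d": "leaf-d"}

def context_window(start, depth):
    """Breadth-first walk from `start` to `depth` hops, concatenating bodies."""
    seen, frontier = [], [start]
    for _ in range(depth + 1):
        nxt = []
        for node in frontier:
            if node not in seen:
                seen.append(node)
                nxt.extend(graph[node])
        frontier = nxt
    return "\n".join(bodies[n] for n in seen)
```

The same window can be piped into another fence (closed loop), into an external LLM prompt (classical RAG), or into a rendered view, because it is just assembled text with known provenance.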
The Difference
| Traditional | Wanderland |
|---|---|
| Docs with RAG bolted on | Persistent, queryable attention field |
| Retrieval then computation | Retrieval and computation are the same operation |
| Ad-hoc chunking | Context windows assembled by graph traversal |
| Results without history | Provenance flows with results across layers |
The Pitch
RAG as native attention over a DAG substrate, with fences as heads, queries as Q, and layered caches as V.
An externalizable mind, not a pile of markdown.
Sources
Provenance
- Source: Perplexity synthesis + conversation, 2025-01-05
- Status: Ready for pitch
- Context: Framing for Amjad follow-up conversations
South
slots:
- context:
- ISA specification implements the RAG-as-native-attention pattern
slug: unified-peek-poke-cache-design-20260105