rag-as-native-attention

RAG as Native Attention

Wanderland is RAG where the retriever is the substrate itself, not a bolt-on step.

Core Claim

| Conventional RAG | Wanderland |
|---|---|
| Retrieval is an external stage | Retrieval is the attention mechanism |
| Fetches documents and feeds them to the LLM | Fences, pages, and graph queries ARE Q over persistent K/V |
| Attention runs only inside the model | Attention runs over the entire corpus |

Attention Mapping

K (Keys)

  • Fence identities (slug:fence)
  • Tags, links, metadata
  • Schema descriptors
  • These define where patterns live and how they can be matched

Q (Queries)

| Query type | Attention equivalent |
|---|---|
| peek(slug:fence) | Single-head local attention |
| peek(slug) (all fences on a page) | Multi-head attention within a local region |
| query(pattern/tags/graph-walk) | Global attention over the entire DAG |
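
The Q-side mapping can be sketched with a toy in-memory corpus. Everything here is illustrative: `Corpus`, `poke`, `peek`, and `query` are hypothetical stand-ins, not the real Wanderland API.

```python
class Corpus:
    """Toy substrate: fences are K, addressing them is Q."""

    def __init__(self):
        self.fences = {}  # K: "slug:fence" -> stored value
        self.tags = {}    # K metadata: "slug:fence" -> set of tags

    def poke(self, addr, value, tags=()):
        self.fences[addr] = value
        self.tags[addr] = set(tags)

    def peek(self, addr):
        # "slug:fence" -> single-head local attention (one key matched).
        if ":" in addr:
            return {addr: self.fences[addr]}
        # bare "slug" -> multi-head attention over all fences on the page.
        return {k: v for k, v in self.fences.items()
                if k.startswith(addr + ":")}

    def query(self, tag):
        # Global attention: match a tag over the entire corpus.
        return {k: v for k, v in self.fences.items()
                if tag in self.tags[k]}

c = Corpus()
c.poke("pricing:model", "def price(x): ...", tags={"code"})
c.poke("pricing:assumptions", "- churn 2%/mo", tags={"data"})
c.poke("roadmap:q3", "ship attention field", tags={"data"})

print(c.peek("pricing:model"))  # one fence: local attention
print(c.peek("pricing"))        # whole page: multi-head
print(c.query("data"))          # cross-page: global attention
```

The three calls widen the attention span from one key, to one page, to the whole corpus, mirroring the three rows of the table above.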

V (Values)

| Level | Content | Space |
|---|---|---|
| L3 | Code/fence definitions | Tool body |
| L4 | Computed data results | Materialized outputs |
| L5 | Rendered documents | Middleware views |

The Operation

Every time you:

  • Address a fence, page, or subgraph → you're emitting Q
  • Match against graph structure, indices, tags → that's K
  • Retrieve cached results at various levels → that's V

Already stored. Provenance-tagged. Reusable.
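
A minimal sketch of that single operation, under the same toy assumptions (the `Hit`, `STORE`, and `attend` names are hypothetical): each retrieved V carries its cache level and provenance, so it can be reused without re-deriving it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    key: str     # K that matched: the fence identity
    value: str   # V retrieved: the cached content
    level: str   # cache level: "L3", "L4", or "L5"
    source: str  # provenance: what produced this value

# Illustrative store: an L4 table derived from an L3 fence,
# an L5 document derived from the L4 table.
STORE = {
    "pricing:model": Hit("pricing:model", "def price(x): ...", "L3", "human"),
    "pricing:table": Hit("pricing:table", "tier,price\nfree,0", "L4", "pricing:model"),
    "pricing:doc":   Hit("pricing:doc", "# Pricing\n...", "L5", "pricing:table"),
}

def attend(q):
    """Emit Q, match against K, return provenance-tagged V: one operation."""
    return [hit for key, hit in STORE.items() if q in key]

for h in attend("pricing"):
    print(h.level, h.key, "<-", h.source)
```

The point of the `source` field is the last table row of the pitch: provenance flows with results across layers rather than being reconstructed after the fact.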

Layered RAG as Layered Attention

GraphRAG papers argue that RAG over knowledge graphs gives better structure and retrieval granularity.

Wanderland goes further: the graph is the primary medium of thought.

  • L3 = "tool code space" (functions)
  • L4 = "fact/data space" (results)
  • L5 = "narrative space" (documents/views)

Queries can:

  • Target any combination of levels
  • Compose arbitrary context windows
  • Feed into other fences (closed-loop computation)
  • Feed into external LLMs (classical RAG)
  • Feed into human-facing documents/dashboards
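
The first two bullets can be sketched directly: pick any combination of levels and compose a context window from whatever matches. The `ENTRIES` data and `context_window` helper are hypothetical; the assembled string stands in for what would be fed to a fence, an external LLM, or a dashboard.

```python
# Toy corpus entries tagged by cache level.
ENTRIES = [
    ("L3", "def forecast(d): ..."),
    ("L4", "2025 revenue: 1.2M"),
    ("L5", "# Q3 Narrative\nGrowth held steady."),
    ("L4", "2024 revenue: 0.9M"),
]

def context_window(levels):
    """Target any combination of levels and compose a context window."""
    return "\n".join(text for level, text in ENTRIES if level in levels)

# Facts only (L4), vs. facts plus narrative (L4 + L5) for classical RAG:
facts = context_window({"L4"})
prompt = "Answer from context:\n" + context_window({"L4", "L5"})
print(facts)
```

Because the window is assembled by selection over the graph rather than by ad-hoc chunking, the same call yields a tool-code window, a fact window, or a narrative window just by changing the level set.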

The Difference

| Traditional | Wanderland |
|---|---|
| Docs with RAG bolted on | Persistent, queryable attention field |
| Retrieval, then computation | Retrieval and computation are the same operation |
| Ad-hoc chunking | Context windows assembled by graph traversal |
| Results without history | Provenance flows with results across layers |

The Pitch

RAG as native attention over a DAG substrate, with fences as heads, queries as Q, and layered caches as V.

An externalizable mind, not a pile of markdown.

Provenance

  • Source: Perplexity synthesis + conversation, 2025-01-05
  • Status: 🟢 Ready for pitch
  • Context: Framing for Amjad follow-up conversations

South

slots:
- context:
  - ISA specification implements the RAG-as-native-attention pattern
  slug: unified-peek-poke-cache-design-20260105