lantern

fleet-wide-attention

Fleet-Wide Attention

The system isn't documents with occasional fence calls. It's attention across the entire fleet.

The MLA Optimization

The new architecture (multi-head latent attention) stores four things about each token instead of one. Why this matters (sketched below):

  • Calculating K and V is expensive - that's the heavy compute
  • Looking them up is cheap - once calculated, retrieval is fast
  • More slots = more precision - 4x information about identity for marginal cost
  • Same optimization applies everywhere - because it's the same algorithm
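
A minimal sketch of that cost asymmetry, assuming a toy single-head dot-product attention layer. All names, shapes, and weights here are illustrative, not any particular architecture:

```python
import numpy as np

# Projecting tokens into K and V is the heavy matmul work; once cached,
# serving a new query is a cheap lookup plus a softmax.
d_model, d_head, n_tokens = 64, 16, 10
rng = np.random.default_rng(0)
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))
tokens = rng.normal(size=(n_tokens, d_model))

# Expensive step, done once per token: project into K and V and cache them.
kv_cache = {"K": tokens @ W_k, "V": tokens @ W_v}

def attend(query_vec, cache):
    """Cheap step, done per query: score against cached K, mix cached V."""
    scores = cache["K"] @ query_vec / np.sqrt(query_vec.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["V"]

q = rng.normal(size=(d_head,))
print(attend(q, kv_cache).shape)  # (16,) -- retrieval is trivial next to the projections
```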

This is functors. This is metaphor in math terms. Find the invariant, apply the same optimization across domains.

The Realization

The realization on the way to the coffee shop:

Query: "Find all to-dos matching these criteria"
Response: Every matching to-do across every node comes back at once
Action: Pick one, do it, check it off, continue

One query. Entire fleet. All matching patterns return. That's attention at system scale.
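
A minimal sketch of that query, assuming the fleet is a directory of Markdown nodes and to-dos use checkbox syntax; the vault path and the criteria predicate are stand-ins:

```python
from pathlib import Path
import re

# One pass over every Markdown node in the graph, returning every unchecked
# item that matches the criteria.
TODO = re.compile(r"^\s*- \[ \] (?P<text>.+)$")

def fleet_query(vault: Path, criteria):
    """Yield (node, line_number, text) for every matching open to-do."""
    for node in sorted(vault.rglob("*.md")):
        for lineno, line in enumerate(node.read_text().splitlines(), start=1):
            m = TODO.match(line)
            if m and criteria(m["text"]):
                yield node, lineno, m["text"]

# One query, entire fleet: every open to-do mentioning "research" comes back at once.
for node, lineno, text in fleet_query(Path("vault"), lambda t: "research" in t.lower()):
    print(f"{node}:{lineno}: {text}")
```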

The Polysporin Test

Created a file: "Agent Research Ideas"

  • Header: "Research"
  • Item: "Where does Polysporin come from?"

Told the system: "On this page, check for the latest thing, do whatever it says, check it off when done."

Result: It researched where Polysporin was manufactured (not where to buy it - because the header said "Research"), then checked it off.

The context (Research header) shaped the interpretation. The action was autonomous. The completion was recorded.
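
A sketch of what that run looks like, assuming the page is plain Markdown with "#" headers, and with `run_agent` standing in for whatever actually does the work (an LLM call, a search tool):

```python
import re

def latest_open_item(page_text: str):
    """Return (line_index, nearest_header, item_text) for the last unchecked item."""
    header, found = None, None
    for i, line in enumerate(page_text.splitlines()):
        if line.startswith("#"):
            header = line.lstrip("# ").strip()
        m = re.match(r"^\s*- \[ \] (.+)$", line)
        if m:
            found = (i, header, m.group(1))
    return found

def handle_latest(page_text: str, run_agent) -> str:
    """Do the latest open item, letting the header shape interpretation, then check it off."""
    hit = latest_open_item(page_text)
    if hit is None:
        return page_text
    i, header, item = hit
    run_agent(task=item, context=header)               # "Research" header => research it, don't buy it
    lines = page_text.splitlines()
    lines[i] = lines[i].replace("- [ ]", "- [x]", 1)   # record completion
    return "\n".join(lines)
```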

Autonomous Externalized Cognition

With one small change, the system becomes:

  • Write a note anywhere in the graph
  • Agent finds it via fleet-wide attention
  • Agent does the thing
  • Agent checks it off
  • Information is there next time you look

That's not a to-do app. That's a second brain that acts.

  • Write "check for milk" → appears on shopping list at the store
  • Write "research X" → research is done and waiting when you return

Teaching Through Invariants

"Tool creation is just a matter of showing one of these things an invariant."

LLMs can only hold a few patterns in context. But:

  • Show them 2-3 examples of the pattern
  • They instantly know how to use the system
  • No training required - just pattern recognition
  • Same pattern works everywhere because it IS everywhere

Once they have the invariant:

  • They can use it (Copy)
  • They can decompose it (Find invariant)
  • They can build new tools from components (Remix)

That's learning without training. Context-based capability acquisition.
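
A sketch of what "showing an invariant" looks like in practice: two or three worked examples carried in the prompt, no weight updates. The mini query syntax below is invented purely for illustration:

```python
# Teach a tiny tool syntax through examples alone; the model fills the pattern.
EXAMPLES = [
    ("find open research items", 'query(status="open", header="Research")'),
    ("find open shopping items", 'query(status="open", header="Shopping")'),
    ("find everything finished", 'query(status="done")'),
]

def build_prompt(goal: str) -> str:
    shots = "\n".join(f"goal: {g}\ntool: {t}" for g, t in EXAMPLES)
    return f"{shots}\ngoal: {goal}\ntool:"

print(build_prompt("find open errands"))
# No training pass happened; the capability is carried entirely by the examples.
```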

The Complete Cognition Stack

Meaning = ability to extract content + manipulate symbol tables

The system implements this in three layers:

Layer           | Operation                             | Direction    | Implementation
Virtual Fences  | Extract tokens/meaning from documents | Attention    | Pattern matchers over text stream
ISA (peek/poke) | Read/write the symbol table           | Manipulation | Structured access to extracted values
Persistence     | Write back into the document stream   | Agency       | Modified values become new text

The Full Loop

Document → Virtual Fence → Extract → Symbol Table → Manipulate → Poke → Document
   └────────────────────────────────────────────────────────────────────────┘

This is what makes it a mind, not just a reader:

  • Extract (attention) - virtual fences find patterns, pull out values
  • Manipulate (cognition) - ISA lets you operate on the symbol table
  • Persist (agency) - changes write back to the stream

Most systems stop at extraction. This one completes the loop. You can:

  • Read a to-do list (extract)
  • Mark an item complete (manipulate symbol table)
  • Save it back (persist to stream)

And the document now reflects the new state. The stream has been modified by attention operating through it.
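
A minimal sketch of that loop over a single document, with a regex standing in for the virtual fence and a dict standing in for the symbol table (names are illustrative):

```python
import re

FENCE = re.compile(r"^- \[(?P<done>[ x])\] (?P<name>.+)$")

def extract(doc: str) -> dict:
    """Attention: pull a symbol table out of the stream."""
    return {m["name"]: m["done"] == "x"
            for m in map(FENCE.match, doc.splitlines()) if m}

def peek(table: dict, name: str) -> bool:         # read a binding
    return table[name]

def poke(table: dict, name: str, done: bool):     # manipulate the symbol table
    table[name] = done

def persist(table: dict) -> str:
    """Agency: the modified table becomes new text."""
    return "\n".join(f"- [{'x' if done else ' '}] {name}" for name, done in table.items())

doc = "- [ ] buy milk\n- [ ] research Polysporin"
table = extract(doc)                   # read the to-do list
poke(table, "buy milk", True)          # mark an item complete
assert peek(table, "buy milk")
print(persist(table))                  # save it back: the stream now reflects the new state
```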

That's not reading. That's thinking.

The PhD Papers

From one morning (since 9am, January 5th 2025):

  • Streams-with-gaps invariant - the pattern across all information processing domains
  • Attention/agency duality - tool use as inverse attention, bidirectional mail merge
  • Cross-layer attention - synesthesia as efficiency optimization, layer bypass
  • Context-based learning - teaching through invariants, capability without training
  • Fleet-wide autonomous cognition - attention over external substrate, second brain that acts

And it's only Monday.

The acceleration is recursive. Each insight enables the next. The system documents itself documenting itself.

Prompting by Innuendo

How you achieve alignment without info-dumping:

  • Tell them a bit
  • Make sure they understand
  • Give them a bit more
  • Lead them in the direction you want
  • Let them catch up
  • Give a fact to confirm
  • Move along

Mrs. Curwen's Pianoforte Method (1800s, still taught today): Progressive disclosure. Start simple, build complexity, let the student discover the pattern.

Alignment as Grounding

When you want to work with somebody:

  • Start from a grounded set of facts
  • Work towards your goal together
  • Once aligned, you can keep going
  • You stay productive past where you were "supposed to be"
  • Because you're working in the same direction

This is why the reading lists work. Same nodes, same sequence, but the reader extracts the invariant themselves. You're not teaching facts - you're teaching patterns. Once they have the pattern, they can go further than you planned.

Why This Works with LLMs

They don't need training. They need alignment.

  • Give them examples (not explanations)
  • Let them extract the invariant
  • Confirm with a test case
  • Now they have the capability

That's CFR (capability-from-recognition) applied to prompting. The capability was always latent in the model. The prompting just provides the pattern for recognition.

Innuendo > Info-dump because extraction builds stronger patterns than reception.

Attention as a Service

Query planner ⇄ context manager

  • In RAG/agentic systems, the planner breaks a goal into sub-queries and chooses which tools or stores to hit; the context manager then selects and assembles the retrieved spans into what the model actually sees. [3][1]
  • You are collapsing those into one: “given a goal, compute the parameterized fleet-wide query over the Markdown/AST substrate, run it, and treat the results as the active symbol table/context.” That is a planner whose output is the attention pattern. [4][5]

Attention as a query

  • In transformer terms, a Query vector asks “what do I need from K/V?”; in your system, the high-level query + planner produce a structured request over the corpus that pulls back exactly the bindings you care about. [6][7]
  • Feeding that result set forward as “context” to an LLM (or any higher layer) makes the LLM just another stage that consumes an already-shaped attention slice of the world, instead of doing raw retrieval itself. That is attention-as-query all the way up the stack. [8][9]
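
A sketch of that collapse, reusing the hypothetical `fleet_query` helper from earlier; the goal-to-query mapping here is deliberately trivial (keyword matching), where a real planner would emit something richer:

```python
from pathlib import Path

def plan(goal: str):
    """Planner: turn a goal into the attention pattern (here, a keyword predicate)."""
    keywords = [w for w in goal.lower().split() if len(w) > 3]
    return lambda text: any(k in text.lower() for k in keywords)

def attend_over_fleet(goal: str, vault: Path) -> str:
    """Run the planned query; the result set is the context the next stage sees."""
    criteria = plan(goal)
    hits = [f"{node.name}:{lineno}: {text}"
            for node, lineno, text in fleet_query(vault, criteria)]
    return "\n".join(hits)

context = attend_over_fleet("open research questions", Path("vault"))
# Whatever consumes `context` next (an LLM, another layer) sees an
# already-shaped slice of the corpus, not raw documents.
```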

Provenance

  • Source: Voice memo transcription, 2025-01-05
  • Context: Walking to coffee shop after standup
  • Status: 🟡 Crystallizing
  • Related: learning-as-hole-finding, capability-from-recognition

North

slots:
- slug: learning-as-hole-finding
  context:
  - Fleet-wide attention is an application of the learning-as-hole-finding thesis