lantern

fleet-wide-attention

Fleet-Wide Attention

The system isn't documents with occasional fence calls. It's attention across the entire fleet.

The MLA Optimization

The new architecture (multi-head latent attention) stores four things about each token instead of one. Why this matters (sketched below):

  • Calculating K and V is expensive - that's the heavy compute
  • Looking them up is cheap - once calculated, retrieval is fast
  • More slots = more precision - 4x information about identity for marginal cost
  • Same optimization applies everywhere - because it's the same algorithm
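
A minimal sketch of that cost asymmetry, assuming a toy single-head dot-product attention layer. All names, shapes, and weights here are illustrative, not any particular architecture:

```python
import numpy as np

# Projecting tokens into K and V is the heavy matmul work; once cached,
# serving a new query is a cheap lookup plus a softmax.
d_model, d_head, n_tokens = 64, 16, 10
rng = np.random.default_rng(0)
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))
tokens = rng.normal(size=(n_tokens, d_model))

# Expensive step, done once per token: project into K and V and cache them.
kv_cache = {"K": tokens @ W_k, "V": tokens @ W_v}

def attend(query_vec, cache):
    """Cheap step, done per query: score against cached K, mix cached V."""
    scores = cache["K"] @ query_vec / np.sqrt(query_vec.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["V"]

q = rng.normal(size=(d_head,))
print(attend(q, kv_cache).shape)  # (16,) -- retrieval is trivial next to the projections
```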

This is functors. This is metaphor in math terms. Find the invariant, apply the same optimization across domains.

The Realization

The realization on the way to the coffee shop:

Query: "Find all to-dos matching these criteria"
Response: Every matching to-do across every node comes back at once
Action: Pick one, do it, check it off, continue

One query. Entire fleet. All matching patterns return. That's attention at system scale.
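
A minimal sketch of that query, assuming the fleet is a directory of Markdown nodes and to-dos use checkbox syntax; the vault path and the criteria predicate are stand-ins:

```python
from pathlib import Path
import re

# One pass over every Markdown node in the graph, returning every unchecked
# item that matches the criteria.
TODO = re.compile(r"^\s*- \[ \] (?P<text>.+)$")

def fleet_query(vault: Path, criteria):
    """Yield (node, line_number, text) for every matching open to-do."""
    for node in sorted(vault.rglob("*.md")):
        for lineno, line in enumerate(node.read_text().splitlines(), start=1):
            m = TODO.match(line)
            if m and criteria(m["text"]):
                yield node, lineno, m["text"]

# One query, entire fleet: every open to-do mentioning "research" comes back at once.
for node, lineno, text in fleet_query(Path("vault"), lambda t: "research" in t.lower()):
    print(f"{node}:{lineno}: {text}")
```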

The Polysporin Test

Created a file: "Agent Research Ideas"

  • Header: "Research"
  • Item: "Where does Polysporin come from?"

Told the system: "On this page, check for the latest thing, do whatever it says, check it off when done."

Result: It researched where Polysporin was manufactured (not where to buy it - because the header said "Research"), then checked it off.

The context (Research header) shaped the interpretation. The action was autonomous. The completion was recorded.
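
A sketch of what that run looks like, assuming the page is plain Markdown with "#" headers, and with `run_agent` standing in for whatever actually does the work (an LLM call, a search tool):

```python
import re

def latest_open_item(page_text: str):
    """Return (line_index, nearest_header, item_text) for the last unchecked item."""
    header, found = None, None
    for i, line in enumerate(page_text.splitlines()):
        if line.startswith("#"):
            header = line.lstrip("# ").strip()
        m = re.match(r"^\s*- \[ \] (.+)$", line)
        if m:
            found = (i, header, m.group(1))
    return found

def handle_latest(page_text: str, run_agent) -> str:
    """Do the latest open item, letting the header shape interpretation, then check it off."""
    hit = latest_open_item(page_text)
    if hit is None:
        return page_text
    i, header, item = hit
    run_agent(task=item, context=header)               # "Research" header => research it, don't buy it
    lines = page_text.splitlines()
    lines[i] = lines[i].replace("- [ ]", "- [x]", 1)   # record completion
    return "\n".join(lines)
```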

Autonomous Externalized Cognition

With one small change, the system becomes:

  • Write a note anywhere in the graph
  • Agent finds it via fleet-wide attention
  • Agent does the thing
  • Agent checks it off
  • Information is there next time you look

That's not a to-do app. That's a second brain that acts.

  • Write "check for milk" → appears on shopping list at the store
  • Write "research X" → research is done and waiting when you return

Teaching Through Invariants

"Tool creation is just a matter of showing one of these things an invariant."

LLMs can only hold a few patterns in context. But:

  • Show them 2-3 examples of the pattern
  • They instantly know how to use the system
  • No training required - just pattern recognition
  • Same pattern works everywhere because it IS everywhere

Once they have the invariant:

  • They can use it (Copy)
  • They can decompose it (Find invariant)
  • They can build new tools from components (Remix)

That's learning without training. Context-based capability acquisition.
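
A sketch of what "showing an invariant" looks like in practice: two or three worked examples carried in the prompt, no weight updates. The mini query syntax below is invented purely for illustration:

```python
# Teach a tiny tool syntax through examples alone; the model fills the pattern.
EXAMPLES = [
    ("find open research items", 'query(status="open", header="Research")'),
    ("find open shopping items", 'query(status="open", header="Shopping")'),
    ("find everything finished", 'query(status="done")'),
]

def build_prompt(goal: str) -> str:
    shots = "\n".join(f"goal: {g}\ntool: {t}" for g, t in EXAMPLES)
    return f"{shots}\ngoal: {goal}\ntool:"

print(build_prompt("find open errands"))
# No training pass happened; the capability is carried entirely by the examples.
```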

The Complete Cognition Stack

Meaning = ability to extract content + manipulate symbol tables

The system implements this in three layers:

Layer           | Operation                             | Direction    | Implementation
Virtual Fences  | Extract tokens/meaning from documents | Attention    | Pattern matchers over text stream
ISA (peek/poke) | Read/write the symbol table           | Manipulation | Structured access to extracted values
Persistence     | Write back into the document stream   | Agency       | Modified values become new text

The Full Loop

Document → Virtual Fence → Extract → Symbol Table → Manipulate → Poke → Document
   └────────────────────────────────────────────────────────────────────────┘

This is what makes it a mind, not just a reader:

  • Extract (attention) - virtual fences find patterns, pull out values
  • Manipulate (cognition) - ISA lets you operate on the symbol table
  • Persist (agency) - changes write back to the stream

Most systems stop at extraction. This one completes the loop. You can:

  • Read a to-do list (extract)
  • Mark an item complete (manipulate symbol table)
  • Save it back (persist to stream)

And the document now reflects the new state. The stream has been modified by attention operating through it.
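
A minimal sketch of that loop over a single document, with a regex standing in for the virtual fence and a dict standing in for the symbol table (names are illustrative):

```python
import re

FENCE = re.compile(r"^- \[(?P<done>[ x])\] (?P<name>.+)$")

def extract(doc: str) -> dict:
    """Attention: pull a symbol table out of the stream."""
    return {m["name"]: m["done"] == "x"
            for m in map(FENCE.match, doc.splitlines()) if m}

def peek(table: dict, name: str) -> bool:         # read a binding
    return table[name]

def poke(table: dict, name: str, done: bool):     # manipulate the symbol table
    table[name] = done

def persist(table: dict) -> str:
    """Agency: the modified table becomes new text."""
    return "\n".join(f"- [{'x' if done else ' '}] {name}" for name, done in table.items())

doc = "- [ ] buy milk\n- [ ] research Polysporin"
table = extract(doc)                   # read the to-do list
poke(table, "buy milk", True)          # mark an item complete
assert peek(table, "buy milk")
print(persist(table))                  # save it back: the stream now reflects the new state
```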

That's not reading. That's thinking.

The PhD Papers

From one morning (since 9am, January 5th 2025):

  • Streams-with-gaps invariant - the pattern across all information processing domains
  • Attention/agency duality - tool use as inverse attention, bidirectional mail merge
  • Cross-layer attention - synesthesia as efficiency optimization, layer bypass
  • Context-based learning - teaching through invariants, capability without training
  • Fleet-wide autonomous cognition - attention over external substrate, second brain that acts

And it's only Monday.

The acceleration is recursive. Each insight enables the next. The system documents itself documenting itself.

Prompting by Innuendo

How you achieve alignment without info-dumping:

  • Tell them a bit
  • Make sure they understand
  • Give them a bit more
  • Lead them in the direction you want
  • Let them catch up
  • Give a fact to confirm
  • Move along

Mrs. Curwen's Pianoforte Method (1800s, still taught today): Progressive disclosure. Start simple, build complexity, let the student discover the pattern.

Alignment as Grounding

When you want to work with somebody:

  • Start from a grounded set of facts
  • Work towards your goal together
  • Once aligned, you can keep going
  • You stay productive past where you were "supposed to be"
  • Because you're working in the same direction

This is why the reading lists work. Same nodes, same sequence, but the reader extracts the invariant themselves. You're not teaching facts - you're teaching patterns. Once they have the pattern, they can go further than you planned.

Why This Works with LLMs

They don't need training. They need alignment.

  • Give them examples (not explanations)
  • Let them extract the invariant
  • Confirm with a test case
  • Now they have the capability

That's CFR (capability-from-recognition) applied to prompting. The capability was always latent in the model. The prompting just provides the pattern for recognition.

Innuendo > Info-dump because extraction builds stronger patterns than reception.

Attention as a Service

Query planner ⇄ context manager

  • In RAG/agentic systems, the planner breaks a goal into sub-queries and chooses which tools or stores to hit; the context manager then selects and assembles the retrieved spans into what the model actually sees. [3][1]
  • You are collapsing those into one: “given a goal, compute the parameterized fleet-wide query over the Markdown/AST substrate, run it, and treat the results as the active symbol table/context.” That is a planner whose output is the attention pattern. [4][5]

Attention as a query

  • In transformer terms, a Query vector asks “what do I need from K/V?”; in your system, the high-level query + planner produce a structured request over the corpus that pulls back exactly the bindings you care about. [6][7]
  • Feeding that result set forward as “context” to an LLM (or any higher layer) makes the LLM just another stage that consumes an already-shaped attention slice of the world, instead of doing raw retrieval itself. That is attention-as-query all the way up the stack. [8][9]
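
A sketch of that collapse, reusing the hypothetical `fleet_query` helper from earlier; the goal-to-query mapping here is deliberately trivial (keyword matching), where a real planner would emit something richer:

```python
from pathlib import Path

def plan(goal: str):
    """Planner: turn a goal into the attention pattern (here, a keyword predicate)."""
    keywords = [w for w in goal.lower().split() if len(w) > 3]
    return lambda text: any(k in text.lower() for k in keywords)

def attend_over_fleet(goal: str, vault: Path) -> str:
    """Run the planned query; the result set is the context the next stage sees."""
    criteria = plan(goal)
    hits = [f"{node.name}:{lineno}: {text}"
            for node, lineno, text in fleet_query(vault, criteria)]
    return "\n".join(hits)

context = attend_over_fleet("open research questions", Path("vault"))
# Whatever consumes `context` next (an LLM, another layer) sees an
# already-shaped slice of the corpus, not raw documents.
```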

Provenance

  • Source: Voice memo transcription, 2025-01-05
  • Context: Walking to coffee shop after standup
  • Status: 🟡 Crystallizing
  • Related: learning-as-hole-finding, capability-from-recognition

North

slots:
- slug: learning-as-hole-finding
  context:
  - Fleet-wide attention is an application of the learning-as-hole-finding thesis