message-passing-invariant-formal
Message-Passing Invariant: Formal Statement
From observation to derivation. If this is real, it should be provable.
1. Formal Statement of the Invariant
Definitions
Let $\mathcal{G} = (V, E)$ be a graph where:
- $V$ = set of nodes (agents, neurons, variables, markets)
- $E$ = edges (communication channels, synapses, dependencies)
Let each node $i \in V$ have:
- Local state: $x_i \in \mathcal{X}_i$ (belief, price, activation)
- Local observation: $y_i \in \mathcal{Y}_i$ (evidence, demand, input)
- Incoming messages: $\{m_{j \to i}\}$ for $j \in \mathcal{N}(i)$
The Problem
Goal: Compute a global consistent state $\mathbf{x}^* = (x_1^*, \ldots, x_n^*)$ such that:
- Each $x_i^*$ is consistent with local observation $y_i$
- Adjacent nodes agree: $x_i^*$ and $x_j^*$ are compatible for all $(i,j) \in E$
Constraints:
- C1 (Locality): Node $i$ can only access $y_i$ and messages from neighbors
- C2 (Finite Memory): $|x_i| \leq M$ for some bound $M$
- C3 (Finite Bandwidth): $|m_{i \to j}| \leq B$ for some bound $B$
- C4 (Causality): Updates at time $t$ depend only on information from $t-1$
The Invariant (Conjecture)
Claim: Under constraints C1-C4, any algorithm achieving global consistency must be equivalent to:
$$x_i^{(t+1)} = f_i\left(y_i, \{m_{j \to i}^{(t)}\}_{j \in \mathcal{N}(i)}\right)$$
$$m_{i \to j}^{(t+1)} = g_{ij}\left(x_i^{(t+1)}, \{m_{k \to i}^{(t)}\}_{k \in \mathcal{N}(i),\, k \neq j}\right)$$
where convergence is to a fixed point $\mathbf{x}^*$ that minimizes a functional:
$$F(\mathbf{x}) = \sum_i F_i(x_i, y_i) + \sum_{(i,j) \in E} F_{ij}(x_i, x_j)$$
This is the variational free energy (or Bethe free energy in the loopy case).
Equivalently (Information-Theoretic Form)
$$F(\mathbf{x}) = D_{KL}\left(q(\mathbf{x}) \,\|\, p(\mathbf{x} \mid \mathbf{y})\right) - \log p(\mathbf{y})$$
The algorithm minimizes divergence from the true posterior given observations.
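To make the update equations concrete, here is a minimal sum-product sketch for a pairwise model in Python. The data structures (`unary`, `pairwise`, `neighbors`) and the encoding of $F_i$, $F_{ij}$ as potentials are illustrative assumptions, not part of the formal statement:

```python
import numpy as np

def bp_step(messages, unary, pairwise, neighbors):
    """One synchronous sum-product sweep over a pairwise model.

    messages[(j, i)] : current message from j to i (vector over states of i)
    unary[i]         : local evidence term, exp(-F_i(x_i, y_i))
    pairwise[(i, j)] : compatibility matrix exp(-F_ij), indexed [state of i, state of j]
    neighbors[i]     : list of nodes adjacent to i
    """
    # Belief update: x_i^{(t+1)} = f_i(y_i, {m_{j->i}^{(t)}})
    beliefs = {}
    for i in unary:
        b = unary[i].copy()
        for j in neighbors[i]:
            b *= messages[(j, i)]
        beliefs[i] = b / b.sum()

    # Message update: m_{i->j}^{(t+1)} = g_ij(local evidence, {m_{k->i}^{(t)}, k != j})
    new_messages = {}
    for i in unary:
        for j in neighbors[i]:
            prod = unary[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod *= messages[(k, i)]
            m = pairwise[(i, j)].T @ prod   # marginalize out the states of i
            new_messages[(i, j)] = m / m.sum()
    return beliefs, new_messages
```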
2. Convergence Proof (Sketch)
Why This Algorithm?
Theorem (Informal): Message passing is the UNIQUE algorithm satisfying:
- Locality (C1)
- Optimality (minimizes F)
- Consistency (fixed point = global optimum)
Proof sketch:
(a) Necessity of iteration:
- Under C1, no node has global information
- Global consistency requires information to propagate across graph
- A graph of diameter $d$ requires at least $d$ rounds of communication
- ∴ Iteration is mandatory
(b) Necessity of messages:
- Node $i$ must learn about distant nodes
- Under C3, can only receive finite bits per round
- Must aggregate information from neighbors
- ∴ Messages are mandatory
(c) Form of the update:
- Each node minimizes local contribution to $F$
- Taking derivative and setting to zero gives the BP update equations
- ∴ The specific form is determined by the objective
(d) Convergence conditions:
- On trees: exact convergence to global optimum (Pearl 1982)
- On graphs with loops: convergence to Bethe approximation (Yedidia et al. 2005)
- Sufficient condition: contraction mapping (messages get closer each iteration)
- AIMD-like damping ensures contraction when vanilla BP oscillates
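One concrete reading of "AIMD-like damping" is sketched below, under the assumption that damping means a convex blend of old and new messages, with the blend weight scheduled TCP-style (additive increase, multiplicative decrease). The parameter values are made up for illustration:

```python
import numpy as np

def damped_update(m_old, m_raw, lam):
    """Damped message: convex blend of the previous and freshly computed message."""
    m = lam * m_raw + (1.0 - lam) * m_old
    return m / m.sum()

def aimd_damping(lam, residual, prev_residual,
                 add=0.05, mult=0.5, lam_max=1.0, lam_min=0.05):
    """AIMD-style schedule for the damping factor (a sketch, not from the source):
    increase additively while the message residual keeps shrinking,
    decrease multiplicatively when it grows (oscillation detected)."""
    if residual <= prev_residual:
        return min(lam_max, lam + add)   # cautious additive increase
    return max(lam_min, lam * mult)      # back off hard on oscillation
```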
The Key Insight
Why free energy?
The functional $F$ has special structure:
- It decomposes over nodes and edges (matches graph structure)
- Its gradient decomposes into local computations
- Its fixed points correspond to consistent beliefs
Any other objective would either:
- Require global coordination (violating C1)
- Not decompose (violating C2/C3)
- Have fixed points that aren't consistent
Free energy is selected by the constraints themselves.
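For reference, the explicit loopy-case objective (standard form from Yedidia et al. 2005), with potentials $\psi_i \propto e^{-F_i}$, $\psi_{ij} \propto e^{-F_{ij}}$ and $d_i$ the degree of node $i$; its stationary points under the marginalization constraint $\sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i)$ and normalization coincide with BP fixed points:
$$F_{\text{Bethe}}(\{b_i\}, \{b_{ij}\}) = \sum_{(i,j) \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \ln \frac{b_{ij}(x_i, x_j)}{\psi_{ij}(x_i, x_j)\,\psi_i(x_i)\,\psi_j(x_j)} - \sum_{i \in V} (d_i - 1) \sum_{x_i} b_i(x_i) \ln \frac{b_i(x_i)}{\psi_i(x_i)}$$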
3. Derived Predictions (Not Observed)
Derivation Strategy
From the formal structure, we can DERIVE what happens when each constraint breaks:
3a. Breaking Conservation (Row/Column Sums ≠ 1)
From formalism: The doubly-stochastic constraint on message matrices ensures:
- Total probability mass = 1 (normalization)
- Each node sends/receives fair share (flow balance)
Derived prediction: Without conservation, the functional $F$ is unbounded.
$$\text{If } \sum_j m_{i \to j} \neq 1 \text{ then } F \to \pm \infty$$
Observable consequence: System cannot find fixed point. Either:
- Beliefs explode to infinity (runaway)
- Beliefs collapse to zero (extinction)
- Oscillation without convergence
Test: Remove normalization from Sinkhorn. Predict: no convergence, values explode or collapse.
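A minimal sketch of this test in Python. The "broken" variant is one assumed operationalization of "remove normalization" (keep the scaling by the target marginals, drop the division by the current sums), not a prescription from the source:

```python
import numpy as np

def sinkhorn(K, r, c, iters=200):
    """Standard Sinkhorn: alternately rescale rows and columns so the
    marginals match r and c (conservation enforced each step)."""
    P = K.copy()
    for _ in range(iters):
        P = P * (r / P.sum(axis=1))[:, None]   # row normalization
        P = P * (c / P.sum(axis=0))[None, :]   # column normalization
    return P

def sinkhorn_no_conservation(K, r, c, iters=200):
    """'Broken' variant for Test 1: scale by the target marginals but never
    divide by the current sums, so nothing enforces row/column sums."""
    P = K.copy()
    for _ in range(iters):
        P = P * r[:, None]
        P = P * c[None, :]
    return P   # entries grow or shrink geometrically instead of converging
```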
3b. Breaking Memory (No State Persistence)
From formalism: The update depends on $x_i^{(t)}$. If nodes have no memory:
$$x_i^{(t+1)} = f_i\left(y_i, \{m_{j \to i}^{(t)}\}\right) \quad \text{(no dependence on } x_i^{(t)})$$
Derived prediction: Without memory, convergence requires messages to carry ALL historical information.
$$|m_{i \to j}| \geq t \cdot |\mathcal{X}_i| \text{ (grows with time)}$$
But C3 bounds message size. Therefore:
Observable consequence: Under C3 + no memory, system CANNOT converge to complex states. Limited to states expressible in $B$ bits.
Test: Run BP with bounded message size and no node state. Predict: fails on problems requiring more than $B$ bits of solution.
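A toy operationalization of constraint C3 for this test (assumed, not from the source): coarsen each message to a fixed bit budget before sending, then plug this into the message update of the BP sketch in section 1.

```python
import numpy as np

def quantize_message(m, bits_per_entry):
    """Snap each message entry to one of 2**bits_per_entry levels,
    then renormalize; the message now carries a bounded number of bits."""
    levels = 2 ** bits_per_entry
    q = np.round(m * (levels - 1)) / (levels - 1)
    q = np.maximum(q, 1e-12)   # keep strictly positive for products
    return q / q.sum()
```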
3c. Breaking Hierarchy (Flat Graph)
From formalism: Convergence time depends on graph diameter $d$.
$$t_{converge} = \Omega(d)$$
For flat graphs (no hierarchy), $d = O(n)$ for $n$ nodes. For hierarchical graphs (trees), $d = O(\log n)$.
Derived prediction: Flat graphs have convergence time linear in system size. Hierarchical graphs have logarithmic convergence.
Observable consequence: Flat systems are SLOW. As $n \to \infty$, flat systems fail to converge in practical time.
Test: Compare BP convergence on flat vs. hierarchical graphs of same size. Predict: orders of magnitude difference for large $n$.
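A quick check of the $\Omega(d)$ round bound behind this test (pure Python; the chain and balanced-binary-tree constructions are illustrative): compare the diameters of a flat and a hierarchical graph with the same number of nodes.

```python
from collections import deque

def diameter(adj):
    """Graph diameter via BFS from every node (fine for small examples)."""
    def ecc(s):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(s) for s in adj)

def chain(n):
    """Flat graph: a path of n nodes."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def binary_tree(n):
    """Hierarchical graph: balanced binary tree with n nodes."""
    adj = {i: [] for i in range(n)}
    for i in range(1, n):
        p = (i - 1) // 2
        adj[i].append(p)
        adj[p].append(i)
    return adj

n = 1023
print(diameter(chain(n)))        # ~n: information needs O(n) rounds to cross
print(diameter(binary_tree(n)))  # ~2*log2(n): O(log n) rounds suffice
```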
3d. Breaking Prediction (Reactive Only)
From formalism: The update can be decomposed:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \cdot \Delta_i^{(t)}$$
where $\Delta_i$ is the "surprise" (prediction error).
Purely reactive means $x_i^{(t)} = 0$ always (no prior, just respond to input).
Derived prediction: Without prediction:
- Learning rate $\alpha$ must be 1 (no momentum)
- No filtering of noise (every fluctuation is signal)
- Convergence requires strict contraction (very slow)
Observable consequence: Reactive systems are NOISY and SLOW. They track input exactly, including noise, with no smoothing.
Test: Run BP with vs without momentum. Predict: without momentum, tracks noise, slow convergence.
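A toy scalar illustration of the "noisy and slow" prediction (signal and parameters are made up): a purely reactive node copies each noisy observation, while a node that keeps a prior blends prediction and input and filters the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
obs = true_value + 0.3 * rng.standard_normal(500)   # noisy observations y^(t)

# Reactive: x^{(t+1)} = y^{(t)}  (alpha = 1, no prior carried forward)
reactive = obs.copy()

# Predictive: x^{(t+1)} = x^{(t)} + alpha * (y^{(t)} - x^{(t)}), alpha < 1
alpha = 0.1
predictive = np.zeros_like(obs)
x = 0.0
for t, y in enumerate(obs):
    x = x + alpha * (y - x)
    predictive[t] = x

print("reactive error:  ", np.mean((reactive - true_value) ** 2))
# skip the first 100 steps so the predictive node has converged
print("predictive error:", np.mean((predictive[100:] - true_value) ** 2))
```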
4. Failure Mode Predictions (Derived)
The Failure Mode Table (Now Derived)
| Constraint Broken | Mathematical Consequence | Observable Failure |
|---|---|---|
| Conservation (2a) | $F$ unbounded, no fixed point | Explosion, collapse, or oscillation |
| Memory (2b) | Messages must grow without bound | Can't represent complex states |
| Hierarchy (2c) | Convergence time $O(n)$ not $O(\log n)$ | Fails to scale |
| Prediction (2d) | No noise filtering, $\alpha = 1$ | Noisy and slow |
Combined Failures
Memory + Hierarchy broken:
- Can't store history AND can't use structure to compress
- Derived: System degenerates to memoryless Markov chain
- Observable: Forgets everything, can't build on past
Prediction + Conservation broken:
- Tracking noise + unbounded accumulation
- Derived: Random walk without bounds
- Observable: Brownian explosion
5. Empirical Tests to Falsify
Test 1: Conservation → Explosion
- Setup: Sinkhorn iterations without normalization
- Prediction: Values diverge to ±∞
- Falsified if: Values stay bounded without normalization
Test 2: Memory Bound → Complexity Bound
- Setup: BP with message size $B$, test on problems of varying complexity
- Prediction: Fails when optimal solution requires $> B$ bits
- Falsified if: Solves complex problems with tiny messages
Test 3: Flat vs Hierarchical Scaling
- Setup: BP on random graphs vs trees, $n = 10, 100, 1000, 10000$
- Prediction: Trees converge in $O(\log n)$, flat in $O(n)$
- Falsified if: Scaling is the same, or flat is faster
Test 4: Reactive = Noisy + Slow
- Setup: BP with momentum=0 vs momentum=0.9
- Prediction: momentum=0 is noisier, slower to converge
- Falsified if: Momentum doesn't help or makes things worse
Test 5: Cross-Domain Transfer
- Setup: AIMD damping (from TCP) applied to BP oscillation
- Prediction: Stabilizes loopy BP that otherwise oscillates
- Falsified if: AIMD makes oscillation worse
6. The Strong Falsification
The framework predicts: There is NO algorithm achieving global consistency from local information under C1-C4 that is NOT equivalent to message-passing minimizing free energy.
To falsify: Find such an algorithm. Specifically:
- Input: Local observations + neighbor communication only
- Output: Global consistent state
- Method: Provably NOT message passing
- Performance: Equals or exceeds BP
Candidate challengers:
- Neural networks trained end-to-end (do they learn BP internally?)
- Genetic algorithms (do they implement implicit message passing?)
- Quantum algorithms (do they escape the constraints?)
If ANY of these achieves consistency without message passing, the invariant is false.
7. Status and Gaps
What We Have
- Formal statement of the conjecture
- Sketch of why message passing is necessary
- Derived predictions for constraint violations
- Testable empirical predictions
What We Need
- Rigorous proof of uniqueness (not just necessity)
- Precise characterization of "equivalent to message passing"
- Proof that free energy is the unique decomposable objective
- Empirical validation of derived predictions
Known Weak Points
- "Equivalent to" is doing a lot of work - need to define formally
- Bethe free energy isn't exactly free energy - is the analogy tight?
- Biological systems are messier than the math - how much slop is allowed?
Provenance
- Source: Formalization session, 2026-01-06
- Context: Moving from observation to derivation
- Status: 🔴 Conjecture - needs rigorous proof
- Confidence: Medium - structure seems right, details need work
North
slots:
  - context: Linking formal statement to parent
    slug: higher-order-invariant-effects
East
slots:
  - context: Linking formal statement to experiments
    slug: falsifiable-experiments-message-passing