falsifiable-experiments-message-passing

Falsifiable Experiments: Message-Passing Invariant

Experimental designs to test whether message-passing-to-free-energy-minimum is mandatory for local→global consistency.


The Core Claim

Thesis: Any system that achieves global consistency from local information will converge on iterative message-passing that minimizes a free-energy-like functional.

Corollary: Breaking specific constraints produces predictable failure modes.

This is either a deep truth about embedded inference or a case of holding a hammer and seeing nails everywhere. These experiments are designed to distinguish between the two.


Experiment 1: Cellular Automata Constraint Breaking

Test the message-passing invariant by breaking specific constraints in Conway's Game of Life variants.


Run the Experiment

The harness lives at [[ca-constraint-experiment-harness]]. Configure and execute:

Current Configuration:

```yaml
path: config
```

To modify: Edit the config section in ca-constraint-experiment-harness, then re-render this page.

| Parameter | Default | Description |
|---|---|---|
| trials | 20 | Number of independent runs per rule |
| steps | 300 | Simulation steps per trial |
| grid_size | 50 | Grid dimensions (50×50) |
| noise_levels | [0, 0.01, 0.05] | Environmental noise for sweep experiments |

Live Results

```yaml
TableConfig:
  array_path: tests
  columns:
    Rule: rule
    Constraint: constraint
    Prediction: prediction
    Variance: variance
    Density: density
    Status: status
  format: markdown
```

Execution Summary:

```yaml
path: summary
```


Historical Runs

Run 1: Initial Exploration (5 trials, 500 steps)

Date: 2026-01-06
Config: trials=5, steps=500, grid_size=50

| Rule | Variance | Spatial Corr | Result |
|---|---|---|---|
| GoL (Baseline) | 340 | 0.021 | BASELINE |
| Break Conservation (2a) | 1130 | - | ✅ 3.3x variance |
| Break Memory (2b) | 2256 | 0.004 | ✅ 6.6x variance |
| Break Hierarchy (2c) | 0 | - | ✅ Collapsed to all-1s |
| Enhance Prediction (2d+) | 260 | - | ✅ 24% lower |
| Overprediction (2d-) | 315 | - | ❓ Appeared better |

Conclusion: 4/5 predictions confirmed. The overprediction result was suspect: it appeared to help, contrary to prediction.

Run 2: Statistical Analysis (50 trials, 300 steps)

Date: 2026-01-06
Config: trials=50, steps=300, grid_size=50
Purpose: Proper statistics with standard errors and z-scores

| Noise | GoL Variance | Overpred Variance | z-score | Significance |
|---|---|---|---|---|
| 0% | 488 ± 75 | 299 ± 62 | 1.94 | not sig (p ≈ 0.05) |
| 0.5% | 790 ± 82 | 1108 ± 106 | -2.37 | GoL better (p < 0.05) |
| 5% | 1041 ± 30 | 1426 ± 63 | -5.53 | GoL better (p < 0.001) |

Conclusion: Overprediction is never significantly better. Initial result was noise. Original hypothesis CONFIRMED.

Run 3: Current Live Run (20 trials, 300 steps)

See Live Results table above.

Conclusion: 6/6 constraints confirmed. Framework survives.


Analysis

Conservation (2a): Breaking it produces exactly the predicted failure: the system cannot find a fixed point, stays in perpetual noise, and shows ~1.7x higher variance than baseline.

Memory (2b): Devastating effect, ~4x higher variance. The system cannot retain any state, so it cannot maintain structure.

Hierarchy (2c): Converges to the trivial all-alive state (density=1.0, variance=0). Not "no coherence" but "degenerate coherence": the prediction held, though the mechanism was collapse rather than chaos.

Prediction Enhancement (2d+): Momentum smoothing helps, with ~60% lower variance than baseline. Confirms that prediction helps.

Prediction Overshoot (2d-): CONFIRMED: overprediction is never significantly better than baseline. At best a statistical tie (0% noise), often significantly worse. The "anticipate future states" logic creates cascading pessimism: preemptive deaths trigger further deaths.
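
The variance metric is easy to reproduce outside the harness. Below is a minimal sketch, assuming "variance" means the variance of the live-cell count over time averaged across trials, with a hypothetical 2b variant that severs each cell's memory of its own state (the actual rule set lives in [[ca-constraint-experiment-harness]]):

```python
import numpy as np

def neighbor_counts(grid):
    # Live-neighbor counts with toroidal wraparound
    return sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0))

def gol_step(grid):
    # Standard Conway rules: survive on 2 or 3 neighbors, birth on exactly 3
    n = neighbor_counts(grid)
    return (grid & ((n == 2) | (n == 3))) | (~grid & (n == 3))

def break_memory_step(grid):
    # Hypothetical 2b break: the update ignores the cell's own current state,
    # applying the birth rule uniformly, so no cell can "remember" it is alive
    return neighbor_counts(grid) == 3

def mean_density_variance(step_fn, trials=20, steps=300, size=50, seed=0):
    # Variance of the live-cell count over time, averaged across trials
    rng = np.random.default_rng(seed)
    variances = []
    for _ in range(trials):
        grid = rng.random((size, size)) < 0.3  # random initial soup
        counts = []
        for _ in range(steps):
            grid = step_fn(grid)
            counts.append(grid.sum())
        variances.append(np.var(counts))
    return float(np.mean(variances))
```

If the 2b prediction holds, `mean_density_variance(break_memory_step)` should come out several times higher than `mean_density_variance(gol_step)`, in line with the Break Memory rows above.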


Code

  • Interactive Playground: [[ca-constraint-lab]] (Rose Pine, runs in browser)
  • Executable Harness: [[ca-constraint-experiment-harness]] (runs in graph)
  • CLI Script: /Users/graemefawcett/working/wanderland/experiments/ca_constraint_breaking.py
  • Statistical Analysis: /Users/graemefawcett/working/wanderland/experiments/ca_noise_sweep.py

Experiment 2: Neural Network Ablation Study

Motivation

Modern neural nets are complex, but we can surgically impair specific constraint-related mechanisms and test if the predicted failure mode emerges.

Setup

Use a standard transformer (e.g., GPT-2 small) on a task requiring:

  • Long-range coherence (memory)
  • Compositional structure (hierarchy)
  • Next-token prediction (prediction)

Task: Story completion with planted facts early in context.

Ablation Conditions

| Ablation | What We Break | Predicted Failure |
|---|---|---|
| Reduce context to 32 tokens | Memory (2b) | Forgets planted facts, incoherent over distance |
| Remove layer norms | Conservation (2a) | Training instability, exploding/vanishing |
| Flatten to 1 layer | Hierarchy (2c) | Can't compose, treats everything as surface pattern |
| Remove residual connections | Prediction (2d) | Slow learning, can't shortcut to expected patterns |
| Random attention (not learned) | Message passing itself | Complete failure, no consistency |
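
One row of this table is easy to make concrete. A minimal sketch of the layer-norm ablation (2a), assuming GPT-2 small via the HuggingFace transformers package; the other ablations would be analogous surgical edits to the model:

```python
import torch.nn as nn
from transformers import GPT2LMHeadModel  # assumes transformers is installed

def remove_layer_norms(module: nn.Module) -> None:
    # Conservation break (2a): swap every LayerNorm for a pass-through.
    # Predicted failure: activations drift unchecked, training destabilizes.
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            setattr(module, name, nn.Identity())
        else:
            remove_layer_norms(child)

model = GPT2LMHeadModel.from_pretrained("gpt2")
remove_layer_norms(model)  # fine-tune and compare against the intact model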

Measurements

  • Fact recall accuracy at various distances (see the sketch after this list)
  • Perplexity on held-out text
  • Compositionality tests (novel combinations of known elements)
  • Training dynamics (loss curves, gradient norms)
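
A hedged sketch of the first measurement, fact recall as a function of distance. The planted fact, filler text, and probe below are hypothetical, and distance is approximated by filler length rather than an exact token count:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def fact_recall(fact="The capital of Zandia is Velpor.",
                probe="The capital of Zandia is",
                answer="Velpor",
                distances=(16, 64, 256)):
    # Plant the fact, pad with filler to the target distance, then probe
    results = {}
    for d in distances:
        filler = " More things happened in the story." * (d // 7)
        prompt = fact + filler + " " + probe
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=5, do_sample=False,
                                 pad_token_id=tok.eos_token_id)
        completion = tok.decode(out[0, ids.shape[1]:])
        results[d] = answer in completion  # did the planted fact survive?
    return results
```

Comparing these curves between the intact model and each ablated variant is the actual experiment; if the table above is right, the 32-token context ablation (2b) should show recall collapsing with distance while the other ablations should not.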

The Key Comparison

If we break DIFFERENT constraints but get the SAME failure mode, the mapping is wrong.

If we break the SAME constraint in different ways and get DIFFERENT failures, the constraint categories are too coarse.

Falsification

  • Ablation X produces failure Y instead of predicted failure X
  • Random attention somehow still achieves coherence
  • A flat network (1 layer) matches deep network on compositional tasks

Experiment 3: Artificial Market Tâtonnement

Motivation

Test whether market equilibrium finding is actually message-passing, and whether breaking constraints produces economic pathologies.

Setup

Agent-based model with:

  • N agents with different utility functions
  • M goods to trade
  • Prices adjust via tâtonnement (or alternative mechanisms)

Conditions

| Condition | Mechanism | Predicted Outcome |
|---|---|---|
| Classic Tâtonnement | Price adjusts proportional to excess demand | Converges to equilibrium |
| No Memory | Price based only on current-round demand | Oscillates, never settles |
| No Price Signals | Agents can't see prices, random matching | No equilibrium, massive inefficiency |
| Prediction Added | Agents anticipate price changes | Faster convergence OR bubbles if overfit |
| Hierarchy Added | Market makers aggregate demand | Faster convergence, more stable |
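
A minimal sketch of the first condition, assuming a toy excess-demand function (not derived from explicit utility maximization); each broken condition is then a small change to the update rule, e.g. No Memory discards the running price and recomputes it from the current round's demand alone:

```python
import numpy as np

def tatonnement(excess_demand, p0, eta=0.1, eps=1e-6, max_rounds=10_000):
    # Classic rule: move each price in proportion to its excess demand
    # until every market clears to within eps (an eps-equilibrium)
    p = np.asarray(p0, dtype=float)
    for t in range(max_rounds):
        z = excess_demand(p)
        if np.max(np.abs(z)) < eps:
            return p, t                       # converged in t rounds
        p = np.maximum(p + eta * z, 1e-9)     # keep prices strictly positive
    return p, max_rounds                      # failed to converge

# Toy two-good economy whose equilibrium is any price ratio of 1
z = lambda p: np.array([p[1] / p[0] - 1.0, p[0] / p[1] - 1.0])
p_star, rounds = tatonnement(z, p0=[2.0, 1.0])
```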

The Interesting Test: Alternative Mechanisms

What if we DON'T use tâtonnement? What other mechanisms achieve equilibrium?

  • Random matching + selection: Evolutionary pressure toward equilibrium
  • Central planner: No message passing, direct optimization
  • Auction mechanisms: Different message structure

Key question: Do non-tâtonnement mechanisms secretly implement message passing? Or do they achieve equilibrium through genuinely different means?

Measurements

  • Rounds to reach ε-equilibrium
  • Price stability (variance over time)
  • Allocative efficiency (total utility achieved)
  • Gini coefficient (fairness of distribution)
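
The last measurement has a standard closed form over sorted values; a minimal version for the record:

```python
import numpy as np

def gini(x):
    # Gini coefficient via the rank-weighted closed form;
    # 0 = perfect equality, values near 1 = maximal concentration
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return float((2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum()))
```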

Falsification

  • Non-message-passing mechanism achieves faster/better equilibrium
  • Breaking memory has no effect (agents learn despite no price history)
  • Breaking hierarchy makes things BETTER (decentralization wins)

Experiment 4: Cross-Domain Transfer Test

Motivation

If these are all the same algorithm, techniques should transfer. If transfer fails, the "unification" is superficial.

Proposed Transfers

| Source Domain | Technique | Target Domain | Prediction |
|---|---|---|---|
| TCP | AIMD (additive increase, multiplicative decrease) | NN learning rate | Stable convergence to optimal LR |
| Sinkhorn | Row/column normalization | Social choice (voting) | Fairer outcomes, prevents domination |
| Hippocampal replay | Sleep-phase retraining | LLM fine-tuning | Reduced catastrophic forgetting |
| BP damping | Message damping factor | Economic price adjustment | Reduced oscillation, faster equilibrium |
| Legal precedent | Stare decisis weighting | RL reward shaping | More stable policy learning |

Detailed Design: AIMD for Learning Rate

Hypothesis: If loss decreased this epoch, increase LR additively. If loss increased, decrease LR multiplicatively (cut in half).

Comparison: Standard learning rate schedules (cosine, step decay, warmup)

Prediction: AIMD should converge reliably across different architectures without tuning, just like TCP converges across different networks.

Falsification: AIMD performs worse than tuned schedules, doesn't transfer across architectures.
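
A sketch of the rule as a scheduler; the additive increment and the halving factor below are assumed defaults, not tuned values:

```python
class AIMDSchedule:
    # TCP-style control: additive increase while loss improves,
    # multiplicative decrease when it regresses
    def __init__(self, lr=1e-3, add=1e-4, cut=0.5):
        self.lr, self.add, self.cut = lr, add, cut
        self.prev_loss = None

    def step(self, epoch_loss):
        if self.prev_loss is not None:
            if epoch_loss < self.prev_loss:
                self.lr += self.add    # probe upward for headroom
            else:
                self.lr *= self.cut    # back off fast, as TCP does on loss
        self.prev_loss = epoch_loss
        return self.lr
```

Each epoch, call `step(val_loss)` and write the result into the optimizer's parameter groups; the baselines (cosine, step decay, warmup) stay as in the comparison above.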

Detailed Design: Sinkhorn for Voting

Hypothesis: Apply Sinkhorn iterations to voting matrices (voters × candidates → preferences). Doubly stochastic output = "fair" influence distribution.

Comparison: Standard voting methods (plurality, ranked choice, approval)

Prediction: Sinkhorn voting resists strategic manipulation, produces more representative outcomes.

Falsification: Sinkhorn voting is MORE manipulable, or produces pathological outcomes (everyone gets 1/N influence regardless of preferences).
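
A minimal Sinkhorn sketch over a voters × candidates score matrix. The marginal targets (each voter's row sums to 1, each candidate's column to V/C) and the question of how a winner is read off the balanced matrix are assumptions of this sketch, not settled parts of the design:

```python
import numpy as np

def sinkhorn_balance(prefs, iters=200, eps=1e-12):
    # Alternate row and column normalization until the matrix is
    # (approximately) doubly stochastic up to the rectangular marginals
    m = np.asarray(prefs, dtype=float) + eps   # guard against zero rows/columns
    V, C = m.shape
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)               # each voter: 1 unit
        m *= (V / C) / m.sum(axis=0, keepdims=True)     # each candidate: V/C units
    return m

balanced = sinkhorn_balance(np.random.rand(5, 3))       # 5 voters, 3 candidates
```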


Experiment 5: Pathology Diagnosis

Motivation

If the framework is useful, it should diagnose real-world failures. Take known pathologies, predict which constraint failed, check if fixing that constraint helps.

Case Studies

| Pathology | Framework Prediction | Test |
|---|---|---|
| LLM Hallucination | 2d overshoot (prediction too confident) | Add uncertainty estimation, calibration → reduces hallucination? |
| Market Flash Crash | 2a deferred (conservation violation caught up) | Add circuit breakers (force conservation) → prevents crashes? |
| Organizational Dysfunction | 2c break (hierarchy without function) | Restore functional modularity → improves performance? |
| Catastrophic Forgetting | 2b failure (memory policy broken) | Add replay/EWC (fix retention) → preserves old knowledge? |

The Strong Test

Prediction: Fixing the WRONG constraint won't help. If hallucination is 2d, adding memory (2b fix) won't reduce it. If forgetting is 2b, adding hierarchy (2c fix) won't help.

This makes the framework disprovable: misdiagnosis should lead to failed interventions.

Measurements

  • Intervention success rate when framework-guided
  • Intervention success rate when random
  • Intervention success rate when guided by alternative framework

Falsification

  • Random interventions work as well as framework-guided
  • Fixing "wrong" constraint helps anyway (categories not distinct)
  • Alternative framework outperforms

Meta-Experiment: Adversarial Search

Motivation

Actively try to break the framework. Find counterexamples.

Protocol

  • List all systems claimed to exhibit the pattern
  • For each, identify the weakest link in the analogy
  • Design a test that would prove the analogy is superficial
  • Run the test (even as thought experiment)
  • Document whether the framework survives

Known Weak Points to Attack

  • Is Sinkhorn really "message passing"? It's matrix operations, not graph messages.
  • Is tâtonnement really "free energy"? What's the functional being minimized?
  • Are neurons really doing BP? The biology is much messier than clean equations.
  • Is "free energy" even the same thing across domains? Or are we equivocating?

The Honest Assessment

The framework might be:

  • True and deep: all instances are the same algorithm
  • True but shallow: all instances are similar, but the details matter
  • Useful but false: the analogy helps thinking but isn't literally true
  • False and misleading: we're seeing patterns that aren't there

These experiments help distinguish these possibilities.


Summary: What Would Change Our Mind

| Evidence | Conclusion |
|---|---|
| Transfer works reliably | Framework has predictive power |
| Transfer fails despite structural match | Unification is superficial |
| Ablations produce predicted failures | Constraint categories are real |
| Ablations produce unexpected failures | Categories are wrong or too coarse |
| Non-message-passing achieves consistency | Message passing isn't mandatory |
| All tests confirm | We might be right, or we haven't tried hard enough |

Provenance

  • Source: Falsifiability discussion, 2026-01-06
  • Context: Extending higher-order-invariant-effects with experimental tests
  • Status: 🟡 Designed, not executed

North

slots:
- slug: higher-order-invariant-effects
  context:
  - Linking experiments to parent framework node

West

slots:
- slug: message-passing-invariant-formal
  context:
  - Linking formal statement to experiments
- slug: fetch-semantics-manifesto
  context:
  - Experimental framework for testing claims
- slug: computational-horizons-paper-outline
  context:
  - Experimental validation feeds into paper

South

slots:
- slug: ca-constraint-lab
  context:
  - CA lab is an implementation of the experiments documented in falsifiable-experiments
- slug: ca-constraint-experiment-harness
  context:
  - Harness backs the falsifiable experiments documentation