falsifiable-experiments-message-passing

Falsifiable Experiments: Message-Passing Invariant

Experimental designs to test whether message-passing-to-free-energy-minimum is mandatory for local→global consistency.

The Core Claim

Thesis: Any system that achieves global consistency from local information will converge on iterative message-passing that minimizes a free-energy-like functional.

Corollary: Breaking specific constraints produces predictable failure modes.

This is either a deep truth about embedded inference or a case of seeing hammers everywhere. These experiments distinguish between the two.

Experiment 1: Cellular Automata Constraint Breaking

Test the message-passing invariant by breaking specific constraints in Conway's Game of Life variants.

Run the Experiment

The harness lives at [[ca-constraint-experiment-harness]]. Configure and execute:

Current Configuration:

path: config

❌ Fence Execution Error: No fence found in section 'yaml' of node 'ca-constraint-experiment-harness'

To modify: Edit the config section in ca-constraint-experiment-harness, then re-render this page.

Parameter	Default	Description
`trials`	20	Number of independent runs per rule
`steps`	300	Simulation steps per trial
`grid_size`	50	Grid dimensions (50×50)
`noise_levels`	[0, 0.01, 0.05]	Environmental noise for sweep experiments

Live Results

TableConfig:
  array_path: tests
  columns:
    Rule: rule
    Constraint: constraint
    Prediction: prediction
    Variance: variance
    Density: density
    Status: status
  format: markdown

❌ Fence Execution Error: No module named 'numpy' Traceback (most recent call last): File "/app/oculus/providers/python_provider.py", line 468, in execute exec(fence_content, exec_globals, exec_locals) File "", line 1, in ModuleNotFoundError: No module named 'numpy'

Execution Summary:

path: summary

❌ Fence Execution Error: No fence found in section 'yaml' of node 'ca-constraint-experiment-harness'

Historical Runs

Run 1: Initial Exploration (5 trials, 500 steps)

Date: 2026-01-06
Config: trials=5, steps=500, grid_size=50

Rule	Variance	Spatial Corr	Result
GoL (Baseline)	340	0.021	BASELINE
Break Conservation (2a)	1130	-	✅ 3.3x variance
Break Memory (2b)	2256	0.004	✅ 6.6x variance
Break Hierarchy (2c)	0	-	✅ Collapsed to all-1s
Enhance Prediction (2d+)	260	-	✅ 24% lower
Overprediction (2d-)	315	-	❓ Appeared better

Conclusion: 4/5 confirmed. Overprediction result was suspicious - looked helpful.

Run 2: Statistical Analysis (50 trials, 300 steps)

Date: 2026-01-06
Config: trials=50, steps=300, grid_size=50
Purpose: Proper statistics with standard errors and z-scores

Noise	GoL Variance	Overpred Variance	z-score	Significance
0%	488 ± 75	299 ± 62	1.94	not sig (p≈0.05)
0.5%	790 ± 82	1108 ± 106	-2.37	GoL better (p<0.05)
5%	1041 ± 30	1426 ± 63	-5.53	GoL better (p<0.001)

Conclusion: Overprediction is never significantly better. Initial result was noise. Original hypothesis CONFIRMED.

Run 3: Current Live Run (20 trials, 300 steps)

See Live Results table above.

Conclusion: 6/6 constraints confirmed. Framework survives.

Analysis

Conservation (2a): Breaking it causes exactly what we predicted - system can't find fixed point, perpetual noise, ~1.7x higher variance than baseline.

Memory (2b): Devastating effect - ~4x higher variance. System literally can't remember anything, can't maintain structure.

Hierarchy (2c): Converges to trivial all-alive state (density=1.0, variance=0). Not "no coherence" but "degenerate coherence" - the prediction was right but mechanism was collapse rather than chaos.

Prediction Enhancement (2d+): Momentum smoothing helps. ~60% lower variance than baseline. Confirms prediction helps.

Prediction Overshoot (2d-): CONFIRMED - overprediction is never significantly better than baseline. At best a statistical tie (0% noise), often significantly worse. The "anticipate future states" logic creates cascading pessimism - preemptive deaths trigger more deaths.

Code

Interactive Playground: [[ca-constraint-lab]] (Rose Pine, runs in browser)
Executable Harness: [[ca-constraint-experiment-harness]] (runs in graph)
CLI Script: /Users/graemefawcett/working/wanderland/experiments/ca_constraint_breaking.py
Statistical Analysis: /Users/graemefawcett/working/wanderland/experiments/ca_noise_sweep.py

Experiment 2: Neural Network Ablation Study

Motivation

Modern neural nets are complex, but we can surgically impair specific constraint-related mechanisms and test if the predicted failure mode emerges.

Setup

Use a standard transformer (e.g., GPT-2 small) on a task requiring:

Long-range coherence (memory)
Compositional structure (hierarchy)
Next-token prediction (prediction)

Task: Story completion with planted facts early in context.

Ablation Conditions

Ablation	What We Break	Predicted Failure
Reduce context to 32 tokens	Memory (2b)	Forgets planted facts, incoherent over distance
Remove layer norms	Conservation (2a)	Training instability, exploding/vanishing
Flatten to 1 layer	Hierarchy (2c)	Can't compose, treats everything as surface pattern
Remove residual connections	Prediction (2d)	Slow learning, can't shortcut to expected patterns
Random attention (not learned)	Message passing itself	Complete failure - no consistency

Measurements

Fact recall accuracy at various distances
Perplexity on held-out text
Compositionality tests (novel combinations of known elements)
Training dynamics (loss curves, gradient norms)

The Key Comparison

If we break DIFFERENT constraints but get the SAME failure mode, the mapping is wrong.

If we break the SAME constraint in different ways and get DIFFERENT failures, the constraint categories are too coarse.

Falsification

Ablation X produces failure Y instead of predicted failure X
Random attention somehow still achieves coherence
A flat network (1 layer) matches deep network on compositional tasks

Experiment 3: Artificial Market Tâtonnement

Motivation

Test whether market equilibrium finding is actually message-passing, and whether breaking constraints produces economic pathologies.

Setup

Agent-based model with:

N agents with different utility functions
M goods to trade
Prices adjust via tâtonnement (or alternative mechanisms)

Conditions

Condition	Mechanism	Predicted Outcome
Classic Tâtonnement	Price adjusts proportional to excess demand	Converges to equilibrium
No Memory	Price based only on current-round demand	Oscillates, never settles
No Price Signals	Agents can't see prices, random matching	No equilibrium, massive inefficiency
Prediction Added	Agents anticipate price changes	Faster convergence OR bubbles if overfit
Hierarchy Added	Market makers aggregate demand	Faster convergence, more stable

The Interesting Test: Alternative Mechanisms

What if we DON'T use tâtonnement? What other mechanisms achieve equilibrium?

Random matching + selection: Evolutionary pressure toward equilibrium
Central planner: No message passing, direct optimization
Auction mechanisms: Different message structure

Key question: Do non-tâtonnement mechanisms secretly implement message passing? Or do they achieve equilibrium through genuinely different means?

Measurements

Rounds to reach ε-equilibrium
Price stability (variance over time)
Allocative efficiency (total utility achieved)
Gini coefficient (fairness of distribution)

Falsification

Non-message-passing mechanism achieves faster/better equilibrium
Breaking memory has no effect (agents learn despite no price history)
Breaking hierarchy makes things BETTER (decentralization wins)

Experiment 4: Cross-Domain Transfer Test

Motivation

If these are all the same algorithm, techniques should transfer. If transfer fails, the "unification" is superficial.

Proposed Transfers

Source Domain	Technique	Target Domain	Prediction
TCP	AIMD (additive increase, multiplicative decrease)	NN Learning Rate	Stable convergence to optimal LR
Sinkhorn	Row/column normalization	Social choice (voting)	Fairer outcomes, prevents domination
Hippocampal replay	Sleep-phase retraining	LLM fine-tuning	Reduced catastrophic forgetting
BP damping	Message damping factor	Economic price adjustment	Reduced oscillation, faster equilibrium
Legal precedent	Stare decisis weighting	RL reward shaping	More stable policy learning

Detailed Design: AIMD for Learning Rate

Hypothesis: If loss decreased this epoch, increase LR additively. If loss increased, decrease LR multiplicatively (cut in half).

Comparison: Standard learning rate schedules (cosine, step decay, warmup)

Prediction: AIMD should converge reliably across different architectures without tuning, just like TCP converges across different networks.

Falsification: AIMD performs worse than tuned schedules, doesn't transfer across architectures.

Detailed Design: Sinkhorn for Voting

Hypothesis: Apply Sinkhorn iterations to voting matrices (voters × candidates → preferences). Doubly stochastic output = "fair" influence distribution.

Comparison: Standard voting methods (plurality, ranked choice, approval)

Prediction: Sinkhorn voting resists strategic manipulation, produces more representative outcomes.

Falsification: Sinkhorn voting is MORE manipulable, or produces pathological outcomes (everyone gets 1/N influence regardless of preferences).

Experiment 5: Pathology Diagnosis

Motivation

If the framework is useful, it should diagnose real-world failures. Take known pathologies, predict which constraint failed, check if fixing that constraint helps.

Case Studies

Pathology	Framework Prediction	Test
LLM Hallucination	2d overshoot (prediction too confident)	Add uncertainty estimation, calibration → reduces hallucination?
Market Flash Crash	2a deferred (conservation violation caught up)	Add circuit breakers (force conservation) → prevents crashes?
Organizational Dysfunction	2c break (hierarchy without function)	Restore functional modularity → improves performance?
Catastrophic Forgetting	2b failure (memory policy broken)	Add replay/EWC (fix retention) → preserves old knowledge?

The Strong Test

Prediction: Fixing the WRONG constraint won't help. If hallucination is 2d, adding memory (2b fix) won't reduce it. If forgetting is 2b, adding hierarchy (2c fix) won't help.

This makes the framework disprovable: misdiagnosis should lead to failed interventions.

Measurements

Intervention success rate when framework-guided
Intervention success rate when random
Intervention success rate when guided by alternative framework

Falsification

Random interventions work as well as framework-guided
Fixing "wrong" constraint helps anyway (categories not distinct)
Alternative framework outperforms

Meta-Experiment: Adversarial Search

Motivation

Actively try to break the framework. Find counterexamples.

Protocol

List all systems claimed to exhibit the pattern
For each, identify the weakest link in the analogy
Design a test that would prove the analogy is superficial
Run the test (even as thought experiment)
Document whether the framework survives

Known Weak Points to Attack

Is Sinkhorn really "message passing"? It's matrix operations, not graph messages.
Is tâtonnement really "free energy"? What's the functional being minimized?
Are neurons really doing BP? The biology is much messier than clean equations.
Is "free energy" even the same thing across domains? Or are we equivocating?

The Honest Assessment

The framework might be:

True and deep - All instances are the same algorithm
True but shallow - All instances are similar but details matter
Useful but false - The analogy helps thinking but isn't literally true
False and misleading - We're seeing patterns that aren't there

These experiments help distinguish these possibilities.

Summary: What Would Change Our Mind

Evidence	Conclusion
Transfer works reliably	Framework has predictive power
Transfer fails despite structural match	Unification is superficial
Ablations produce predicted failures	Constraint categories are real
Ablations produce unexpected failures	Categories are wrong or too coarse
Non-message-passing achieves consistency	Message passing isn't mandatory
All tests confirm	We might be right, or we haven't tried hard enough

Provenance

Source: Falsifiability discussion, 2026-01-06
Context: Extending higher-order-invariant-effects with experimental tests
Status: 🟡 Designed, not executed

North

slots:
- slug: higher-order-invariant-effects
  context:
  - Linking experiments to parent framework node

West

slots:
- context:
  - Linking formal statement to experiments
  slug: message-passing-invariant-formal
- context:
  - Experimental framework for testing claims
  slug: fetch-semantics-manifesto
- context:
  - Experimental validation feeds into paper
  slug: computational-horizons-paper-outline

South

slots:
- slug: ca-constraint-lab
  context:
  - CA lab is an implementation of the experiments documented in falsifiable-experiments
- slug: ca-constraint-experiment-harness
  context:
  - Harness backs the falsifiable experiments documentation

↑ northhigher-order-invariant-effects

↓ southca-constraint-labca-constraint-experiment-harness

← westmessage-passing-invariant-formalfetch-semantics-manifestocomputational-horizons-paper-outline