Consider measurement scenarios where different observers see fundamentally incompatible outcomes. Imagine a classroom with three students—Alice, Bob, and Carol—and three seats arranged in a straight line. The teacher gives each student a pairwise constraint:
- Alice must sit next to Bob
- Bob must sit next to Carol
- Carol must sit next to Alice
Each pairwise context is internally consistent: if you consider only a given pair (Alice next to Bob, Bob next to Carol, or Carol next to Alice), that local rule can be satisfied. However, no global seating assignment satisfies all three constraints at once. At most two constraints can hold; the third is inevitably violated.
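A brute-force check over all six seatings confirms this claim (a quick standalone sketch, independent of any library):

```python
from itertools import permutations

# Three seats in a line: positions 0, 1, 2; "next to" means adjacent positions.
def adjacent(pos, a, b):
    return abs(pos[a] - pos[b]) == 1

best = 0
for order in permutations(['Alice', 'Bob', 'Carol']):
    pos = {name: seat for seat, name in enumerate(order)}
    satisfied = sum([
        adjacent(pos, 'Alice', 'Bob'),    # Alice next to Bob
        adjacent(pos, 'Bob', 'Carol'),    # Bob next to Carol
        adjacent(pos, 'Carol', 'Alice'),  # Carol next to Alice
    ])
    best = max(best, satisfied)

print(best)  # 2 -- no seating satisfies all three constraints
```

Whoever sits in the middle satisfies the two constraints involving them, but the two end seats are never adjacent, so the third constraint always fails.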
This illustrates irreducible contradiction: every local context is valid on its own, but no global arrangement reconciles all contexts. Each student can see their pairwise rule satisfied in some assignment, but any global assignment will necessarily leave one constraint broken. The contradiction measure formalizes exactly this classical incompatibility:
```python
from contrakit import Space, Behavior
import math

# Three students paradox: Alice≠Bob, Bob≠Carol, Alice=Carol
space = Space.create(Alice=['near','far'], Bob=['near','far'], Carol=['near','far'])
paradox = Behavior.from_contexts(space, {
    ('Alice','Bob'):   {('far','near'): 1.0},   # Alice and Bob take different values
    ('Bob','Carol'):   {('far','near'): 1.0},   # Bob and Carol take different values
    ('Alice','Carol'): {('near','near'): 1.0}   # Alice and Carol take the same value
})
print(f"K = {paradox.K:.4f} bits, α* = {paradox.alpha_star:.4f}")
# K = 0.5000 bits, α* = 0.7071
```

Here, the best possible global assignment still violates one constraint, which gives a minimal contradiction of $K = 0.5$ bits. This shows that even in the simplest odd-cycle scenario, some irreducible "penalty" must be paid.
```python
# Frame-independent behavior (no contradiction)
space = Space.create(A=['0','1'], B=['0','1'])
fi_behavior = Behavior.from_contexts(space, {
    ('A',): {('0',): 0.6, ('1',): 0.4},
    ('B',): {('0',): 0.6, ('1',): 0.4},
    ('A','B'): {('0','0'): 0.36, ('0','1'): 0.24, ('1','0'): 0.24, ('1','1'): 0.16}
})
print(f"FI behavior: K = {fi_behavior.K:.6f} (≈ 0)")
# FI behavior: K = 0.000000 (≈ 0)
```

No contradiction means all perspectives unify into one underlying reality.
```python
# Contradiction additivity for independent systems
paradox1 = Behavior.from_contexts(Space.create(X=['0','1'], Y=['0','1'], Z=['0','1']), {
    ('X','Y'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('Y','Z'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('X','Z'): {('0','0'): 1.0, ('1','1'): 0.0}
})
paradox2 = Behavior.from_contexts(Space.create(A=['0','1'], B=['0','1'], C=['0','1']), {
    ('A','B'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('B','C'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('A','C'): {('0','0'): 1.0, ('1','1'): 0.0}
})
product = (paradox1 @ paradox2).K
print(f"K₁ = {paradox1.K:.4f}, K₂ = {paradox2.K:.4f}")
print(f"K₁ + K₂ = {paradox1.K + paradox2.K:.4f}, Product = {product:.4f}")
# K₁ = 0.5000, K₂ = 0.5000
# K₁ + K₂ = 1.0000, Product = 1.0000
```

Contradiction adds up for independent systems, and every operational task described below pays this cost as an extra $K(P)$ bits.
Consider the theory's core results. Six axioms uniquely determine contradiction as the logarithmic transform of minimax Bhattacharyya agreement. That measure captures irreducible perspectival incompatibility.
Foundational theorems establish the mathematical core:
- Theorem 1: Weakest-link aggregators equal the minimum (see Representation Theory)
- Theorem 2: Contradiction admits an adversarial minimax form (see Representation Theory)
- Theorem 3: The Bhattacharyya coefficient uniquely satisfies the natural properties (see Representation Theory)
- Theorem 4: $K(P) = -\log_2 \alpha^\star(P)$ (see Representation Theory)
- Theorem 5: Independent contradictions add: $K(P \otimes R) = K(P) + K(R)$ (see Representation Theory)
Operational theorems quantify practical costs:
- Theorem 6: Typical sets grow by $K(P)$ bits per observation (see Operational Consequences)
- Theorems 7-8: Compression rates increase by $K(P)$ (see Operational Consequences)
- Theorem 9: Testing frame-independence requires $K(P)$ evidence (see Operational Consequences)
- Theorem 10: Witness rate $K(P)$ enables simulation (see Operational Consequences)
- Theorems 11-12: Common communication costs $K(P)$ extra bits (see Operational Consequences)
- Theorem 13: Channel capacity drops by $K(P)$ (see Operational Consequences)
- Theorem 14: Rate-distortion increases by $K(P)$ (see Operational Consequences)
- Theorem 15: Geometry explains the additive structure (see Geometric Structure)
Interpretive results connect to concrete penalties:
- Theorem 7.4: Witness-error tradeoff: $E + r \geq K(P)$ (see Testing, Prediction)
- Theorem 7.5: Universal adversarial structure (see Testing, Prediction)
- Theorem 7.9: Equalizer principle with sparse optimizers (see Geometric Properties)
Key propositions bound operational quantities:
- Proposition 7.1: Testing bound: $\inf_\lambda E_{\text{opt}}(\lambda) \geq K(P)$ (see Testing, Prediction)
- Proposition 7.2: Simulation variance $\geq 2^{2K(P)} - 1$ (see Testing, Prediction)
- Proposition 7.3: Predictive regret $\geq 2K(P)$ bits/round (see Testing, Prediction)
- Proposition 7.6: Hellinger sphere structure (see Geometric Properties)
- Proposition 7.7: Smoothing bounds on mixing (see Geometric Properties)
- Proposition 7.8: Convex program for computation (see Computational Methods)
Just as in the three-student paradox, each context follows a local rule that makes sense on its own; we now formalize all such scenarios mathematically.
We model systems where multiple observers examine the same underlying reality through different lenses. Each observer sees a context: a specific subset of measurements. A behavior assigns a probability distribution over joint outcomes to each context.
The frame-independent baseline represents the easy case.
```python
# Frame-independent behaviors can be reconciled
space = Space.create(X=['A','B'], Y=['A','B'])
# Create FI behavior from deterministic assignments
fi_behavior = Behavior.frame_independent(space, [['X'], ['Y'], ['X','Y']])
print(f"FI behavior: K = {fi_behavior.K:.6f}")
```

These behaviors are mixtures of deterministic global assignments, so every context's statistics arise as marginals of a single joint distribution.
We quantify agreement between distributions using the Bhattacharyya coefficient, which measures distributional overlap: $\mathrm{BC}(p, q) = \sum_x \sqrt{p(x)\,q(x)}$.
```python
from contrakit.agreement import BhattacharyyaCoefficient
import numpy as np

bc_measure = BhattacharyyaCoefficient()

# Perfect agreement (identical distributions)
p = np.array([0.6, 0.4])
q = np.array([0.6, 0.4])
bc_perfect = bc_measure(p, q)
print(f"Perfect agreement: BC = {bc_perfect:.3f}")

# Partial agreement
p = np.array([0.9, 0.1])
q = np.array([0.1, 0.9])
bc_opposite = bc_measure(p, q)
print(f"Opposite distributions: BC = {bc_opposite:.3f}")
# Perfect agreement: BC = 1.000
# Opposite distributions: BC = 0.600
```

We use this to assess how well different contexts align with each other.
Contradiction emerges when no unified explanation adequately fits all observational contexts. The core definition captures this adversarial relationship:
```python
# Adversarial agreement: max over FI behaviors, min over contexts
space = Space.create(A=['0','1'], B=['0','1'], C=['0','1'])
contradictory = Behavior.from_contexts(space, {
    ('A','B'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('B','C'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('A','C'): {('0','0'): 1.0, ('1','1'): 0.0}
})
print(f"α* = {contradictory.alpha_star:.4f} (best FI agreement)")
print(f"K = {contradictory.K:.4f} bits")
# α* = 0.7071 (best FI agreement)
# K = 0.5000 bits
```

This calculation admits an equivalent minimax form:
```python
# Minimax form: min over context weights λ, max over FI behaviors Q
witnesses = contradictory.worst_case_weights
print("Adversarial context weights (λ):")
for ctx, weight in sorted(witnesses.items(), key=lambda x: x[1], reverse=True)[:2]:
    print(f"  {ctx}: λ = {weight:.3f}")
# Adversarial context weights (λ):
# ('B', 'C'): λ = 0.500
# ('A', 'B'): λ = 0.250
```

That formulation reveals the adversarial structure.
Fundamental bounds constrain the possible values:
```python
# Bounds verification
max_outcomes = max(len(space.alphabets[name]) for name in space.names)
upper_bound = 0.5 * math.log2(max_outcomes)
print(f"Upper bound: K ≤ {upper_bound:.3f} bits")
print(f"Contradictory: K = {contradictory.K:.4f} (within bounds)")
print(f"FI behavior: α* = {fi_behavior.alpha_star:.6f} (≈ 1.0)")
# Upper bound: K ≤ 0.500 bits
# Contradictory: K = 0.5000 (within bounds)
# FI behavior: α* = 1.000000 (≈ 1.0)
```

Zero contradiction occurs only when behaviors lie in the frame-independent set $\mathrm{FI}$.
Six fundamental properties uniquely determine the contradiction measure:
A0: Label Invariance - Contradiction measures structural incompatibility, not notational artifacts. Relabeling outcomes or contexts preserves the contradiction level—the pattern of disagreement matters, not what you call the labels.
```python
# Label invariance: relabeling doesn't change contradiction
original = Behavior.from_contexts(Space.create(A=['red','blue'], B=['red','blue']), {
    ('A','B'): {('red','blue'): 1.0}  # A and B disagree
})
# Relabel 'red'→'circle', 'blue'→'square'
relabeled = original.permute_outcomes({
    'A': {'red': 'circle', 'blue': 'square'},
    'B': {'red': 'circle', 'blue': 'square'}
})
print(f"Original: K = {original.K:.4f}")
print(f"Relabeled: K = {relabeled.K:.4f} (identical)")
# Original: K = 0.0000
# Relabeled: K = 0.0000 (identical)
```

A1: Reduction - Zero contradiction precisely when behaviors are frame-independent. No contradiction means all perspectives unify into one underlying reality—this gives the natural zero point.
A2: Continuity - Small probability changes yield small contradiction changes. Tiny tweaks to distributions shouldn't cause huge jumps in measured incompatibility.
A3: Free Operations - Monotone under legitimate transformations. Adding noise, averaging perspectives, or combining systems cannot create contradiction where none existed.
A4: Grouping - Depends only on refined statistics. How you group observations doesn't change fundamental incompatibility levels—contradiction sees through statistical aggregations.
A5: Independent Composition - Additive for disjoint systems: $K(P \otimes R) = K(P) + K(R)$.
Free operations include:
- Stochastic post-processing within contexts
- Convex mixtures: $K((1-t)P + tQ) \leq \max(K(P), K(Q))$
- Public lotteries over contexts
- Tensoring with FI ancillas: $K(P \otimes R) \leq K(P)$ for $R \in \mathrm{FI}$
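The first free operation can be spot-checked at the level of the agreement kernel: applying the same stochastic channel to two distributions can only raise their Bhattacharyya overlap (a minimal numpy sketch, independent of contrakit):

```python
import numpy as np

def bc(p, q):
    # Bhattacharyya coefficient: sum_x sqrt(p(x) * q(x))
    return np.sum(np.sqrt(p * q))

p = np.array([0.9, 0.1])
q = np.array([0.1, 0.9])

# Column-stochastic noise channel M: new_dist = M @ old_dist
M = np.array([[0.7, 0.3],
              [0.3, 0.7]])

print(f"before: BC = {bc(p, q):.3f}")          # 0.600
print(f"after:  BC = {bc(M @ p, M @ q):.3f}")  # 0.947 -- overlap only increases
```

Noise washes out distinguishing detail, so it cannot create contradiction where none existed.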
The weakest link principle governs reasonable aggregators. Any unanimity-respecting, monotone aggregator with weakest-link properties equals the minimum: $F(a_1, \dots, a_m) = \min_j a_j$. Among fair combination methods, taking the worst-case opinion is uniquely justified; that principle explains why contradiction focuses on the least agreeable context.
Contradiction manifests as a minimax game. Any contradiction measure satisfying the core axioms admits this representation:
```python
# Minimax representation: contradiction as adversarial game
# Left side: max over FI behaviors, min over contexts
# Right side: min over context weights λ, max over FI behaviors
paradox = Behavior.from_contexts(Space.create(A=['0','1'], B=['0','1'], C=['0','1']), {
    ('A','B'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('B','C'): {('0','1'): 1.0, ('1','0'): 0.0},
    ('A','C'): {('0','0'): 1.0, ('1','1'): 0.0}
})
# The minimax form is computed internally
alpha_minimax = paradox.alpha_star  # This uses the minimax computation
k_bits = paradox.K
print(f"Minimax α* = {alpha_minimax:.4f}")
print(f"Equivalent to K = -log2(α*) = {k_bits:.4f}")
# Minimax α* = 0.7071
# Equivalent to K = -log2(α*) = 0.5000
```

Any measure satisfying the axioms takes the form $K(P) = h(\alpha^\star(P))$ for some strictly decreasing continuous $h$; the additivity axiom then forces $h = -\log_2$.
Under refinement separability, product multiplicativity, DPI, joint concavity, and regularity, the agreement kernel is unique:
```python
# BC is the unique agreement kernel satisfying the axioms
from contrakit.agreement import BhattacharyyaCoefficient
import numpy as np

bc_measure = BhattacharyyaCoefficient()

# Verify a BC property: multiplicativity over product distributions
p1 = np.array([0.6, 0.4])
q1 = np.array([0.7, 0.3])
p2 = np.array([0.5, 0.5])
q2 = np.array([0.4, 0.6])
bc_product = bc_measure(p1, q1) * bc_measure(p2, q2)
# Product distributions live on the 4-outcome product space
joint_bc = bc_measure(np.outer(p1, p2).flatten(), np.outer(q1, q2).flatten())
print(f"BC multiplicativity: {bc_product:.6f} ≈ {joint_bc:.6f}")
# BC multiplicativity: 0.989448 ≈ 0.989448
```

The Bhattacharyya coefficient is uniquely determined by natural mathematical properties; no other agreement measure satisfies these fundamental requirements.
The fundamental formula emerges from the complete axiom set: $K(P) = -\log_2 \alpha^\star(P)$. Contradiction must be logarithmic in agreement levels. That logarithmic form makes contradiction additive—like information—and gives it the correct units (bits).
For independent systems on disjoint observables, agreements multiply: $\alpha^\star(P \otimes R) = \alpha^\star(P)\,\alpha^\star(R)$, hence $K(P \otimes R) = K(P) + K(R)$. Contradictions from separate systems multiply for agreement but add for bits. That additivity explains why contradiction behaves like information across independent components.
Contradiction imposes fundamental limits on information processing. The asymptotic equipartition property includes a contradiction tax: with many contradictory observations, typical pattern counts grow exponentially at rate $H + K(P)$ bits per observation rather than $H$.
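To make the tax concrete, here is a worked arithmetic sketch ($H = 1$ bit and $K = 0.5$ bits are illustrative values, not computed from a specific behavior):

```python
H, K, n = 1.0, 0.5, 100  # entropy rate, contradiction rate, number of observations

log2_typical_classical = n * H        # ~2^100 typical patterns without contradiction
log2_typical_actual = n * (H + K)     # ~2^150 once the contradiction tax is charged
extra_bits = log2_typical_actual - log2_typical_classical

print(f"typical-set growth: 2^{log2_typical_actual:.0f} vs 2^{log2_typical_classical:.0f}")
print(f"contradiction tax: {extra_bits:.0f} extra bits over {n} observations")
```

Half a bit per observation compounds quickly: after 100 observations the typical set is $2^{50}$ times larger than the classical one.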
Compression rates increase accordingly. With known contexts, you achieve rate $H(X|C) + K(P)$: you need $K(P)$ extra bits per symbol beyond the classical conditional entropy.
When contexts are latent, compression costs more: the rate becomes $H(X) + K(P)$. Unknown contexts require the full entropy $H(X)$ on top of the contradiction tax.
Witnesses enable different compression regimes. Including witness information at rate $K(P)$ restores the classical compression rate: witnesses essentially explain the contradiction, allowing compression as if contexts were unified.
Hypothesis testing reveals another limitation. When testing frame-independence versus contradiction, the optimal type-II error exponent is governed by $K(P)$: distinguishing the two requires about $K(P)$ bits of evidence per sample.
Witnessing enables approximation: a witness of rate $r$ buys back error exponent, subject to the tradeoff $E + r \geq K(P)$.
Communication faces similar constraints:
- Common message problem: rate $\geq H(X|C) + K(P)$
- Common representation: $\geq H(X|C) + K(P)$ (known contexts) or $H(X) + K(P)$ (latent contexts)
Channel capacity drops when all receivers must decode identically: different interpretations of received signals reduce communication efficiency by $K(P)$ bits per use.
Rate-distortion with common reconstruction costs an extra $K(P)$ bits on top of the classical rate-distortion function.
You cannot losslessly compress contradictory data to a single representation without paying the contradiction tax.
Contradiction induces a specific geometric structure. The Hellinger metric measures distances between distributions: $d_H(p, q) = \sqrt{1 - \mathrm{BC}(p, q)}$. That metric is subadditive, so distances stay controlled when behaviors are combined and mixed.
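One basic metric property, the triangle inequality, can be checked numerically (a numpy sketch; the $\sqrt{1 - \mathrm{BC}}$ form is the standard Hellinger distance):

```python
import numpy as np

rng = np.random.default_rng(0)

def hellinger(p, q):
    # d_H(p, q) = sqrt(1 - BC(p, q)), with BC the Bhattacharyya coefficient
    return np.sqrt(1.0 - np.sum(np.sqrt(p * q)))

# Three random distributions on 4 outcomes
p, q, r = (rng.dirichlet(np.ones(4)) for _ in range(3))

lhs = hellinger(p, r)
rhs = hellinger(p, q) + hellinger(q, r)
print(f"d(p,r) = {lhs:.3f} <= d(p,q) + d(q,r) = {rhs:.3f}")
```

Because $d_H$ is a genuine metric, disagreement accumulated along a chain of behaviors bounds the disagreement between its endpoints.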
Testing real versus frame-independent data has fundamental limits: the best discrimination performance is bounded by the contradiction $K(P)$.
The witness-error tradeoff quantifies the relationship. For witness rate $r$ and achievable error exponent $E$, Theorem 7.4 gives $E + r \geq K(P)$.
Importance sampling reveals simulation penalties: any frame-independent proposal incurs variance at least $2^{2K(P)} - 1$.
Single-predictor regret bounds show a penalty of at least $2K(P)$ bits per round.
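Plugging the classroom value $K = 0.5$ bits into these bounds gives concrete penalties (straightforward arithmetic from Propositions 7.1-7.3):

```python
K = 0.5  # bits, from the classroom paradox

error_exponent_bound = K           # Proposition 7.1: inf_lambda E_opt(lambda) >= K
variance_bound = 2 ** (2 * K) - 1  # Proposition 7.2: simulation variance >= 2^{2K} - 1
regret_bound = 2 * K               # Proposition 7.3: regret >= 2K bits per round

print(f"testing exponent    >= {error_exponent_bound}")
print(f"simulation variance >= {variance_bound}")   # 2^1 - 1 = 1.0
print(f"predictive regret   >= {regret_bound} bits/round")
```

Even half a bit of contradiction already forces at least unit variance overhead in simulation and a full bit of regret per prediction round.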
The universal adversarial structure unifies these results: the same optimal adversarial weights $\lambda^\star$ drive the testing, simulation, and prediction penalties.
The Hellinger sphere structure connects contradiction to geometry: level sets of constant $K$ form spheres, in the Hellinger metric, around the frame-independent set.
The total variation gap shows separation: behaviors with $K > 0$ keep a strictly positive total-variation distance from the FI set.
Smoothing bounds enable interpolation: to reduce contradiction to a small target level, the mixing weight placed on a frame-independent behavior must be correspondingly close to one.
High stability means reducing significant contradiction requires nearly complete FI mixture.
Convex programs enable computation: $\alpha^\star(P) = \max_{Q \in \mathrm{FI}} \min_{c} \mathrm{BC}(P_c, Q_c)$, a concave maximization over the frame-independent polytope. That optimization finds the best unified explanation minimizing worst-case disagreement. Modern solvers handle this efficiently.
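To illustrate what the program finds, here is a hand-constructed candidate for the classroom behavior above: mixing two global assignments equalizes the per-context agreements at $\sqrt{1/2} \approx 0.7071$, matching the reported $\alpha^\star$ (a numpy achievability sketch, not a solver; the convex program itself is what certifies optimality):

```python
import numpy as np

# Deterministic contexts of the classroom paradox, as (context, target outcome)
targets = {
    ('Alice', 'Bob'):   ('far', 'near'),
    ('Bob', 'Carol'):   ('far', 'near'),
    ('Alice', 'Carol'): ('near', 'near'),
}

# Candidate FI behavior: equal mixture of two global seat assignments
assignments = [{'Alice': 'far', 'Bob': 'near', 'Carol': 'near'},
               {'Alice': 'near', 'Bob': 'far', 'Carol': 'near'}]
weights = [0.5, 0.5]

agreements = []
for (x, y), outcome in targets.items():
    # Probability the candidate's marginal assigns to this context's target
    prob = sum(w for w, g in zip(weights, assignments) if (g[x], g[y]) == outcome)
    # For a point-mass context, BC(p_c, q_c) = sqrt(q_c(target))
    agreements.append(np.sqrt(prob))

alpha = min(agreements)
print(f"alpha = {alpha:.4f}, K = {-np.log2(alpha):.4f}")  # alpha ≈ 0.7071, K ≈ 0.5
```

Each context receives probability exactly 1/2 under this mixture, so no context can be singled out as the weakest link, which is the equalizer behavior the optimum exhibits.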
The minimax formulation is equivalent: minimize over context weights $\lambda$, then maximize over $Q \in \mathrm{FI}$. Sion's minimax theorem guarantees this equals the original max-min.
Statistical estimation uses plug-in methods with bootstrap confidence intervals. Regularized estimation adds pseudocounts for small datasets.
Small pseudocounts like 0.01 prevent zero probabilities and ensure statistical consistency.
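A minimal sketch of the regularized plug-in step (the counts are hypothetical; the 0.01 pseudocount is the value mentioned above):

```python
import numpy as np

# Hypothetical observed counts for one context's outcomes
counts = np.array([7, 3, 0, 0])  # note the empty cells
pseudo = 0.01                    # small pseudocount

p_hat = (counts + pseudo) / (counts.sum() + pseudo * len(counts))
print(p_hat)  # no zero probabilities; Bhattacharyya terms stay well-defined
```

Without the pseudocount, an unobserved outcome would make $\sqrt{p(x) q(x)}$ vanish identically and bias the estimated agreement downward.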
The uniform law provides a corresponding lower bound.
Minimax duality reveals the tension structure, with optimal adversarial weights $\lambda^\star$ concentrating on the most conflicting contexts.
Odd cycles create minimal contradiction: pairwise anti-correlations around a three-cycle imply $K = \tfrac{1}{2}\log_2\tfrac{3}{2} \approx 0.29$ bits.
The classroom seating paradox demonstrates this minimal contradiction. Three students with mutually incompatible seating constraints, modeled as perfect pairwise anti-correlations, yield $\alpha^\star = \sqrt{2/3} \approx 0.816$ and $K \approx 0.29$ bits. That system represents the simplest contradictory device—no matter how you assign seats globally, you pay about 0.29 bits per observation in overhead.
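The 0.29-bit figure can be reproduced without the library by exhibiting the symmetric frame-independent candidate, uniform over the six non-constant global assignments (a numpy achievability sketch; optimality of this candidate is a separate symmetry argument):

```python
import numpy as np

# Each context is perfectly anti-correlated: uniform on {(0,1), (1,0)}
p_ctx = np.array([0.0, 0.5, 0.5, 0.0])  # outcomes ordered (0,0), (0,1), (1,0), (1,1)

# Candidate FI behavior: uniform over the 6 non-constant assignments of (A, B, C).
# By symmetry its marginal on every pair is (1/6, 2/6, 2/6, 1/6).
q_ctx = np.array([1/6, 2/6, 2/6, 1/6])

alpha = np.sum(np.sqrt(p_ctx * q_ctx))  # same value for all three contexts
K = -np.log2(alpha)
print(f"alpha = {alpha:.4f}, K = {K:.4f} bits")  # K ≈ 0.2925 = (1/2)·log2(3/2)
```

Note the contrast with the deterministic version of the paradox earlier, which costs the maximal 0.5 bits; softening the constraints to anti-correlations lowers the toll to about 0.29 bits.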
Gaussian measurements extend this to continuous variables. For two Gaussian contexts with equal variance $\sigma^2$ and mean separation $\Delta$, the Bhattacharyya coefficient is $e^{-\Delta^2/8\sigma^2}$, so contradiction grows smoothly with the separation between perspectives.
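The equal-variance Gaussian overlap $e^{-\Delta^2/8\sigma^2}$ can be checked against a direct numerical integral (a numpy sketch; unit variance and $\Delta = 2$ are illustrative choices):

```python
import numpy as np

delta = 2.0  # mean separation between the two Gaussian contexts
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]

def gauss(mu):
    # Unit-variance Gaussian density centered at mu
    return np.exp(-(x - mu) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

# BC(p, q) = integral of sqrt(p(x) q(x)) dx, here via a Riemann sum
bc_numeric = np.sum(np.sqrt(gauss(0.0) * gauss(delta))) * dx
bc_closed = np.exp(-delta ** 2 / 8.0)

print(f"numeric BC = {bc_numeric:.6f}, closed form = {bc_closed:.6f}")
# both ≈ 0.606531; the corresponding contradiction is -log2(BC) ≈ 0.7213 bits
```

The quadratic exponent means nearby perspectives are almost free to reconcile, while well-separated ones pay rapidly growing contradiction costs.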
The product law governs composition: $K(P \otimes R) = K(P) + K(R)$. Hellinger geometry connects everything: contradiction is, in essence, a distance from the frame-independent set. Information bounds quantify the operational gaps, and smoothing properties show the interpolation behavior: stability is high, and reducing significant contradiction to near-zero requires a nearly complete $\mathrm{FI}$ mixture.
