Prior σ scale: multiplier on (hi−lo)/2 for each free parameter
Max iterations
Tolerance on ‖Δμ‖₂ (cp)
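One iteration: an E-step costing O(n · D²) over n positions and D = 26 parameters, plus an M-step Cholesky factorisation of the 26×26 posterior precision. Convergence typically takes 20–50 iterations.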
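The cost figures follow from the standard Pólya-Gamma variational updates for the logistic model given under Model. below. The sketch assumes positions are indexed i = 1, …, n with scaled features x_i′ as defined there; the symbols κ_i and ω̄_i are our notation, not the page's.

```latex
% E-step: one Pólya-Gamma expectation per position, O(D^2) each, O(n D^2) in total
c_i^2 = x_i'^{\top}\left(\Sigma + \mu\mu^{\top}\right)x_i', \qquad
\bar\omega_i = \mathbb{E}[\omega_i] = \frac{\tanh(c_i/2)}{2c_i}

% M-step: D x D posterior precision (forming it is O(n D^2)), then one Cholesky, O(D^3)
\Sigma^{-1} = \Sigma_0^{-1} + \sum_{i=1}^{n} \bar\omega_i\, x_i' x_i'^{\top}, \qquad
\mu = \Sigma\left(\Sigma_0^{-1}\mu_0 + \sum_{i=1}^{n} \kappa_i\, x_i'\right), \qquad
\kappa_i = y_i - \tfrac{1}{2}
```

With D = 26 the factorisation itself is tiny; for large n, the O(n · D²) passes over the positions dominate each iteration.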
Convergence monitor: current iteration out of the maximum, ‖Δμ‖₂ (cp) against the tolerance, elapsed time, and log-likelihood per position.
Posterior summary
Posterior credible intervals (each parameter normalised to its own [lo, hi])
Prior ±2σ₀
Posterior ±2σ
Posterior mean
Default value
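One way to produce the rows of this chart, assuming "normalised" means the linear map (v − lo)/(hi − lo) and that the posterior ±2σ band uses the posterior standard deviation √Σ_jj. The functions `to_unit` and `normalised_intervals` and their field names are hypothetical.

```python
import math


def to_unit(v, lo, hi):
    """Map a value in cp onto [0, 1] relative to its parameter's [lo, hi]."""
    return (v - lo) / (hi - lo)


def normalised_intervals(mu, Sigma, mu0, sigma0, lo, hi, k=2.0):
    """Per-parameter chart rows: prior and posterior mean +/- k sd, on [0, 1]."""
    rows = []
    for j in range(len(mu)):
        sd = math.sqrt(Sigma[j][j])  # posterior sd from the diagonal of Sigma
        rows.append({
            "prior": (to_unit(mu0[j] - k * sigma0[j], lo[j], hi[j]),
                      to_unit(mu0[j] + k * sigma0[j], lo[j], hi[j])),
            "posterior": (to_unit(mu[j] - k * sd, lo[j], hi[j]),
                          to_unit(mu[j] + k * sd, lo[j], hi[j])),
            "mean": to_unit(mu[j], lo[j], hi[j]),
            "default": to_unit(mu0[j], lo[j], hi[j]),  # mu0 = current defaults
        })
    return rows
```

Values outside [0, 1] just mean an interval extends past the parameter's [lo, hi]; whether the page clips them is not stated.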
Posterior correlations (top pairs by |ρ| from the full Σ)
Correlations arise from data geometry, not the prior (which is diagonal).
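Strong positive ρ: the two values tend to rise or fall together; the data constrain their difference better than their common level. Strong negative ρ: the two values trade off against each other, offering competing explanations for the same positions.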
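A sketch of the ranking, assuming ρ is the usual correlation Σ_ab / √(Σ_aa Σ_bb) read off the full posterior covariance. `top_correlations` and the parameter names used with it are hypothetical.

```python
import math


def top_correlations(Sigma, names, k=5):
    """Return the k parameter pairs with the largest |rho| from a full covariance."""
    d = len(Sigma)
    pairs = []
    for a in range(d):
        for b in range(a + 1, d):  # each unordered pair once
            rho = Sigma[a][b] / math.sqrt(Sigma[a][a] * Sigma[b][b])
            pairs.append((names[a], names[b], rho))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:k]
```

The sign reading above follows from the variances of sums and differences. With Σ_AA = Σ_BB = 4 and Σ_AB = −3 (ρ = −0.75), Var(w_A + w_B) = 4 + 4 − 6 = 2 while Var(w_A − w_B) = 4 + 4 + 6 = 14, so the data pin down the sum far better than either value on its own.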
Code output: drop-in replacement for defaultWeights()
Model.
$p(y_i = 1 \mid w) = \sigma(x_i'^{\top} w)$, where $x_i' = \mathrm{posFeatures}(\mathrm{board}_i)/K$ and $y_i \in \{0, \tfrac{1}{2}, 1\}$.
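The "log-likelihood per position" reported while the fit runs is presumably the average Bernoulli log-likelihood under this model. A sketch, assuming it is evaluated at a single weight vector (for example the current posterior mean) and that draws enter as fractional targets, so each position contributes σ(z)^{y_i}(1 − σ(z))^{1 − y_i} with z = x_i′ᵀw. `loglik_per_position` is a hypothetical name.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def loglik_per_position(X, y, w):
    """Mean of y*log s(z) + (1-y)*log(1-s(z)), z = x.w; y may be 0, 0.5 or 1."""
    total = 0.0
    for x, yi in zip(X, y):
        z = sum(xj * wj for xj, wj in zip(x, w))
        # log(1 - sigmoid(z)) == log(sigmoid(-z))
        total += yi * log_sigmoid(z) + (1 - yi) * log_sigmoid(-z)
    return total / len(X)
```

At w = 0 every position scores log ½ ≈ −0.693, a useful baseline: a fit that does not beat it has learned nothing from the data.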
Prior: $w \sim \mathcal{N}(\mu_0, \Sigma_0)$ with $\Sigma_0$ diagonal, $\mu_0$ = current defaults, and $\sigma_{0,j} = \text{scale} \times (\text{hi}_j - \text{lo}_j)/2$.
Frozen parameters use $\sigma_{0,j} = 1$ cp.
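A sketch of the prior construction, taking the scale, per-parameter [lo, hi] bounds in cp and a set of frozen indices as inputs. `build_prior` and its argument names are hypothetical.

```python
def build_prior(defaults, lo, hi, frozen, scale=1.0):
    """Diagonal Gaussian prior: mean = current defaults, one sd per parameter.

    defaults, lo, hi: per-parameter values in cp; frozen: indices held fixed.
    """
    mu0 = list(defaults)
    sigma0 = [
        1.0 if j in frozen else scale * (hi[j] - lo[j]) / 2  # frozen: 1 cp
        for j in range(len(defaults))
    ]
    return mu0, sigma0
```

Giving frozen parameters a 1 cp prior, rather than removing them, presumably lets them share the same closed-form updates while staying near their defaults.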
Inference: Pólya-Gamma-augmented mean-field variational Bayes (MFVB), in which every coordinate update is exact and closed-form.
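A minimal pure-Python sketch of the Pólya-Gamma mean-field loop, assuming the model and prior above with κ_i = y_i − ½, a stop rule on ‖Δμ‖₂ against a tolerance, and a hand-rolled Cholesky in place of whatever linear-algebra routine the page actually uses. The names `mfvb_logistic`, `pg_mean`, `cholesky` and `chol_solve` are hypothetical, and the code favours clarity over speed.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def pg_mean(c):
    """E[omega] for omega ~ PG(1, c); tends to 1/4 as c -> 0."""
    return 0.25 if c < 1e-8 else math.tanh(c / 2) / (2 * c)


def mfvb_logistic(X, y, mu0, sigma0, max_iter=50, tol=1e-6):
    """Mean-field VB for p(y=1|w) = sigmoid(x.w) with targets y in [0, 1].

    X: scaled feature vectors x_i'; mu0, sigma0: diagonal Gaussian prior.
    Returns (posterior mean, posterior covariance, iterations used).
    """
    n, d = len(X), len(mu0)
    prior_prec = [1.0 / s ** 2 for s in sigma0]
    kappa = [yi - 0.5 for yi in y]
    # Sigma0^{-1} mu0 + X^T kappa does not change between iterations.
    rhs = [prior_prec[j] * mu0[j] + sum(X[i][j] * kappa[i] for i in range(n))
           for j in range(d)]
    mu = list(mu0)  # start from the prior (an assumption)
    Sigma = [[sigma0[j] ** 2 if j == k else 0.0 for k in range(d)] for j in range(d)]
    it = 0
    for it in range(1, max_iter + 1):
        # E-step: c_i^2 = x_i^T (Sigma + mu mu^T) x_i, O(D^2) per position.
        omega = []
        for x in X:
            m = sum(xj * mj for xj, mj in zip(x, mu))
            quad = sum(x[j] * Sigma[j][k] * x[k] for j in range(d) for k in range(d))
            omega.append(pg_mean(math.sqrt(max(quad + m * m, 0.0))))
        # M-step: precision Sigma0^{-1} + X^T diag(omega) X, then one Cholesky.
        P = [[sum(w * x[j] * x[k] for w, x in zip(omega, X))
              + (prior_prec[j] if j == k else 0.0) for k in range(d)] for j in range(d)]
        L = cholesky(P)
        new_mu = chol_solve(L, rhs)
        # Sigma = P^{-1}, solved one unit column at a time (P is symmetric).
        cols = [chol_solve(L, [1.0 if r == c else 0.0 for r in range(d)]) for c in range(d)]
        Sigma = [[cols[k][j] for k in range(d)] for j in range(d)]
        # Stop once the posterior mean moves less than tol in L2 norm.
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_mu, mu)))
        mu = new_mu
        if step < tol:
            break
    return mu, Sigma, it
```

Starting from the prior (μ = μ₀, Σ = Σ₀) is an assumption here; the page does not say how it initialises.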
Advantages.
Returns a posterior distribution over the weights (a Gaussian approximation), not just a point estimate.
Every parameter gets a 95% credible interval, and the full 26×26 posterior covariance is available for downstream uncertainty quantification.
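Updates all parameters jointly; the cost per iteration grows polynomially with D (O(n · D²) plus one D×D Cholesky), so there is none of the exponential blow-up a grid search suffers as parameters are added.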