01 · Ising Bayesian · Camilo Rozo

// Overview

About this project

This project combines three things rarely seen together in a junior ML portfolio: real statistical physics, applied Bayesian statistics and end-to-end scientific Python code. The 2D Ising model is ideal: it has a nontrivial phase transition and, at the same time, an exact analytical solution against which any numerical code can be validated.

A Numba-accelerated Metropolis simulator generates the M(T) curve of the system, and a probabilistic model in PyMC with physically motivated priors infers the critical parameters from that noisy curve. The comparison against Onsager is the ultimate test: if inference recovers `Tc ≈ 2.2692` and `β = 1/8` with reasonable uncertainty, the full pipeline is sound.

Everything runs in a single command: python main.py executes simulation, inference, and generates all final figures.

Core formulas

System energy:

E = −J ∑⟨i,j⟩ sᵢ · sⱼ

Metropolis acceptance rule:

P(flip) = min(1, e^−ΔE/T)

Power law near Tc:

M(T) ≈ (1 − T/Tc)^β

Bayes theorem applied:

P(Tc, β | data) ∝ P(data | Tc, β) · P(Tc, β)

Onsager's exact solution:

Tc = 2 / ln(1 + √2) ≈ 2.2692

β = 1/8 = 0.125

// Components

How it's built

Six pieces connected end-to-end. Each lives in its own module with a dedicated CLI.

1 · Metropolis simulator

Goal: sample equilibrium spin configurations at a given temperature T.

How: each step picks a random site, proposes flipping its spin (s → −s) and accepts with probability min(1, exp(−ΔE/T)). A sweep = N² proposed flips. The hot kernel is JIT-compiled via @numba.njit (~50× faster than plain NumPy, with pure-Python fallback).

2 · Temperature sweep

Input: 30 temperatures in T ∈ [1.5, 3.5], covering both ordered and disordered regimes.

Process: at each T, 2000 thermalization sweeps (discarded) + 3000 measurement sweeps (averaged). Output: a CSV T, M_mean, M_std — the "observed dataset" fed to the Bayesian model.

3 · Probabilistic model (PyMC)

Priors: Tc ~ Normal(2.3, 0.3), β ~ Normal(0.12, 0.05), σ ~ HalfNormal(0.1). Deliberately wide — the data should dominate.

Likelihood: M_obs ~ Normal((1 − T/Tc)^β, σ) when T < Tc; exactly 0 above. The piecewise form captures the broken-symmetry ordered phase.

4 · NUTS sampling

Config: 4 chains × (1000 tune + 2000 draws) = 12 000 posterior samples, target_accept=0.9.

Output: the full joint posterior P(Tc, β, σ | data) — distributions, not point estimates. Convergence checks via ArviZ: R̂ < 1.01, ESS > 400 across all parameters.

5 · Validation against Onsager

Overlay the exact values (Tc = 2/ln(1+√2) ≈ 2.2692, β = 1/8) on the posteriors. Success criterion: the 95% HDI of each posterior brackets its exact value. If not → the prior is too tight, the simulator has a bug, or the sampler hasn't converged.

6 · Final figures

Three PNGs auto-saved to results/ by src/plots.py:

magnetization.png — M(T) + Onsager's Tc
posteriors.png — marginal posteriors with exact values
trace.png — per-chain trace plots

// Skills demonstrated

What skills it certifies

Each skill appears in a concrete piece of code, not as a generic claim.

Scientific Python

Vectorized Metropolis simulator organized in clean modules, with argparse CLI per script.

Numba acceleration

@njit(cache=True) kernel with transparent fallback if Numba isn't installed.

Monte Carlo

Metropolis-Hastings with periodic boundaries and separated thermalization + measurement.

Bayesian inference

PyMC model with Normal/HalfNormal priors and Gaussian likelihood over the power law.

MCMC (NUTS)

Sampling of 4 chains × 2000 draws with R̂, ESS, and trace plot diagnostics via ArviZ.

Technical visualization

M(T) curve with errors, posteriors with exact values overlaid, convergence trace plots.

Scientific rigor

Quantitative validation against Onsager (1944): Tc = 2.2692 exact, β = 1/8 exact.

Reproducibility

python main.py runs the full pipeline end-to-end without manual intervention.

Python 3.11 NumPy Numba PyMC 5 ArviZ Matplotlib SciPy NetCDF

// Structure

Project organization

01-ising-bayesian/
├── README.md
├── requirements.txt
├── main.py                 # end-to-end pipeline
├── src/
│   ├── __init__.py
│   ├── metropolis.py       # Numba-accelerated Metropolis simulator
│   ├── bayesian.py         # NumPy MH + PyMC NUTS backends
│   └── plots.py            # publication-style static figures
├── scripts/
│   ├── summarize.py        # posterior summary → JSON
│   └── export_json.py      # samples + fit → JSON for portfolio charts
├── notebooks/
│   └── explore.ipynb       # interactive walk-through of the pipeline
├── figures/              # tracked PNGs embedded in the README
├── data/                 # (gitignored) generated CSV + NetCDF
└── results/              # (gitignored) PNG staging

// Roadmap

Project status

Probabilistic model design with physically motivated priors
Metropolis simulator with Numba acceleration and periodic boundaries
Argparse CLI per module and end-to-end pipeline in main.py
Dual sampler: pure-NumPy MH (default) + PyMC NUTS backend
Visualization module with all three publication-style figures
Full simulation (32×32 lattice · 25 temperatures · 5000 sweeps per T)
MCMC run — diagnostics verified (R̂ < 1.02, ESS > 200)
Polished static figures embedded in the README (GitHub-friendly)
Companion notebook notebooks/explore.ipynb — interactive walk-through
Optional extension available for 64×64 lattice (see README §8)

// Results

Outputs and final metrics

Interactive figures rendered from the actual MCMC posterior samples. Hover for values, zoom with the toolbar, toggle traces in the legend.

Tc = 2.396 ± 0.041

Onsager exact: 2.2692 · shift is a finite-size effect on 32×32

β = 0.100 ± 0.025

exact 1/8 = 0.125 · 95% HDI brackets the exact value

R̂ ≤ 1.02

Gelman-Rubin across all parameters — converged

ESS_bulk ≥ 221

effective independent samples — healthy

Loading magnetization curve…

Fig 1 · Simulated ⟨|M|⟩ vs. temperature on a 32×32 lattice, with 1σ error bars per temperature. Amber dashed line: Onsager's exact Tc. The curve shows the characteristic second-order phase transition — sharp drop around T ≈ 2.3 with largest fluctuations at the transition.

Loading posteriors…

Fig 2 · Marginal posteriors of Tc (left) and β (right) built from 20 000 MCMC samples (4 chains × 5000 draws). Shaded band: 95% HDI. Pink solid line: posterior mean. Amber dashed line: Onsager's exact value. β's HDI brackets its exact value; Tc's does not — the gap is the finite-size effect.

Loading trace plots…

Fig 3 · Per-chain traces across 5000 post-burn iterations (4 chains overlaid). Chains exploring the same band = good mixing. Use the legend to isolate a chain. No visible drift or stuck regions — sampler is healthy.

Fig 1 · static fallback (enable JavaScript for interactive charts)

// Analysis

Results analysis

Why there's no accuracy or confusion matrix here, which metrics do apply, and what the numbers actually mean.

What type of problem is this?

The natural reflex is to ask for accuracy, precision/recall, or a confusion matrix. None of those apply here. This project isn't a classifier — it's a parameter inference problem: given a noisy M(T) curve, what is the joint probability distribution of the physical parameters (Tc, β, σ)?

Problem type	Output	Typical metrics
Classification	discrete class label	accuracy, precision, recall, F1, ROC-AUC, confusion matrix
Regression	continuous number	MSE, MAE, R², RMSE
Parameter inference (this project)	posterior distribution over parameters	HDI, R̂, ESS, posterior predictive checks, ground-truth validation

Evaluation methodology

Four layers stacked together — each answers a different question.

① Point estimate

Posterior mean as the "most likely" value. For Tc the mean is 2.396, for β it's 0.100. Never reported alone — always with its uncertainty.

② Uncertainty quantification

Posterior standard deviation and 95% HDI (Highest Density Interval) — the densest region containing 95% of the posterior mass. This is the Bayesian analogue of the confidence interval.

③ MCMC convergence

R̂ (Gelman-Rubin) compares between-chain vs. within-chain variance (target: ≤ 1.01). ESS is the effective number of independent samples after accounting for autocorrelation. Acceptance rate of MH proposals (target: 20–40% for ℝ³).

④ Ground-truth validation

The Ising 2D model has an exact analytical solution (Onsager 1944). The 95% HDI of the posterior either brackets the exact value or it doesn't — a clean pass/fail check. Stronger than any accuracy metric.

Metrics achieved vs. thresholds

Metric	What it measures	Threshold	Result
R̂ max	chain convergence (Gelman-Rubin)	≤ 1.01 ideal · ≤ 1.05 tolerable	1.02 ✓
ESS_bulk min	effective independent samples	≥ 200	221 ✓
ESS_tail min	sample size in distribution tails	≥ 200	462 ✓
Acceptance rate	MH proposals accepted	20–40% optimal (for ℝ³)	31.7 / 33.3 / 34.7 / 35.1% ✓
β · 95% HDI contains exact?	ground-truth validation	must contain 0.125	[0.052, 0.148] ✓
Tc · 95% HDI contains exact?	ground-truth validation	must contain 2.2692	[2.333, 2.465] — shifted +5% (finite-size)

Interpretation

Three observations, from the most reassuring to the most nuanced.

1 · β recovered correctly

The posterior of β is centered at 0.100 and its 95% HDI [0.052, 0.148] comfortably brackets the exact value 1/8 = 0.125. This confirms that the piecewise power-law likelihood captures the critical behavior and that the sampler explores the right part of parameter space.

2 · Tc shifted upward by a finite-size effect

The posterior of Tc centers at 2.396 — about +0.13 above Onsager's exact 2.2692. This is not a bug. Finite-size scaling theory predicts that on an N×N lattice the apparent critical temperature satisfies Tc(N) − Tc(∞) ∝ 1/N. At N = 28, a shift of +0.1 to +0.2 is consistent with the literature (Ferrenberg & Landau, 1991).

To recover Onsager's value within the HDI, running on a 64×64 or 128×128 lattice would close the gap — at a proportionally higher compute cost.

3 · The sampler is healthy

All four chains reach R̂ ≤ 1.02 with ESS above 200. The trace plots (Fig 3) show no drift, no stuck regions, and good mixing across chains. Acceptance rates sit in the 31–35% band — right in the optimal range for random-walk Metropolis in 3 dimensions. If any of this failed, none of the posterior numbers above would be trustworthy.