What is the Metropolis-Hastings algorithm?

Metropolis-Hastings is an MCMC algorithm for drawing samples from a target density p(x) that is hard to sample directly. At each step it proposes a candidate x' from a proposal distribution q(x'|x) and accepts it with probability alpha = min(1, p(x') q(x|x') / (p(x) q(x'|x))), which reduces to alpha = min(1, p(x')/p(x)) when the proposal is symmetric. Because only a ratio of densities is required, the normalizing constant cancels, which makes the method well suited to Bayesian posteriors and statistical-mechanics distributions.
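The symmetric random-walk case can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the tool's own code: the function name `metropolis_hastings`, the seed, and the comparison in log space are assumptions made here (the tool itself uses an LCG whose constants are not given).

```python
import math
import random


def metropolis_hastings(log_p_tilde, x0, sigma_prop, n, seed=42):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_p_tilde: log of an unnormalized target density p~(x).
    Returns the list of samples and the acceptance rate.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_p_tilde(x0)
    samples, accepted = [], 0
    for _ in range(n):
        x_new = x + rng.gauss(0.0, sigma_prop)          # x' ~ N(x, sigma_prop^2)
        log_p_new = log_p_tilde(x_new)
        # accept if u < p~(x')/p~(x); compared in log space, 1 - u lies in (0, 1] so log is safe
        if math.log(1.0 - rng.random()) < log_p_new - log_px:
            x, log_px = x_new, log_p_new
            accepted += 1
        samples.append(x)                                # a rejection repeats the current x
    return samples, accepted / n


# Unnormalized standard normal: the 1/sqrt(2*pi) factor is never needed
samples, acc_rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 2.4, 20000)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.3f}  std={std:.3f}  acceptance={acc_rate:.2f}")
```

Working with log densities avoids overflow and underflow when p~(x) is very small or very large; the accept/reject decision is unchanged because log is monotonic.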

How should I choose the proposal standard deviation sigma_prop?

If sigma_prop is too small, the chain accepts almost every move but takes tiny steps, so autocorrelation is high and the effective sample size is low. If it is too large, most proposals are rejected and the chain stalls. The classical rule of thumb (Roberts, Gelman, Gilks 1997) is to tune sigma_prop so that the acceptance rate is about 44% in one dimension and about 23.4% in high dimensions.
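The trade-off can be checked numerically with a sweep over sigma_prop on a standard normal target. For that target and a Gaussian random-walk proposal with standard deviation s, the acceptance rate has the closed form (2/π)·arctan(2/s), which is about 0.44 at s ≈ 2.4; the sketch prints it next to the empirical rate and the lag-1 autocorrelation. The helper name `rwm_chain`, the seed, and the grid of sigma values are illustrative assumptions.

```python
import math
import random


def rwm_chain(sigma_prop, n=20000, seed=1):
    """Random-walk Metropolis on an unnormalized N(0, 1); returns (acceptance, lag-1 autocorrelation)."""
    rng = random.Random(seed)
    x, xs, acc = 0.0, [], 0
    for _ in range(n):
        x_new = x + rng.gauss(0.0, sigma_prop)
        # log p~(x') - log p~(x) = 0.5 * (x^2 - x'^2) for p~(x) = exp(-x^2 / 2)
        if math.log(1.0 - rng.random()) < 0.5 * (x * x - x_new * x_new):
            x, acc = x_new, acc + 1
        xs.append(x)
    m = sum(xs) / n
    var = sum((v - m) ** 2 for v in xs) / n
    rho1 = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1)) / (n * var)
    return acc / n, rho1


results = {}
for s in (0.1, 0.5, 1.0, 2.4, 5.0, 10.0):
    acc, rho1 = rwm_chain(s)
    exact = 2.0 / math.pi * math.atan(2.0 / s)   # closed-form acceptance for a N(0, 1) target
    results[s] = (acc, rho1)
    print(f"sigma_prop={s:5.1f}  acceptance={acc:.2f} (exact {exact:.2f})  rho_1={rho1:.3f}")
```

Both extremes give a lag-1 autocorrelation close to 1: tiny steps barely move the chain, while huge steps are almost always rejected, so the chain repeats the same value.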

What is burn-in and why discard it?

Burn-in is the initial segment of the chain that is discarded to give the Markov chain time to forget its starting point and approach its stationary distribution. Estimates computed from the post-burn-in samples are then less biased by the initial value. Burn-in should be long enough that the trace plot looks stationary; common choices are 10–50% of the total chain length.
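To see why the early samples bias estimates, the sketch below starts a chain deliberately far from the mode (x0 = 30 for a N(0, 1) target) and compares the mean with and without burn-in. The starting point, sigma_prop = 1.0, and the burn-in of 1000 (10% of the chain) are illustrative assumptions.

```python
import math
import random


def rwm(log_p_tilde, x0, sigma_prop, n, seed=7):
    """Plain random-walk Metropolis; returns every state, including the transient."""
    rng = random.Random(seed)
    x, lp, xs = x0, log_p_tilde(x0), []
    for _ in range(n):
        x_new = x + rng.gauss(0.0, sigma_prop)
        lp_new = log_p_tilde(x_new)
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        xs.append(x)
    return xs


# Deliberately poor start: x0 = 30 for a standard normal target
chain = rwm(lambda x: -0.5 * x * x, x0=30.0, sigma_prop=1.0, n=10000)
burn_in = 1000                                    # 10% of the chain
kept = chain[burn_in:]
mean_all = sum(chain) / len(chain)
mean_kept = sum(kept) / len(kept)
print(f"first 5 samples: {[round(v, 1) for v in chain[:5]]}")
print(f"mean with burn-in kept: {mean_all:.3f}   mean after discarding: {mean_kept:.3f}")
```

The chain needs roughly a hundred steps to walk down from x = 30, and those large values pull the full-chain mean upward; discarding them removes the bias.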

What is effective sample size (ESS)?

Because successive MCMC samples are correlated, N consecutive samples contain less information than N independent draws. The effective sample size ESS = N / (1 + 2 Σ_{k≥1} rho_k), where rho_k is the lag-k autocorrelation, measures the equivalent number of independent samples. For quick diagnostics this tool uses the simplified estimate ESS ≈ N (1 − 2 rho_1 / (1 + rho_1)) = N (1 − rho_1) / (1 + rho_1), which is exact when the autocorrelation decays geometrically (rho_k = rho_1^k, as in an AR(1) process) and is optimistic when it decays more slowly.
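Both ESS formulas can be compared on a series whose answer is known. The sketch uses an AR(1) process with coefficient rho = 0.9 as a stand-in for an MCMC trace; for it, 1 + 2 Σ rho^k = (1 + rho)/(1 − rho), so the exact ESS is N(1 − rho)/(1 + rho). The function names, the cap of 100 lags, and the rule of stopping the sum at the first negative rho_k are assumptions chosen for this illustration.

```python
import math
import random


def acf(xs, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag (mean-subtracted, divided by the lag-0 variance)."""
    n = len(xs)
    m = sum(xs) / n
    d = [v - m for v in xs]
    c0 = sum(v * v for v in d) / n
    return [sum(d[i] * d[i + k] for i in range(n - k)) / (n * c0) for k in range(1, max_lag + 1)]


def ess_full(xs, max_lag=100):
    """ESS = N / (1 + 2 * sum rho_k), stopping the sum at the first negative rho_k."""
    total = 0.0
    for r in acf(xs, max_lag):
        if r < 0:
            break
        total += r
    return len(xs) / (1.0 + 2.0 * total)


def ess_lag1(xs):
    """Quick estimate ESS ~ N * (1 - 2*rho_1 / (1 + rho_1))."""
    r1 = acf(xs, 1)[0]
    return len(xs) * (1.0 - 2.0 * r1 / (1.0 + r1))


# AR(1) series with known coefficient, standing in for an MCMC trace
rho, n = 0.9, 20000
rng = random.Random(3)
x, chain = 0.0, []
for _ in range(n):
    x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    chain.append(x)

exact = n * (1.0 - rho) / (1.0 + rho)
e_full, e_lag1 = ess_full(chain), ess_lag1(chain)
print(f"exact {exact:.0f}   full-sum {e_full:.0f}   lag-1 {e_lag1:.0f}")
```

Here 20000 correlated samples are worth only about 1050 independent ones. On a real MCMC trace whose autocorrelation decays more slowly than geometrically, the lag-1 shortcut would overestimate ESS relative to the full sum.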

MCMC Metropolis-Hastings Sampler — Free Online Tool

Parameters

Sample count N

Proposal standard deviation sigma_prop

Target distribution type: 0 = unimodal Gaussian N(0, 1); 1 = bimodal mixture 0.4·N(−2, 0.5²) + 0.6·N(2, 0.7²)

Burn-in length

Random numbers come from a fixed-seed LCG (seed=42), so the same settings always reproduce the same chain.

Results

Estimated mean μ̂

Estimated std σ̂

Acceptance rate

Effective sample size

Sample Chain, Histogram, and Autocorrelation

Top = trace x_t (blue) / Middle = histogram (green) and true density (yellow line) / Bottom = autocorrelation rho_k (lags 1..50)

Theory & Key Formulas

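Metropolis-Hastings proposes a candidate $x'$ from a proposal distribution $q$ (here a Gaussian centered on the current state) and accepts it with probability $\alpha$; otherwise the chain stays at $x_t$. In general $\alpha$ contains the proposal ratio $q(x_t|x')/q(x'|x_t)$, but because this Gaussian proposal is symmetric the ratio equals 1 and only the target density ratio remains.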

Proposal:

$$x' \sim \mathcal{N}(x_t, \sigma_\text{prop}^2)$$

Acceptance probability:

$$\alpha = \min\!\left(1, \frac{\tilde p(x')}{\tilde p(x_t)}\right)$$

Update rule (with $u \sim U(0,1)$):

$$x_{t+1} = \begin{cases} x' & (u < \alpha) \\ x_t & (\text{otherwise}) \end{cases}$$

$\tilde p(x)$ does not need to be normalized — only the ratio is used. For a 1D Gaussian target, $\sigma_\text{prop}$ should be tuned so the acceptance rate is about 44%.
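The claim that only the ratio is used can be tested directly: running the update rule on $\tilde p(x)$ and on a constant multiple of it, with the same random numbers, should give the identical chain. The sketch uses this tool's bimodal target written without the common $1/\sqrt{2\pi}$ factor. The helper names, the seed, and the choice of 1024 as the constant (a power of two, so the scaling is exact in floating point) are assumptions made for illustration.

```python
import math
import random


def p_tilde(x):
    """Bimodal target 0.4*N(-2, 0.5^2) + 0.6*N(2, 0.7^2) without the common 1/sqrt(2*pi) factor."""
    return ((0.4 / 0.5) * math.exp(-((x + 2.0) ** 2) / (2 * 0.5 ** 2))
            + (0.6 / 0.7) * math.exp(-((x - 2.0) ** 2) / (2 * 0.7 ** 2)))


def mh_chain(density, x0, sigma_prop, n, seed=42):
    rng = random.Random(seed)
    x, chain = x0, [x0]
    for _ in range(n):
        x_prop = rng.gauss(x, sigma_prop)                 # x' ~ N(x_t, sigma_prop^2)
        alpha = min(1.0, density(x_prop) / density(x))    # only the ratio enters
        if rng.random() < alpha:                          # u ~ U(0, 1)
            x = x_prop
        chain.append(x)
    return chain


chain_a = mh_chain(p_tilde, 0.0, 1.0, 5000)
chain_b = mh_chain(lambda x: 1024.0 * p_tilde(x), 0.0, 1.0, 5000)   # arbitrary constant factor
print(chain_a == chain_b)
```

Any other positive constant gives the same chain up to floating-point rounding in rare borderline comparisons.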

What is the MCMC Metropolis-Hastings Sampler?

🙋

If we just want random samples from a distribution, why do we need this complicated Markov chain idea? Can't we just generate them directly?

🎓

Basically, for many real distributions you can't sample directly because the normalizing constant — the integral that makes the density integrate to 1 — is too hard to compute. Bayesian posteriors are the canonical example. Metropolis-Hastings sidesteps this by only using the ratio $p(x')/p(x)$, where the normalizing constant cancels out. Try setting "Target distribution type" to 1 (bimodal mixture) and watch the histogram slowly fill in both peaks.

🙋

When I set sigma_prop to 0.1, the trace looks almost like a straight line and the histogram only covers one peak. Is the simulator broken?

🎓

Not broken — this is the classic MCMC "mixing" problem. When sigma_prop is too small the chain crawls one tiny step at a time, so it takes thousands of iterations to climb out of the valley between modes. Check the acceptance rate card: it's probably above 95%, but that just means "we accept almost everything but barely move". If you crank sigma_prop up to 5 you'll see the opposite problem — most proposals are rejected and the chain freezes. The sweet spot is around 30–50% acceptance.
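The mixing problem described in this exchange can be reproduced outside the tool. The sketch runs random-walk Metropolis on the bimodal target from the left mode (x0 = −2) for several sigma_prop values and reports the acceptance rate, the number of accepted moves that cross x = 0 (a simple "valley crossing" count invented for this illustration), and the fraction of samples with x > 0, whose true value is about 0.60. The run length, seed, and sigma values are assumptions.

```python
import math
import random


def log_p_tilde(x):
    # Bimodal target 0.4*N(-2, 0.5^2) + 0.6*N(2, 0.7^2), up to a constant factor
    a = 0.8 * math.exp(-((x + 2.0) ** 2) / 0.5)            # 0.4/0.5 * exp(-(x+2)^2 / (2*0.25))
    b = (0.6 / 0.7) * math.exp(-((x - 2.0) ** 2) / 0.98)   # 0.6/0.7 * exp(-(x-2)^2 / (2*0.49))
    return math.log(a + b) if a + b > 0 else -math.inf


def run(sigma_prop, n=20000, x0=-2.0, seed=42):
    rng = random.Random(seed)
    x, lp = x0, log_p_tilde(x0)
    accepted, crossings, right = 0, 0, 0
    for _ in range(n):
        x_new = x + rng.gauss(0.0, sigma_prop)
        lp_new = log_p_tilde(x_new)
        if math.log(1.0 - rng.random()) < lp_new - lp:
            if (x_new > 0) != (x > 0):
                crossings += 1                     # accepted move across the valley at x = 0
            x, lp = x_new, lp_new
            accepted += 1
        right += x > 0
    return accepted / n, crossings, right / n


stats = {}
for s in (0.1, 1.0, 2.5, 5.0):
    stats[s] = run(s)
    acc, crossings, frac = stats[s]
    print(f"sigma_prop={s:3.1f}  acceptance={acc:.2f}  crossings={crossings:5d}  P(x>0)~{frac:.2f}")
```

With sigma_prop = 0.1 the acceptance rate is very high but the chain rarely, if ever, reaches the right mode, so its P(x > 0) estimate is far from 0.60.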

🙋

Okay so if I tune sigma_prop right, I get perfect samples?

🎓

Almost — there's one more catch. Consecutive samples are correlated, so N=2000 samples carry less information than 2000 independent draws. That's what the "Effective sample size" (ESS) card measures, and you can see it in the autocorrelation bars at the bottom: the faster they decay to zero, the larger the ESS. In CAE uncertainty quantification we usually decide on a target ESS first, then choose the chain length. Try moving sigma_prop away from the optimum and watch how quickly ESS drops.

Frequently Asked Questions

The acceptance probability alpha(x, x') = min(1, p(x') q(x|x') / (p(x) q(x'|x))) is the largest acceptance probability that still satisfies the detailed-balance condition p(x) q(x'|x) alpha(x,x') = p(x') q(x|x') alpha(x',x), which guarantees that p(x) is a stationary distribution of the chain. When the proposal q is symmetric, the proposal ratio cancels and only the target-density ratio remains; that's the magic that lets MCMC work even when the normalizing constant is unknown.
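Detailed balance can be checked numerically on a handful of state pairs. The sketch deliberately uses an asymmetric proposal, q(x'|x) = N(x'; 0.5x, 1), and an arbitrary positive unnormalized density. Both are illustrative choices, not anything the tool uses. With the full Metropolis-Hastings acceptance, the two probability flows agree to rounding error; dropping the proposal ratio breaks the balance.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def p_tilde(x):
    # Any positive, unnormalized density works; this one is an arbitrary example
    return math.exp(-0.5 * x * x) * (1.0 + 0.5 * math.sin(3.0 * x))


def q(x_to, x_from):
    # Deliberately asymmetric proposal density q(x_to | x_from) = N(x_to; 0.5 * x_from, 1)
    return normal_pdf(x_to, 0.5 * x_from, 1.0)


def alpha_mh(x, y):
    # Metropolis-Hastings acceptance, including the proposal ratio q(x|y) / q(y|x)
    return min(1.0, p_tilde(y) * q(x, y) / (p_tilde(x) * q(y, x)))


def alpha_no_correction(x, y):
    # Plain Metropolis ratio, only valid for a symmetric proposal
    return min(1.0, p_tilde(y) / p_tilde(x))


points = [-2.0, -0.7, 0.0, 0.4, 1.3, 2.5]
max_gap = gap_wrong = 0.0
for x in points:
    for y in points:
        # detailed balance: p(x) q(y|x) alpha(x, y) == p(y) q(x|y) alpha(y, x)
        lhs = p_tilde(x) * q(y, x) * alpha_mh(x, y)
        rhs = p_tilde(y) * q(x, y) * alpha_mh(y, x)
        max_gap = max(max_gap, abs(lhs - rhs))
        lhs_w = p_tilde(x) * q(y, x) * alpha_no_correction(x, y)
        rhs_w = p_tilde(y) * q(x, y) * alpha_no_correction(y, x)
        gap_wrong = max(gap_wrong, abs(lhs_w - rhs_w))
print(f"max imbalance with Hastings correction: {max_gap:.1e}, without: {gap_wrong:.3f}")
```

Algebraically, p(x) q(x'|x) alpha(x, x') equals min(p(x) q(x'|x), p(x') q(x|x')), which is symmetric in x and x'. That symmetry is why the balance holds exactly.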

Roberts, Gelman and Gilks (1997) studied scaling limits of random-walk Metropolis for Gaussian targets and showed that the acceptance rate maximizing effective sample size approaches 0.234 in high dimensions, while for a 1D target the optimum is about 0.44. Acceptance much higher than this means tiny steps and poor mixing; much lower means too many rejections. Aim for 0.2–0.5 in practice and adjust sigma_prop accordingly.

For a multimodal target, if the density between two modes is very low, almost every proposal that lands in that region is rejected, so the chain effectively cannot cross the valley. Remedies include a larger sigma_prop so the chain can jump across in one step, several parallel chains started from different initial values, or tempered methods such as parallel tempering / replica-exchange MCMC, in which higher-temperature replicas sample a flattened density p(x)^(1/T) and periodically swap states with the T = 1 chain.
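A minimal parallel-tempering (replica-exchange) sketch for the bimodal target follows. Each replica runs random-walk Metropolis on p(x)^beta with beta = 1/T, and after every sweep one random adjacent pair proposes to swap states with probability min(1, exp((beta_k − beta_{k+1})(log p(x_{k+1}) − log p(x_k)))). The temperature ladder [1.0, 0.5, 0.25, 0.1], the 1/sqrt(beta) scaling of the step size, the run length, and the seed are untuned, illustrative assumptions.

```python
import math
import random


def log_p(x):
    # Bimodal target 0.4*N(-2, 0.5^2) + 0.6*N(2, 0.7^2), up to a constant factor
    a = 0.8 * math.exp(-((x + 2.0) ** 2) / 0.5)
    b = (0.6 / 0.7) * math.exp(-((x - 2.0) ** 2) / 0.98)
    return math.log(a + b) if a + b > 0 else -math.inf


def parallel_tempering(betas, n, sigma_prop=0.5, seed=42):
    """Replica exchange: replica k samples p(x)**betas[k]; betas[0] must be 1 (the kept chain)."""
    rng = random.Random(seed)
    xs = [-2.0] * len(betas)                       # every replica starts in the left mode
    cold = []
    for _ in range(n):
        # 1) one random-walk Metropolis step per replica on p**beta (wider steps when hotter)
        for k, beta in enumerate(betas):
            x_new = xs[k] + rng.gauss(0.0, sigma_prop / math.sqrt(beta))
            if math.log(1.0 - rng.random()) < beta * (log_p(x_new) - log_p(xs[k])):
                xs[k] = x_new
        # 2) propose swapping the states of one random adjacent pair of temperatures
        if len(betas) > 1:
            k = rng.randrange(len(betas) - 1)
            log_r = (betas[k] - betas[k + 1]) * (log_p(xs[k + 1]) - log_p(xs[k]))
            if math.log(1.0 - rng.random()) < log_r:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
        cold.append(xs[0])
    return cold


cold = parallel_tempering([1.0, 0.5, 0.25, 0.1], n=30000)
frac_right = sum(x > 0 for x in cold) / len(cold)
print(f"fraction of cold-chain samples with x > 0: {frac_right:.2f} (true value about 0.60)")
```

Only the beta = 1 chain is kept as samples of p(x). The hot replicas exist to carry states across the valley, where p(x)^0.1 is almost flat.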

To choose the burn-in length, inspect the trace plot and discard everything before the chain looks stationary (oscillating around a stable level). This tool uses 200 samples by default, which is plenty for the included targets, but complex distributions or poor initializations may need more. Quantitative aids include the Gelman-Rubin R-hat computed across multiple chains and the rule of thumb of discarding at least a few autocorrelation times.
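The Gelman-Rubin diagnostic compares between-chain and within-chain variance. The sketch implements the classic (non-split) R-hat and applies it to four chains on the bimodal target, two started in each mode. With sigma_prop = 0.1 the chains stay in their starting modes and R-hat is far above 1; with sigma_prop = 2.5 they mix and R-hat is close to 1. The chain length, seeds, and sigma values are illustrative assumptions.

```python
import math
import random


def log_p_tilde(x):
    # Bimodal target 0.4*N(-2, 0.5^2) + 0.6*N(2, 0.7^2), up to a constant factor
    a = 0.8 * math.exp(-((x + 2.0) ** 2) / 0.5)
    b = (0.6 / 0.7) * math.exp(-((x - 2.0) ** 2) / 0.98)
    return math.log(a + b) if a + b > 0 else -math.inf


def rwm(x0, sigma_prop, n, seed):
    rng = random.Random(seed)
    x, lp, xs = x0, log_p_tilde(x0), []
    for _ in range(n):
        x_new = x + rng.gauss(0.0, sigma_prop)
        lp_new = log_p_tilde(x_new)
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        xs.append(x)
    return xs


def gelman_rubin(chains):
    """Classic (non-split) Gelman-Rubin R-hat for m >= 2 chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * within + between / n      # pooled estimate of the target variance
    return math.sqrt(var_plus / within)


starts = [-2.0, -2.0, 2.0, 2.0]                       # two chains per mode
rhat_small = gelman_rubin([rwm(x0, 0.1, 5000, seed) for seed, x0 in enumerate(starts)])
rhat_large = gelman_rubin([rwm(x0, 2.5, 5000, seed) for seed, x0 in enumerate(starts)])
print(f"R-hat with sigma_prop=0.1: {rhat_small:.2f}   with sigma_prop=2.5: {rhat_large:.3f}")
```

A single chain started at x = −2 with sigma_prop = 0.1 would show a perfectly stationary trace. Only the disagreement between chains reveals that the right mode is being missed.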

Real-World Applications

Bayesian inference and parameter estimation: MCMC is the workhorse for sampling from a posterior $p(\theta|D) \propto p(D|\theta)p(\theta)$ when the marginal likelihood is intractable. Whether you are estimating material elastic constants from experimental data or calibrating a climate model, libraries such as Stan, PyMC, and emcee rely on variants of Metropolis-Hastings or its descendants (HMC, NUTS).
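As a small Bayesian example, the sketch samples the posterior of a mean parameter theta from five hypothetical, normalized measurements. It uses a Gaussian likelihood with known noise standard deviation and a broad Gaussian prior, and it only ever evaluates p(D|theta)p(theta) up to a constant. Because this normal-normal model is conjugate, the exact posterior is available as a check. The data, prior, noise level, step size, and burn-in are all assumptions made for illustration.

```python
import math
import random

# Hypothetical, normalized measurements; the noise standard deviation is assumed known
data = [2.1, 1.9, 2.4, 2.0, 1.6]
prior_mu, prior_sd, noise_sd = 0.0, 10.0, 1.0


def log_post_tilde(theta):
    # log p(D|theta) + log p(theta) with all constants dropped; p(D) is never computed
    log_lik = -sum((y - theta) ** 2 for y in data) / (2.0 * noise_sd ** 2)
    log_prior = -((theta - prior_mu) ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior


rng = random.Random(0)
theta, lp, draws = 0.0, log_post_tilde(0.0), []
for t in range(21000):
    prop = theta + rng.gauss(0.0, 1.0)                 # random-walk proposal, sigma_prop = 1
    lp_prop = log_post_tilde(prop)
    if math.log(1.0 - rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if t >= 1000:                                      # discard 1000 burn-in iterations
        draws.append(theta)

mcmc_mean = sum(draws) / len(draws)
mcmc_sd = math.sqrt(sum((d - mcmc_mean) ** 2 for d in draws) / len(draws))

# The normal-normal model is conjugate, so the exact posterior is available for checking
post_prec = 1.0 / prior_sd ** 2 + len(data) / noise_sd ** 2
exact_mean = (prior_mu / prior_sd ** 2 + sum(data) / noise_sd ** 2) / post_prec
exact_sd = 1.0 / math.sqrt(post_prec)
print(f"posterior mean: MCMC {mcmc_mean:.3f}, exact {exact_mean:.3f}")
print(f"posterior sd:   MCMC {mcmc_sd:.3f}, exact {exact_sd:.3f}")
```

In a real calibration problem the likelihood would typically involve a simulation or surrogate model instead of a closed form, but the sampler loop stays the same.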

Statistical mechanics and Ising models: Metropolis et al. (1953) invented this algorithm precisely to sample Boltzmann distributions $p \propto e^{-E/kT}$ in statistical mechanics. It is still the default method for studying phase transitions of spin systems, order-disorder transitions in alloys, protein folding, and many other equilibrium ensembles.
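For the Ising case, the Metropolis rule becomes a single-spin flip accepted with probability min(1, exp(−ΔE/kT)), where ΔE = 2J s_i Σ_neighbors s_j. The partition function never appears. The sketch uses a small periodic 2D lattice with J = 1 and k_B = 1; the lattice size, temperatures (one below and one above the critical temperature, about 2.27 in these units), and number of sweeps are illustrative assumptions.

```python
import math
import random


def ising_metropolis(L, T, sweeps, J=1.0, seed=42):
    """Single-spin-flip Metropolis for the 2D Ising model (periodic boundaries, k_B = 1).

    Returns the average |magnetization| per spin over the second half of the sweeps.
    """
    rng = random.Random(seed)
    s = [[1] * L for _ in range(L)]                  # start fully ordered (all spins up)
    mags = []
    for sweep in range(sweeps):
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            nb = s[(i + 1) % L][j] + s[(i - 1) % L][j] + s[i][(j + 1) % L] + s[i][(j - 1) % L]
            dE = 2.0 * J * s[i][j] * nb               # energy change if spin (i, j) flips
            # Boltzmann ratio p(new)/p(old) = exp(-dE / T); the partition function cancels
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[i][j] = -s[i][j]
        if sweep >= sweeps // 2:
            mags.append(abs(sum(map(sum, s))) / (L * L))
    return sum(mags) / len(mags)


m_cold = ising_metropolis(16, T=1.5, sweeps=200)
m_hot = ising_metropolis(16, T=5.0, sweeps=200)
print(f"<|m|> at T=1.5: {m_cold:.2f}   at T=5.0: {m_hot:.2f}")
```

Below the critical temperature the lattice stays almost fully magnetized; well above it the magnetization collapses to small fluctuations around zero, which is the order-disorder transition in miniature.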

CAE uncertainty quantification: When FEM input parameters (material constants, boundary conditions, geometric tolerances) are uncertain, MCMC can sample their posterior given measured responses and propagate it to outputs such as peak stress or natural frequency. Compared with FORM/SORM, MCMC handles complex failure regions and multi-modal distributions, and modern workflows combine it with surrogate models (Kriging, Gaussian processes) to keep the cost down.

Machine learning and probabilistic programming: Latent-variable models, mixture models, and hierarchical Bayesian models, whose posteriors are often multi-modal, frequently rely on MCMC for inference. Specific applications include collapsed Gibbs samplers for topic models (LDA), posterior sampling of Bayesian neural network weights, and posterior policy evaluation in reinforcement learning.

Common Misconceptions and Points to Note

The most common mistake is to assume that a higher acceptance rate always means a better chain. In this simulator, sigma_prop=0.1 yields an acceptance rate above 95%, but the chain barely moves — sample quality is actually terrible. Conversely, sigma_prop=5 drops the rate near 10% and the chain freezes. Aim for roughly 40–50% in 1D and 20–30% in higher dimensions; the default sigma_prop=1.0 here is already close to optimal for the bimodal target.

The second pitfall is treating the chain length N as if it were the effective sample size. MCMC samples $x_1, x_2, \ldots, x_N$ are correlated, and $\mathrm{ESS} \approx N/\tau$, where $\tau = 1 + 2\sum_{k\ge1}\rho_k$ is the integrated autocorrelation time. If $\tau=20$, an N=2000 chain carries information equivalent to only about 100 independent draws. The ESS card in this tool gives a quick estimate. Reporting only N in a paper is not enough; always quote ESS (and $\hat R$ from multiple chains) so readers can judge convergence.

Finally, do not assume that discarding burn-in guarantees convergence. Burn-in removes the influence of the starting value but cannot prove the chain has explored the whole target. For multi-modal distributions, a chain trapped in one mode will produce a nicely stationary trace yet completely miss the other modes. Best practice is to combine multiple chains from different starting points, check $\hat R$ values close to 1, and visually inspect the trace and histogram — exactly the diagnostics this tool exposes.
