Gibbs Sampling Simulator

Draw samples from a bivariate normal distribution with Gibbs sampling, the classic MCMC (Markov chain Monte Carlo) method. Adjust the target correlation, sample count and burn-in to watch the staircase path of alternating conditional draws and see the sample correlation converge to the target value in real time.

Parameters

Target correlation ρ

Correlation of x and y in the target bivariate normal

Sample count N

Number of chain states kept after burn-in

Burn-in

Initial biased iterations discarded as warm-up

Random seed

The same value reproduces the result exactly

Results

—

Sample mean x

—

Sample mean y

—

Sample correlation ρ̂

—

Sample std. deviation

—

Collected samples

—

Error vs target

—

Sampling path — staircase walk

Dots are the collected (x,y) samples; the faint ellipse is the target-correlation contour. The line connecting consecutive states "horizontal then vertical" is the characteristic Gibbs staircase walk, and the marker walks along it.

Trace plot — value of x vs iteration

Marginal histogram of x

Theory & Key Formulas

$$x^{(t+1)}\sim\mathcal N(\rho\,y^{(t)},\,1-\rho^2),\qquad y^{(t+1)}\sim\mathcal N(\rho\,x^{(t+1)},\,1-\rho^2)$$

Gibbs sampling draws each variable from its full conditional distribution, in turn. For a standard bivariate normal the conditionals are also normal, with mean equal to ρ times the other variable and variance 1−ρ².

$$\hat\rho = \frac{\sum_t (x_t-\bar x)(y_t-\bar y)}{\sqrt{\sum_t (x_t-\bar x)^2}\,\sqrt{\sum_t (y_t-\bar y)^2}}$$

The sample correlation ρ̂ computed from the collected samples. As the sample count N grows it approaches the target ρ, with the Monte Carlo error shrinking roughly as 1/√N.

What is Gibbs Sampling?

🙋

I've heard the name "Gibbs sampling"... but what does it actually do? It generates random numbers, right?

🎓

Yes — it's a tool for generating random numbers that follow a given probability distribution. Ordinary random numbers are easy when you have a formula, like a uniform or a normal. But once several variables are tangled together into a complex joint distribution, sampling directly from that formula gets hard. Gibbs sampling takes a detour: it redraws one variable at a time from its conditional distribution given all the others. Repeat that endlessly, and before you know it the whole stream of points follows the target joint distribution.

🙋

Just updating one at a time really converges to the right distribution? That feels mysterious.

🎓

It does feel like magic. It's guaranteed by the theory of Markov chain Monte Carlo (MCMC). If you build an update rule that alternates the conditional distributions, the stationary distribution of that chain turns out to be exactly the target joint distribution. So run it long enough and the points are samples from the target. The example in this tool is a bivariate normal: given x, the variable y is normal with mean ρx and variance 1−ρ², and given y, x has the same form. Move the target correlation ρ on the left and watch the scatter plot at the top right stretch into an ellipse.

🙋

The path graph looks like a jagged staircase. What is that?

🎓

That's like the visual "fingerprint" of Gibbs sampling. Each step moves only one variable, right? Fix y and draw x, and the point moves straight sideways. Then fix x and draw y, and it moves straight up or down. Sideways, vertical, sideways, vertical — so the path is always a staircase. Try making ρ strong, like ±0.9. The conditional spread 1−ρ² narrows, each step shrinks, the staircase gets finer, and the chain struggles to roam across the whole distribution. That's "poor mixing".

🙋

There's a burn-in setting too. Throwing away the first few samples — isn't that wasteful?

🎓

I get the feeling, but this is a necessary discard. The chain is forced to start at the origin (0,0). The first few dozen steps are still "warming up" — not yet settled into the target — and mixing them into your statistics pulls the sample mean and correlation towards the starting value. So you discard that initial segment as burn-in. The stronger the correlation, the longer the warm-up lasts, so you take a longer burn-in. Set burn-in to 0 and ρ to 0.95 on the left, and you'll see the error get a little worse.

🙋

Is that also why the sample correlation ρ̂ never exactly equals the target ρ?

🎓

Exactly — that's the destiny of any Monte Carlo method. Computing from a finite set of samples always leaves an error in ρ̂. Increase the sample count N and the error shrinks roughly as 1/√N. But MCMC samples next to each other are similar (they're autocorrelated), so the "effective sample size" is smaller than the nominal count and convergence is a bit slower. In practice you manage the error by raising N, taking an adequate burn-in, and sometimes thinning the chain while watching the autocorrelation.

Frequently Asked Questions

Gibbs sampling is a Markov chain Monte Carlo (MCMC) method for drawing samples from a multivariate probability distribution. Even when you cannot sample the target joint distribution directly, you can run Gibbs sampling as long as you know each variable's full conditional distribution — its distribution given all the others. You simply update one variable at a time by drawing from that conditional distribution; run long enough, and the whole chain follows the target joint distribution. This tool visualises the procedure on a bivariate normal.

A Gibbs sampler starts from an arbitrary initial point — the origin (0,0) in this tool — so the first tens to hundreds of steps are a warm-up phase still drifting away from the target. Discarding this biased initial segment is called burn-in. If you keep it and compute statistics anyway, the sample mean and sample correlation are pulled towards the starting value. The stronger the correlation ρ (e.g. ±0.9), the slower the chain moves and the longer it takes to mix, so a longer burn-in is required.

Gibbs sampling updates only one variable per step. First it fixes y and redraws x, so the point moves horizontally; then it fixes x and redraws y, so the point moves vertically. This horizontal-then-vertical pattern repeated over and over traces a characteristic staircase (zig-zag) path on the (x,y) plane. The stronger the correlation, the narrower the conditional spread 1−ρ², so each step is smaller and the staircase gets finer — which is exactly what slow mixing looks like.

Gibbs sampling is a Monte Carlo method, so the sample correlation ρ̂ computed from a finite set of samples always carries Monte Carlo error. The more samples you draw, the closer ρ̂ gets to the target ρ, with the error shrinking roughly as 1/√N. Because MCMC samples are also autocorrelated, the effective sample size is smaller than the nominal count, so convergence is somewhat slower than for independent draws. To reduce the error, increase the sample count and take an adequate burn-in.

Real-World Applications

Bayesian parameter estimation: The most widespread use of Gibbs sampling is Bayesian inference. The posterior distributions of hierarchical and regression models often cannot be written analytically, and it is common to know only each parameter's full conditional distribution. With conjugate priors the conditionals become standard distributions (normal, gamma, inverse-gamma, etc.), and a Gibbs sampler can draw a large number of samples from the posterior to estimate posterior means and credible intervals. The classic Bayesian packages BUGS and JAGS are built on exactly this approach.

Image processing and Markov random fields: In denoising and image segmentation, each pixel's label depends on its neighbours and is modelled as a Markov random field (MRF). The joint distribution over all pixels is enormous, but the conditional distribution of a single pixel depends only on its neighbourhood, making it cheap to compute and an excellent fit for Gibbs sampling. In fact the term "Gibbs sampling" itself became popular in the image-restoration context (Geman & Geman, 1984).

Topic models (LDA) and natural language processing: Latent Dirichlet Allocation (LDA), which extracts latent topics from a collection of documents, estimates the topic assignment of each word with collapsed Gibbs sampling. The conditional probability can be written using only word, document and topic counts, which makes it easy to implement even on large text corpora.

Imputation of missing data: Survey and sensor data inevitably contain missing values. In data augmentation, the missing values themselves are treated as unknown parameters and filled in with Gibbs sampling from the distribution conditioned on the observed data. It also serves as the core algorithm of multiple imputation.

Common Misconceptions and Pitfalls

The biggest misconception is assuming the samples are independent. What Gibbs sampling produces is a Markov chain, and adjacent samples are strongly correlated. When the target correlation ρ is strong in particular, the conditional spread 1−ρ² is narrow and the chain only inches across the distribution. Even with N=1000 samples, the "effective sample size" in independent-sample terms may be a fraction of that. Estimating the standard error with the independent-sample formula then underestimates it. Always check the autocorrelation function and the effective sample size (ESS).

Next is the false comfort that "as long as you take a burn-in, convergence is guaranteed". Burn-in only discards the influence of the starting value; whether the chain actually explored the whole target distribution is a separate question. For a multimodal distribution with several peaks, for instance, a Gibbs sampler can get trapped on one peak for a long time and never reach the others. The bivariate normal in this tool is unimodal so it does not happen here, but for complex real-world models the rule is to run chains from multiple starting points and diagnose convergence by whether the results agree (e.g. the Gelman–Rubin R̂).

Finally, Gibbs sampling is not an all-purpose tool you can always use. Gibbs sampling only works if you can actually sample from the full conditional of every variable. When the conditional has a non-standard form that is hard to sample, or when the variables are so strongly correlated that mixing is hopelessly slow, other methods — Metropolis–Hastings, Hamiltonian Monte Carlo (HMC), or blocked Gibbs — are more efficient. Remember that the method should be chosen to match the structure of the target distribution.

Gibbs Sampling Simulator

What is Gibbs Sampling?

Frequently Asked Questions

Real-World Applications

Common Misconceptions and Pitfalls

How to Use

Worked Example

Practical Notes