Rejection Sampling Simulator Back
Statistics Simulator

Rejection Sampling Simulator — Monte Carlo Sample Generation

When direct sampling from a target p(x) is hard, cover it with a proposal q(x) and an envelope M*q(x) >= p(x), then accept each candidate with probability p(x)/(M*q(x)). This page visualizes the classical Monte Carlo trick step by step.

Parameters
Number of trials N
Envelope constant M
Target distribution

0: Beta(2,5) unimodal asymmetric / 1: bimodal mixture hard to sample

Proposal Gaussian width σ (used for type=1)

Random numbers come from a deterministic LCG with fixed seed 42. "Resample" advances the seed.

Results
Number of trials N
Accepted samples
Acceptance rate
Theoretical max efficiency 1/M
Target p(x), envelope M*q(x) and accept/reject points

Blue solid = target p(x) / green dashed = envelope M*q(x) / green dots = accepted / red crosses = rejected

Histogram of accepted samples vs. p(x)

Blue bars = normalized histogram of accepted samples / white line = target p(x)

Theory & Key Formulas

Rejection sampling picks a proposal q(x) and a constant M with M >= sup p(x)/q(x), draws X ~ q(x) and U ~ U(0,1), and accepts when U < p(X)/(M*q(X)).

Acceptance probability of any trial:

$$P(\text{accept}) = \int \frac{p(x)}{M\,q(x)}\,q(x)\,dx = \frac{1}{M}\int p(x)\,dx = \frac{1}{M}$$

Conditional distribution of accepted samples:

$$f_{X\,|\,\text{accept}}(x) = p(x)$$

Expected number of accepted samples:

$$\mathbb{E}[N_{\text{accept}}] = \frac{N}{M}$$

A larger M makes the envelope safer to satisfy but lowers the acceptance rate proportionally to 1/M, increasing computational cost.

What is the Rejection Sampling Simulator

🙋
Doesn't a computer just generate random numbers from any distribution? Why do we need this "rejection" thing?
🎓
Good question. Distributions like uniform or normal can be sampled directly because their inverse CDF is known, but a Bayesian posterior, for instance, often has an unknown normalizing constant or a complicated shape. Rejection sampling sidesteps that by laying an easy-to-sample proposal q(x) on top and keeping only the "hits". The default above covers Beta(2,5) with a uniform proposal.
🙋
So the green dashed line is M*q(x) and the blue solid is the target. Why multiply by M?
🎓
Because M*q(x) >= p(x) must hold for every x — that is, the proposal pushed up forms a "ceiling" that fully covers the blue hill. The peak of Beta(2,5) at x=0.2 is about 2.458, so with q=1 (uniform) you need M >= 2.458. The default M = 3.0 is safely above that. Drop M to 2 and you will see the blue curve poke through the green ceiling, breaking the method.
🙋
The acceptance rate card shows about 33%, which equals 1/M. Coincidence?
🎓
Not a coincidence — it is the central theorem. Integrating the acceptance probability gives integral p(x)/(M*q(x)) * q(x) dx = (1/M) integral p(x) dx = 1/M. So with N = 2000 trials you accept about 666 (rate about 33%). At M = 5 the rate drops to 20%, at M = 10 to 10% — strictly linear decay in 1/M. Compare the "theoretical max efficiency 1/M" and "acceptance rate" cards in the simulator and they line up almost exactly.
🙋
I switched the target to 1 (bimodal). The acceptance rate plummeted!
🎓
Right. A bimodal or complex shape forces a much larger M to be safely covered by a Gaussian proposal. A small proposal width sigma leaves the tails uncovered; a large one wastes mass on empty regions. In practice this gets worse with dimension — the "curse of dimensionality" pushes acceptance toward zero exponentially. That is exactly why MCMC, Hamiltonian Monte Carlo and friends were invented. Rejection sampling is the textbook starting point that also teaches you its own limits.

Frequently Asked Questions

The theoretical optimum is M = sup_x p(x)/q(x), which maximizes the acceptance rate. For Beta(2,5) with a uniform proposal the supremum is at the mode x = 0.2 and equals 2.458, so M = 2.458 is optimal. In code you either differentiate p/q analytically to find the supremum, evaluate on a fine grid, or — when the shape is unknown — use adaptive rejection sampling (ARS), which builds and updates the envelope on the fly.
Yes. If p(x) = p_tilde(x) / Z with Z unknown, choose M as the supremum of p_tilde(x)/q(x) and use the acceptance probability p_tilde(x)/(M*q(x)). The accepted samples still follow p(x) exactly. This is especially useful for Bayesian posteriors (likelihood times prior, with Z unknown) and is the conceptual seed of Metropolis–Hastings.
In high dimensions the mass of p(x) concentrates in a thin region, and matching that with a tractable q(x) becomes extremely hard. The supremum of p/q grows exponentially with the dimension d, so the acceptance rate 1/M decays exponentially. Even covering a 10-dimensional Gaussian target with a Gaussian whose mean is slightly off can drive M to astronomical values. In high dimensions one must switch to MCMC, variational inference or HMC.
Compute the conditional distribution given the accept event. With X ~ q and U < p(X)/(M*q(X)), the conditional density of X is q(x) * [p(x)/(M*q(x))] / (1/M) = p(x), which is exactly the target. Geometrically, M*q(x) >= p(x) guarantees that uniformly scattered points under the envelope, sliced vertically, recover p(x).

Real-World Applications

Bayesian statistics and probabilistic modeling: Posterior distributions p(theta|D) ∝ p(D|theta)p(theta) usually lack a closed-form normalizing constant, so rejection sampling is used to draw parameter samples theta. It is widely used for low- to mid-dimensional models and as a warm-up for more advanced MCMC, and is a building block of modern Bayesian inference.

Computational physics and molecular simulation: In weighted Monte Carlo methods that evaluate macroscopic quantities from samples of a probability distribution, rejection sampling is used to draw configurations from a Boltzmann distribution p(x) ∝ exp(-E(x)/kT). It is effective when the energy landscape is simple and underpins the theoretical justification of Metropolis-style algorithms.

CAE and reliability analysis: In product reliability evaluation, input parameters (material strength, geometric tolerances) are sampled from their distributions and responses are computed with finite elements. When input distributions are non-standard, rejection sampling generates the inputs, and the resulting response distribution gives the failure probability.

Machine learning and generative models: Rejection sampling is used in evaluating normalizing constants, sampling from energy-based models and estimating rare events. The concept also appears in the theoretical analysis of GANs and diffusion models and is part of the toolkit for understanding modern generative methods.

Common Misconceptions and Cautions

The most common misconception is to think that "as long as samples are eventually produced, the quality is the same regardless of acceptance rate". The accepted samples do follow p(x) exactly in theory, but if the rate 1/M is extremely low, obtaining the desired number of samples costs an enormous number of trials. For Beta(2,5) with M = 10 the acceptance rate is 10%, meaning you need roughly four times as many trials as with M = 2.458. Pick M as close to the supremum as you can; in the simulator slide M while watching the "acceptance rate" card to feel where the sweet spot is.

The next most common mistake is to under-think the choice of the proposal q(x). If q has a very different shape from p, then even when M*q(x) >= p(x) holds the bulk of proposals land in "wasteful" regions far from the support of p and almost all are rejected. In the simulator set the target to 1 (bimodal) and the proposal width to 0.5: proposals pile up around the center and the two modes are starved. A good proposal is "shaped like p, with tails at least as wide".

Finally, understand what happens if M*q(x) >= p(x) is violated. The inequality must hold for every x; if it fails at even a single point, the acceptance probability exceeds 1 there and the resulting samples are systematically biased away from p(x). In the simulator, with target = Beta(2,5) and M = 2, the blue curve breaks through the green dashed ceiling near the peak and the histogram visibly underestimates the mode. In code, evaluate sup p/q on a fine grid or with numerical optimization and apply a safety factor before settling on M.