Importance Sampling — Efficient Tail Probability Estimation

Estimate the tail probability P(X>t) of the standard normal distribution with crude Monte Carlo and importance sampling side by side. Tune the mean and standard deviation of the proposal and watch the variance reduction change in real time.

Parameters
Tail threshold t (in standard deviations of the target)
Sample size N
Proposal mean μ
Proposal std σ

Random numbers come from a linear congruential generator (LCG) with a fixed seed of 42 and are turned into normal variates with the Box-Muller transform, so the results are reproducible for any given setting.
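For readers who want to reproduce this pipeline, here is a minimal Python sketch. The simulator's actual LCG constants are not stated, so the multiplier, increment and modulus below (the widely published 1664525, 1013904223 and 2³² from Numerical Recipes) are an assumption, as are the names `Lcg`, `box_muller` and `normals`.

```python
import math


class Lcg:
    """Linear congruential generator: state <- (a * state + c) mod m.

    The constants are an assumption (Numerical Recipes values);
    the simulator's own constants are not documented.
    """

    def __init__(self, seed: int = 42, a: int = 1664525,
                 c: int = 1013904223, m: int = 2**32) -> None:
        self.state, self.a, self.c, self.m = seed % m, a, c, m

    def uniform(self) -> float:
        """Return a float in the open interval (0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        # The +0.5 offset keeps the value away from 0, so log() below is safe.
        return (self.state + 0.5) / self.m


def box_muller(rng: Lcg) -> tuple[float, float]:
    """Turn two uniforms into two independent N(0, 1) variates."""
    u1, u2 = rng.uniform(), rng.uniform()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def normals(n: int, seed: int = 42) -> list[float]:
    """Draw n standard normals; the same seed always yields the same list."""
    rng = Lcg(seed)
    out: list[float] = []
    while len(out) < n:
        out.extend(box_muller(rng))
    return out[:n]
```

Calling `normals(5)` twice returns the same list, which is exactly the reproducibility property described above; changing `seed` changes every draw.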

Results
Exact P(X>t)
Crude MC estimate
IS estimate
Variance reduction Var_MC/Var_IS
Target p, proposal q, and estimator convergence

Top: blue = target N(0,1), green = proposal q(x), red dashed = threshold t, red shading = tail x>t. Bottom: blue = crude MC estimate, red = IS estimate, green dashed = exact value.

Theory & Key Formulas

The expectation under the target $p$ can be evaluated with weighted samples from a proposal $q$:

$$I = \mathbb{E}_p[h(X)] = \int h(x)\,p(x)\,dx = \mathbb{E}_q\!\left[h(X)\,\frac{p(X)}{q(X)}\right]$$

Drawing $N$ samples $X_i$ from $q$ and using the weights $w(X_i)=p(X_i)/q(X_i)$ gives the importance sampling estimator:

$$\hat I_{\text{IS}} = \frac{1}{N}\sum_{i=1}^{N} h(X_i)\,w(X_i),\quad X_i \sim q$$

For the tail probability $P(X>t)$ we take $h(x)=\mathbf 1(x>t)$, $p=\mathcal N(0,1)$, $q=\mathcal N(\mu,\sigma^2)$. The weight becomes

$$w(x)=\frac{\sigma\exp(-x^2/2)}{\exp(-(x-\mu)^2/(2\sigma^2))}$$
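The estimator and weight above fit in a few lines of Python. This is an illustrative sketch, not the simulator's code: it draws normals with the standard library's `random.Random.gauss` instead of the LCG, and the function names `exact_tail`, `crude_mc` and `is_estimate` are hypothetical.

```python
import math
import random


def exact_tail(t: float) -> float:
    """P(X > t) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def crude_mc(t: float, n: int, rng: random.Random) -> float:
    """Fraction of N(0, 1) draws that land beyond t."""
    return sum(rng.gauss(0.0, 1.0) > t for _ in range(n)) / n


def is_estimate(t: float, n: int, mu: float, sigma: float,
                rng: random.Random) -> float:
    """Importance sampling with proposal q = N(mu, sigma^2).

    Each draw x contributes h(x) * w(x), with h(x) = 1(x > t) and
    w(x) = p(x)/q(x) = sigma * exp((x - mu)^2 / (2 sigma^2) - x^2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x > t:
            total += sigma * math.exp((x - mu) ** 2 / (2 * sigma ** 2) - x ** 2 / 2)
    return total / n


rng = random.Random(42)
t = 3.0
print(f"exact    {exact_tail(t):.6f}")
print(f"crude MC {crude_mc(t, 10_000, rng):.6f}")
print(f"IS       {is_estimate(t, 10_000, mu=3.0, sigma=1.0, rng=rng):.6f}")
```

Rerunning with different seeds shows the crude MC line jumping around while the IS line barely moves.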

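Choosing $\mu\approx t$ puts about half of the proposal samples in the tail, instead of the tiny fraction $P(X>t)$ that crude Monte Carlo gets, and the variance reduction ratio $\operatorname{Var}(\hat I_{\text{MC}})/\operatorname{Var}(\hat I_{\text{IS}})$ ranges from a few at $t=1$ to a few hundred at $t=3$, growing rapidly as the tail becomes rarer.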
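For the special case $\sigma=1$ the ratio has a closed form, worked out here so the numbers can be checked by hand; $\bar\Phi(z)=1-\Phi(z)$ denotes the standard normal tail function and $I=\bar\Phi(t)$:

$$\begin{aligned} w(x) &= \exp\!\left(\tfrac{(x-\mu)^2}{2}-\tfrac{x^2}{2}\right)=e^{\mu^2/2-\mu x},\\ \mathbb{E}_q\!\left[h(X)\,w(X)^2\right] &= \mathbb{E}_p\!\left[h(X)\,w(X)\right]=\int_t^\infty \frac{e^{-(x+\mu)^2/2+\mu^2}}{\sqrt{2\pi}}\,dx=e^{\mu^2}\,\bar\Phi(t+\mu),\\ \frac{\operatorname{Var}(\hat I_{\text{MC}})}{\operatorname{Var}(\hat I_{\text{IS}})} &= \frac{I(1-I)/N}{\left(e^{\mu^2}\,\bar\Phi(t+\mu)-I^2\right)/N}=\frac{I(1-I)}{e^{\mu^2}\,\bar\Phi(t+\mu)-I^2}. \end{aligned}$$

With $t=\mu=3$ this gives a ratio of roughly 220, and with $\mu=0$ (proposal equal to the target) it reduces to exactly 1.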

What is the Importance Sampling Simulator?

🙋
What do you mean by a "tail probability"? With plain Monte Carlo I just draw 10000 random numbers and count, right? That should give me any probability I want.
🎓
Roughly speaking, a tail probability is the probability that a random variable lands far out at one end of its distribution, which makes it a rare event. For example, the probability that a standard normal exceeds 3σ is about 0.00135, so out of 10,000 samples only about 13 land there. Look at the "Crude MC estimate" card with t=3: it wobbles noticeably whenever you change N. The relative error is roughly 1/√13 ≈ 28%.
🙋
28%! That's a lot. Can't I just take more samples?
🎓
You can, but at t=5 (5σ) the true probability is about 3×10⁻⁷, so even 1,000,000 samples give only about 0.3 hits on average, and you would need around 10⁷ samples just to expect three. The required sample size grows roughly like exp(t²/2), faster than exponentially in t. That's where importance sampling (IS) comes in. Set t back to 3, keep the proposal at μ=3 and look at the "IS estimate" card: it is much more stable than crude Monte Carlo.
🙋
Right, the green curve is pulled to the tail. How does that actually work?
🎓
We sample from a proposal q(x) that concentrates in the region we care about (the tail), then correct with the weight w(x)=p(x)/q(x). Because ∫h(x)p(x)dx equals ∫h(x)w(x)q(x)dx, the expectation is preserved for any proposal that is positive wherever h(x)p(x) is nonzero. Watch the "Variance reduction" card: with default settings it should be above 10, and even higher when μ matches t.
🙋
So larger μ is always better? Wait — I set μ=5 and the variance ratio dropped!
🎓
That's the trap of importance sampling. If μ is much larger than t, the samples really do fall in the tail, but most of them land far beyond t with tiny weights, while the rare samples just above t carry weights that are orders of magnitude larger and dominate the estimate. The theoretical optimum sits near μ ≈ t (slightly above it), and you can confirm in the simulator that the variance reduction peaks around there. Picking a good proposal is itself an active research area.
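A quick way to see where the peak sits, without any sampling, is to scan μ using the closed-form variance of the mean-shift proposal N(μ, 1): the per-sample second moment of h(X)w(X) is $e^{\mu^2}\,P(X>t+\mu)$. The proposal standard deviation is fixed at 1 in this sketch (an assumption), and `variance_ratio` is a hypothetical name.

```python
import math


def tail(z: float) -> float:
    """Standard normal tail P(X > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def variance_ratio(t: float, mu: float) -> float:
    """Var_MC / Var_IS for P(X > t) with proposal N(mu, 1) (sigma = 1 assumed)."""
    p = tail(t)
    var_mc = p * (1.0 - p)
    # Second moment of h(X) w(X) under q is exp(mu^2) * P(X > t + mu).
    var_is = math.exp(mu * mu) * tail(t + mu) - p * p
    return var_mc / var_is


t = 3.0
for mu in (0.0, 1.0, 2.0, 3.0, 3.2, 4.0, 5.0):
    print(f"mu = {mu:.1f}  ratio = {variance_ratio(t, mu):10.1f}")

# Grid search on [0, 6] in steps of 0.01 for the variance-minimizing mean.
best = max((k / 100 for k in range(601)), key=lambda m: variance_ratio(t, m))
print(f"best mu on a 0.01 grid: {best:.2f}")
```

The printed ratio climbs from 1 at μ = 0 to a peak slightly above μ = t and then falls off again, the same drop the student sees at μ = 5.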

Frequently asked questions

The identity follows from multiplying and dividing by q(x) inside the integral: E_p[h(X)] = ∫h(x)p(x)dx = ∫h(x)·{p(x)/q(x)}·q(x)dx = E_q[h(X)·p(X)/q(X)]. Any q works in principle, but if q(x)=0 in a region where h(x)p(x)≠0, that region is never sampled and the estimator is biased, so the support of q must cover every region where h(x)p(x) is nonzero.
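The support condition is easy to see numerically. In this hypothetical sketch the proposal is Uniform(0, 3.5), which never produces draws beyond 3.5, so the IS estimate of P(X > 3) quietly converges to P(3 < X < 3.5) instead of the true value.

```python
import math
import random


def phi(x: float) -> float:
    """Standard normal density p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tail(z: float) -> float:
    """P(X > z) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_uniform_proposal(t: float, upper: float, n: int, seed: int = 0) -> float:
    """IS estimate of P(X > t) with proposal q = Uniform(0, upper).

    q(x) = 1/upper on [0, upper], so w(x) = p(x) * upper. Draws never
    exceed `upper`, so the part of the tail beyond it is silently lost.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, upper)
        if x > t:
            total += phi(x) * upper
    return total / n


est = is_uniform_proposal(3.0, 3.5, 200_000)
print(f"estimate        {est:.6f}")
print(f"P(3 < X < 3.5)  {tail(3.0) - tail(3.5):.6f}")
print(f"true P(X > 3)   {tail(3.0):.6f}")
```

No amount of extra sampling closes the gap, because the missing mass beyond 3.5 is never visited.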
A variance reduction Var_MC/Var_IS of 10 means you need only one tenth of the samples for the same accuracy: 10,000 IS samples do the work of 100,000 crude MC samples. A ratio of 100 cuts the sample count to one hundredth. For very rare events the ratio can exceed 1000, which, when a sample costs about the same either way, reduces wall-clock time by orders of magnitude.
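The ratio can also be estimated from the samples themselves: compare the per-sample variance of the crude indicator, p̂(1 − p̂), with the sample variance of the IS terms h(X_i)w(X_i). This is a sketch under that assumption; how the simulator computes its card is not documented here, and `empirical_ratio` is a hypothetical name.

```python
import math
import random
import statistics


def empirical_ratio(t: float, mu: float, sigma: float, n: int,
                    seed: int = 0) -> tuple[float, float]:
    """Return (estimated Var_MC / Var_IS, equivalent crude-MC sample count)."""
    rng = random.Random(seed)
    terms = []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        w = sigma * math.exp((x - mu) ** 2 / (2 * sigma ** 2) - x ** 2 / 2)
        terms.append(w if x > t else 0.0)   # h(x) * w(x)
    p_hat = statistics.fmean(terms)
    var_mc = p_hat * (1.0 - p_hat)          # per-sample variance of 1(X > t) under p
    var_is = statistics.variance(terms)     # per-sample variance of h(X) w(X) under q
    ratio = var_mc / var_is
    return ratio, ratio * n                 # crude MC needs ~ratio times more samples


ratio, n_equiv = empirical_ratio(t=3.0, mu=3.0, sigma=1.0, n=10_000)
print(f"Var_MC/Var_IS ~ {ratio:.0f}; 10,000 IS samples ~ {n_equiv:,.0f} crude MC samples")
```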
When a few samples carry huge weights, you hit the classic failure mode of "weight explosion": those samples dominate the estimator, and although a number still comes out, its variance is catastrophic. Countermeasures include re-tuning the proposal's μ and σ so that q better matches the region where h(x)p(x) is large, using self-normalized importance sampling (SNIS), and monitoring the effective sample size ESS = (Σw)² / Σw².
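Here is a minimal sketch of the ESS check for this tail problem, with one labeled design choice. The formula is applied to the terms h(X_i)w(X_i) rather than to the raw weights. With a shifted proposal, the raw weights of draws below t never enter the estimate, yet they would dominate Σw² and make even a good proposal look bad. The helper name `tail_ess` is hypothetical.

```python
import math
import random


def tail_ess(t: float, mu: float, sigma: float, n: int, seed: int = 0) -> float:
    """ESS = (sum v)^2 / sum v^2 over the IS terms v_i = h(x_i) w(x_i)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x > t:  # h(x) = 0 otherwise, so the term contributes nothing
            v = sigma * math.exp((x - mu) ** 2 / (2 * sigma ** 2) - x ** 2 / 2)
            s1 += v
            s2 += v * v
    return s1 * s1 / s2 if s2 > 0 else 0.0


n = 10_000
for mu, sigma in ((3.0, 1.0), (5.0, 0.5)):
    ess = tail_ess(3.0, mu, sigma, n)
    print(f"mu={mu}, sigma={sigma}: ESS = {ess:.0f} ({100 * ess / n:.2f}% of N)")
```

With μ = 3, σ = 1 the ESS comes out at roughly a fifth of N, while μ = 5, σ = 0.5 leaves only a handful of samples carrying the whole estimate.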
Importance sampling is a workhorse of structural reliability analysis. When the failure probability drops below 10⁻⁴, crude Monte Carlo coupled with FEM becomes impractical. Design-point importance sampling, which centers the proposal on the most likely failure point, is widely used together with FORM and SORM.

Real-world applications

Structural reliability analysis: For bridges, aircraft and nuclear plants, failure probabilities of 10⁻⁴ to 10⁻⁹ make crude Monte Carlo essentially infeasible. Design-point importance sampling (DP-IS), combined with FORM/SORM, estimates such probabilities accurately from a few thousand FEM runs. The reliability index β referenced in Japanese highway-bridge codes and ASME standards is related to the failure probability by β = -Φ⁻¹(P_f), so it can be obtained from these estimates.
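The reliability index and the failure probability are linked by β = -Φ⁻¹(P_f), a one-liner with the standard library's `statistics.NormalDist`. For the tail problem in this simulator, where failure means X > t in standard normal space, β comes out equal to t, which makes a handy sanity check. The function name `reliability_index` is hypothetical.

```python
from statistics import NormalDist


def reliability_index(p_f: float) -> float:
    """beta = -Phi^{-1}(P_f): distance to the failure region in standard normal space."""
    return -NormalDist().inv_cdf(p_f)


for t in (3.0, 4.0, 5.0):
    p_f = 1.0 - NormalDist().cdf(t)   # exact P(X > t)
    print(f"t = {t}: P_f = {p_f:.3e}, beta = {reliability_index(p_f):.3f}")
```

In practice `p_f` would be the IS estimate rather than the exact value, so the uncertainty of the estimate carries over into β.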

Financial risk management: The large-loss events that drive VaR (Value at Risk) and CVaR (Conditional VaR) live in the tail of the loss distribution. By skewing the proposal toward market-crash scenarios, the number of simulated paths needed can drop by orders of magnitude, making real-time risk calculations practical. It is a basic tool of quantitative finance.

Bit error rate estimation in communications: Digital communication systems often target BERs of 10⁻⁹, which would require tens of billions of bits in direct simulation. Concentrating the noise samples in the error-prone tail with importance sampling brings the simulation time down to hours and is used routinely to design modulation and forward error correction schemes.

Bayesian statistics and particle filters: Sequential importance resampling (SIR) and sequential Monte Carlo / particle filters are built on importance sampling. When the posterior cannot be sampled directly, particles are drawn from a proposal (in the basic bootstrap filter, the system's motion model), weighted by the likelihood of the new observation, and then resampled. Self-driving car state estimation and robotic localization rely on the same idea.
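One propose-weight-resample step of a bootstrap particle filter can be sketched as follows. The model is a hypothetical 1-D random walk observed with Gaussian noise; the noise levels, particle count and the name `bootstrap_step` are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_step(particles: list[float], y: float, q_std: float, r_std: float,
                   rng: random.Random) -> list[float]:
    """One SIR step for x_k = x_{k-1} + N(0, q_std^2), y_k = x_k + N(0, r_std^2)."""
    # 1. Propose: move every particle through the motion model (the proposal q).
    moved = [x + rng.gauss(0.0, q_std) for x in particles]
    # 2. Weight: with this proposal, p/q reduces to the observation likelihood
    #    p(y | x_i); the Gaussian's constant factor cancels on normalization.
    weights = [math.exp(-0.5 * ((y - x) / r_std) ** 2) for x in moved]
    # 3. Resample: draw N particles with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)
true_x, particles = 0.0, [rng.gauss(0.0, 1.0) for _ in range(2_000)]
for _ in range(20):
    true_x += rng.gauss(0.0, 0.1)          # hidden state evolves
    y = true_x + rng.gauss(0.0, 0.5)       # noisy measurement of it
    particles = bootstrap_step(particles, y, q_std=0.1, r_std=0.5, rng=rng)
estimate = sum(particles) / len(particles)
print(f"true state {true_x:.3f}, filter estimate {estimate:.3f}")
```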

Common misconceptions and pitfalls

The most common misconception is the belief that the proposal should be pushed as far into the tail as possible. Try increasing μ from the default 3.0 to 5.0 in the simulator. The samples do concentrate in the tail, but the variance reduction ratio actually drops. This is because the weight w(x)=p(x)/q(x) becomes very large for the few samples that land just above t, causing the so-called weight explosion. The theoretical sweet spot is near μ ≈ t (the tail threshold); going too far in either direction destroys efficiency.

The next pitfall is to assume that simply increasing N will always make importance sampling beat crude Monte Carlo. With a poorly chosen proposal, the per-sample variance can be larger than that of crude Monte Carlo, or even infinite, so adding samples helps little and a handful of outliers can swing the estimate wildly. Setting σ=0.5 and μ=5 in the simulator makes the estimator visibly unstable. In practice you should monitor the effective sample size ESS = (Σw_i)² / Σw_i² and rethink the proposal when ESS drops below a few percent of N.
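One reason this setting is so unstable can be read off the second moment of the IS terms for a normal proposal $q=\mathcal N(\mu,\sigma^2)$, derived here from the weight $w(x)=p(x)/q(x)$ of the Theory section:

$$\mathbb{E}_q\!\left[h(X)\,w(X)^2\right]=\int_t^\infty\frac{p(x)^2}{q(x)}\,dx=\frac{\sigma}{\sqrt{2\pi}}\int_t^\infty\exp\!\left(\frac{1-2\sigma^2}{2\sigma^2}\,x^2-\frac{\mu}{\sigma^2}\,x+\frac{\mu^2}{2\sigma^2}\right)dx$$

The coefficient of $x^2$ is positive whenever $\sigma^2<1/2$, so the integral diverges. With $\sigma=0.5$ the coefficient is $+1$: the IS estimator has infinite variance for every $\mu$, and its error no longer shrinks like $1/\sqrt N$.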

Finally, remember that importance sampling is not magic. It can reduce variance by orders of magnitude, but only by maximizing the information per sample; the underlying computational cost does not vanish. Designing a good proposal requires knowledge of the target distribution and integrand, and a poor choice can make things worse than crude Monte Carlo. Simple tail problems have known optimal μ, but real-world problems often need adaptive importance sampling (AIS) or the cross-entropy (CE) method to learn the proposal itself.
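Here is a compact sketch of the cross-entropy method in this simulator's setting. Only the proposal mean μ is learned, with σ held at 1 (an assumption). Each round sets an intermediate threshold γ at the (1 − ρ) sample quantile, capped at t, and then moves μ to the likelihood-ratio-weighted mean of the samples above γ. The values ρ = 0.1 and n = 2,000, and the names `ce_mean` and `is_tail`, are illustrative choices, not values from the source.

```python
import math
import random


def ce_mean(t: float, n: int = 2_000, rho: float = 0.1,
            seed: int = 0, max_iter: int = 50) -> float:
    """Cross-entropy search for the mean of a N(mu, 1) proposal for P(X > t).

    sigma is held at 1 (an assumption of this sketch); only mu is learned.
    """
    rng = random.Random(seed)
    mu = 0.0  # round 0 samples from the target N(0, 1) itself
    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, never above t.
        gamma = min(t, xs[int((1.0 - rho) * n)])
        # Elite samples and their likelihood ratios p/q = exp(mu^2/2 - mu*x).
        elite = [(x, math.exp(mu * mu / 2 - mu * x)) for x in xs if x >= gamma]
        # CE update for a Gaussian mean: weighted average of the elite samples.
        mu = sum(w * x for x, w in elite) / sum(w for _, w in elite)
        if gamma >= t:  # the real threshold has been reached; stop
            break
    return mu


def is_tail(t: float, mu: float, n: int, seed: int = 1) -> float:
    """Plain IS estimate of P(X > t) with the learned proposal N(mu, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > t:
            total += math.exp(mu * mu / 2 - mu * x)
    return total / n


mu_ce = ce_mean(3.0)
print(f"learned mu  = {mu_ce:.3f}")
print(f"IS estimate = {is_tail(3.0, mu_ce, 10_000):.6f}")
print(f"exact       = {0.5 * math.erfc(3.0 / math.sqrt(2.0)):.6f}")
```

Here the learned μ settles near E_p[X | X > t], about 3.28 for t = 3. This is slightly above the variance-minimizing mean, because CE minimizes a Kullback-Leibler divergence to the zero-variance proposal q*(x) ∝ h(x)p(x) rather than the variance itself.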