What is the bootstrap method?

The bootstrap, introduced by Bradley Efron in 1979, is a resampling technique. From N original data points you draw B pseudo-samples with replacement (the same point may appear multiple times) and compute the statistic of interest on each one. The empirical distribution of those B values approximates the sampling distribution of the statistic, so you can estimate standard errors and confidence intervals for the mean, median, variance and many other quantities without assuming a parametric model for the population.
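
The recipe above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tool's own code; the function name `bootstrap_distribution`, the seeds, and the Gaussian test data are assumptions made for the example.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, B=1000, seed=42):
    """Return B values of `statistic`, each computed on one resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # random.choices draws with replacement, so a point can appear several times.
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]

# Illustrative data: 100 draws from a Gaussian with mean 10 and sigma 2.
data_rng = random.Random(0)
data = [data_rng.gauss(10, 2) for _ in range(100)]

for name, stat in [("mean", statistics.fmean),
                   ("median", statistics.median),
                   ("variance", statistics.variance)]:
    boot = bootstrap_distribution(data, stat)
    # The spread of the B bootstrap values estimates the standard error.
    print(f"{name}: estimate={stat(data):.3f}, bootstrap SE={statistics.stdev(boot):.3f}")
```

The same function handles the mean, median and variance because the statistic is passed in as an argument; only the resampling loop is shared.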

What does a 95% confidence interval actually mean?

A 95% confidence interval is a range such that, if the entire procedure were repeated many times, about 95% of the computed intervals would contain the true value. This tool uses the percentile method: sort the B bootstrap statistics and take the 2.5% and 97.5% quantiles. Any single interval either does or does not contain the true value; the 95% refers to the long-run frequency over repetitions.
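
The long-run reading can be checked by simulation: generate many independent data sets from a known population, build a percentile interval on each, and count how often the interval contains the true mean. The sketch below assumes a Gaussian population and a simple index rule for the quantiles; the function names and all parameter values are illustrative.

```python
import random
import statistics

def percentile_ci(data, rng, B=300):
    """95% percentile bootstrap CI for the mean of `data`."""
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(B))
    # Symmetric index rule for the 2.5% and 97.5% quantiles (one of several conventions).
    k = int(0.025 * B)
    return means[k], means[B - 1 - k]

def coverage(true_mean=10.0, sigma=2.0, n=50, repeats=300, seed=1):
    """Fraction of repeated experiments whose CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = percentile_ci(sample, rng)
        hits += lo <= true_mean <= hi
    return hits / repeats

print(f"empirical coverage: {coverage():.3f}")
```

The printed fraction should land near 0.95. A value slightly below that is common for the percentile method at moderate N.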

How should I choose the number of bootstrap iterations B?

For confidence intervals, B between 1000 and 2000 is a common rule of thumb. Increasing B makes the bootstrap distribution smoother and stabilizes the CI estimate, but it does not narrow the CI: the width is set by the information in the original data (N). To shrink the interval you need to collect more data, not run more bootstraps.
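
A quick experiment makes the point concrete. The sketch below, which assumes Gaussian data and a percentile interval for the mean, prints the CI width for several B at fixed N and then for a larger N. The helper `ci_width` and its defaults are hypothetical.

```python
import random
import statistics

def ci_width(n, B, seed=42, mu=10.0, sigma=2.0):
    """Width of the 95% percentile bootstrap CI for the mean of n Gaussian points."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(B))
    k = int(0.025 * B)
    return means[B - 1 - k] - means[k]

for n, B in [(100, 200), (100, 1000), (100, 5000), (400, 1000)]:
    print(f"N={n:4d}  B={B:5d}  width={ci_width(n, B):.3f}")
```

With N fixed, the width should only wobble as B changes; quadrupling N should cut it to roughly half.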

Which is wider, the CI for the mean or the median?

For a symmetric distribution like the Gaussian, the standard error of the median is about 1.25 times that of the mean, so the median CI is wider. For heavy-tailed distributions such as the lognormal, however, the median is far more robust to outliers and can actually have a narrower CI than the mean. Switch the data distribution in this tool from Gaussian to lognormal to see how the relationship flips.
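
The flip can be reproduced outside the tool with a small Monte Carlo study. The sketch below draws many fresh samples directly from each population (rather than bootstrapping one sample) and compares the spread of the medians with the spread of the means. The lognormal log-scale sigma of 1.0 is an assumption chosen for illustration, not necessarily the tool's setting.

```python
import random
import statistics

def se_ratio(draw, n=100, reps=2000, seed=7):
    """Monte Carlo estimate of SE(median) / SE(mean) for samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(medians) / statistics.stdev(means)

def gaussian(rng):
    return rng.gauss(10.0, 2.0)

def lognormal(rng):
    # Log-scale mean 0 and sigma 1 (illustrative choice).
    return rng.lognormvariate(0.0, 1.0)

print(f"Gaussian : SE(median)/SE(mean) = {se_ratio(gaussian):.2f}")
print(f"Lognormal: SE(median)/SE(mean) = {se_ratio(lognormal):.2f}")
```

A ratio above 1 means the median is the noisier estimator; a ratio below 1 means the mean is. How far the lognormal ratio falls below 1 depends on how heavy the tail is.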

Bootstrap Confidence Interval Simulator — Free Online Calculator

Parameters: original sample size N (points), bootstrap iterations B (runs), true mean μ, true σ, and the data distribution.

Random numbers are deterministic (LCG, seed=42). Large B values take slightly longer.
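
For readers unfamiliar with the term, a linear congruential generator (LCG) produces a fixed sequence from a given seed, which is why the same settings always give the same resamples. The sketch below is a generic LCG; the multiplier, increment and modulus are the widely published Numerical Recipes constants and are an assumption here, since the tool's own constants are not stated.

```python
class LCG:
    """Linear congruential generator: state <- (a * state + c) mod m."""

    def __init__(self, seed=42):
        # Assumed constants (Numerical Recipes); the tool's actual values are unknown.
        self.a, self.c, self.m = 1664525, 1013904223, 2**32
        self.state = seed % self.m

    def random(self):
        """Next pseudo-random float in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

    def randrange(self, n):
        """Index in 0..n-1, usable for picking one element of a resample."""
        return int(self.random() * n)

gen_a, gen_b = LCG(42), LCG(42)
data = [3.1, 4.7, 5.0, 6.2, 8.9]
resample_a = [data[gen_a.randrange(len(data))] for _ in data]
resample_b = [data[gen_b.randrange(len(data))] for _ in data]
print(resample_a == resample_b)  # True: same seed, same resample
```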

Results: sample mean x̄, 95% CI for the mean, standard error SE, and 95% CI for the median.

Original Data and Bootstrap Mean Distribution

Top: histogram of the original data (blue) / Bottom: distribution of B bootstrap means (green) with 2.5/97.5 percentile lines (red dashed)

Theory & Key Formulas

From original data $x_1,\ldots,x_N$, draw $B$ resamples with replacement and compute the statistic $T^*_b$ on each.

Bootstrap standard error:

$$\widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(T^*_b - \bar{T}^*\right)^2}$$

where $\bar{T}^* = \frac{1}{B}\sum_{b=1}^{B} T^*_b$ is the mean of the bootstrap statistics.

95% confidence interval (percentile method):

$$\left[T^*_{(0.025)},\; T^*_{(0.975)}\right]$$
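
Given a list of bootstrap statistics $T^*_1,\ldots,T^*_B$, both formulas map onto the Python standard library. This is a sketch under one assumption: the quantiles use the interpolation rule of `statistics.quantiles` with `method="inclusive"`, which may differ slightly from the rule the tool uses. The helper names are hypothetical.

```python
import statistics

def bootstrap_se(boot_stats):
    """Sample standard deviation of the bootstrap statistics (B - 1 in the denominator)."""
    return statistics.stdev(boot_stats)

def percentile_ci95(boot_stats):
    """2.5% and 97.5% quantiles of the bootstrap statistics."""
    # n=40 cuts the data in 2.5% steps; the first and last cut points are the CI bounds.
    cuts = statistics.quantiles(boot_stats, n=40, method="inclusive")
    return cuts[0], cuts[-1]

# Toy stand-in for the bootstrap statistics: the integers 0..1000,
# chosen so the quantiles are easy to check by eye.
toy = list(range(1001))
print(percentile_ci95(toy))  # (25.0, 975.0)
print(round(bootstrap_se(toy), 2))
```

With real bootstrap output the list would hold B resampled means or medians instead of the toy integers.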

Theoretical SE of the mean (Gaussian, for reference):

$$\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{N}}$$

With the default $N=100,\,\sigma=2$, the theoretical $\mathrm{SE}\approx 0.20$ and the CI width is $\approx 0.78$.
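
The arithmetic behind those two numbers, assuming the width refers to the Gaussian 95% interval $\bar{x} \pm 1.96\,\mathrm{SE}$:

$$\mathrm{SE}(\bar{x}) = \frac{2}{\sqrt{100}} = 0.20, \qquad \text{CI width} \approx 2 \times 1.96 \times 0.20 \approx 0.78$$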

What is the Bootstrap Confidence Interval Simulator?

🙋

I want to put a "plus or minus" range on the mean of my survey data, but I'm not sure my data is normally distributed. Can I still compute a confidence interval?

🎓

Yes — this is exactly where the bootstrap shines. Roughly speaking, you take your N data points and resample them with replacement (the same point can be drawn many times) to build B pseudo data sets, then compute the mean on each. The B means form an empirical sampling distribution. In the simulator above with $N=100,\,B=1000$, look at the green histogram: that's an approximation of "how much would my mean vary if I could repeat the experiment 1000 times?"

🙋

Wait, just resampling the same data shows variability?

🎓

It feels strange, but because the resampling is with replacement, one point may be drawn three times while another is never drawn at all. That mimics drawing a fresh sample of N from the underlying population. When Bradley Efron introduced the method in 1979, it became possible to estimate standard errors for statistics that had no textbook formula, and it has since become a cornerstone of modern statistics.

🙋

I see — the "95% CI for mean" card shows something like [9.6, 10.4]. So is that "the true mean is in that range with 95% probability"?

🎓

Almost, but the strict reading is "if I repeated this whole procedure 100 times, about 95 of the intervals would contain the true mean." Whether any single interval is one of the lucky 95 is unknown. Also note: increasing B does not shrink the interval. Try $B=5000$ — the histogram gets smoother but the red dashed lines barely move. What shrinks the width is N. Going from $N=100$ to $N=400$ halves the width, because $\mathrm{SE}\propto 1/\sqrt{N}$.

🙋

And when I switch the data distribution to "lognormal", the CIs for mean and median end up in completely different places!

🎓

Good catch. For skewed distributions like the lognormal, the mean is dragged by extreme values while the median is hardly affected. In real work — incomes, house prices, time-to-failure — the median's bootstrap CI is the go-to summary. The big practical win of the bootstrap is that the recipe stays exactly the same no matter what statistic you want, even when no closed-form SE formula exists.

Frequently Asked Questions

What does sampling with replacement mean?

Sampling with replacement puts each drawn point back before drawing the next, so the same point can be picked multiple times. If you draw N points with replacement from N, roughly 63% of the originals appear at least once and 37% are never picked. That randomness is what creates variability between resamples. Drawing all N points without replacement would simply return the original data in a different order, so every resample would give the same statistic; this is why the bootstrap always samples with replacement.
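
The 63% figure comes from $1-(1-1/N)^N$, which approaches $1-1/e \approx 0.632$ as N grows. The short simulation below checks this; the function name and the trial count are illustrative choices.

```python
import math
import random

def unique_fraction(n, trials=200, seed=42):
    """Average fraction of the n original points that appear in a with-replacement resample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        picked = {rng.randrange(n) for _ in range(n)}  # indices drawn with replacement
        total += len(picked) / n
    return total / trials

n = 1000
print(f"simulated: {unique_fraction(n):.3f}")
print(f"exact    : {1 - (1 - 1 / n) ** n:.3f}")
print(f"limit    : {1 - math.exp(-1):.3f}")
```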

Is the bootstrap reliable for small samples?

With N around 10 to 20, your original data may not represent the population well, and the bootstrap distribution can deviate substantially from the true sampling distribution. This is especially problematic for heavy-tailed data and for the median, where small N can produce a narrow but unreliable interval (an overconfidence trap). As a rule of thumb, use at least N=30 and prefer N=100 or more. Drop N to 20 in the tool to see the resample distribution become step-like and the CI jitter visibly.

How do the percentile, BCa and bootstrap-t methods differ?

This tool uses the percentile method (read the 2.5 and 97.5 percentiles of the bootstrap distribution directly), which is the simplest to implement. The BCa method (bias-corrected and accelerated) adjusts for bias and skewness and is generally more accurate. The bootstrap-t method recomputes the standard error on every resample to build a t-like distribution. BCa is a common default in modern practice, but for roughly symmetric distributions the percentile method gives quite usable results.
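
To make the contrast concrete, here is a sketch of the bootstrap-t interval for the mean next to the percentile interval. It assumes the plug-in standard error $s/\sqrt{n}$ on each resample and uses skewed lognormal data picked for illustration. BCa is not implemented here, and the function names are hypothetical.

```python
import random
import statistics

def percentile_ci(data, B=1000, seed=42):
    """95% percentile bootstrap CI for the mean."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(B))
    k = int(0.025 * B)
    return means[k], means[B - 1 - k]

def bootstrap_t_ci(data, B=1000, seed=42):
    """95% bootstrap-t CI for the mean, recomputing the SE on every resample."""
    rng = random.Random(seed)
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5
    t_stats = []
    for _ in range(B):
        res = rng.choices(data, k=n)
        se_star = statistics.stdev(res) / n ** 0.5
        t_stats.append((statistics.fmean(res) - mean) / se_star)
    t_stats.sort()
    k = int(0.025 * B)
    t_lo, t_hi = t_stats[k], t_stats[B - 1 - k]
    # The quantiles swap sides: the upper t quantile sets the lower bound.
    return mean - t_hi * se, mean - t_lo * se

data_rng = random.Random(0)
skewed = [data_rng.lognormvariate(0.0, 1.0) for _ in range(60)]
print("percentile :", percentile_ci(skewed))
print("bootstrap-t:", bootstrap_t_ci(skewed))
```

On right-skewed data the bootstrap-t interval often reaches further to the right than the percentile interval, though the exact numbers depend on the sample.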

Is the bootstrap used in engineering practice?

Yes. It is used to estimate confidence bounds on fatigue strength from a small number of material test specimens, or to put CIs on design quantities derived from a handful of CFD/FEM runs. In Monte Carlo reliability analysis it is particularly useful for putting CIs on quantiles (the 90th or 99th percentile response), where no clean parametric formula exists. Parameter estimates of Weibull and similar reliability distributions also commonly come with bootstrap confidence intervals.

Real-World Applications

Medical and clinical statistics: Clinical trials often have limited sample sizes and unknown distributional shapes, so the bootstrap is widely used to compute CIs for median survival time, between-group differences and hazard ratios. The confidence bands on Kaplan-Meier curves and asymmetric CIs for hazard ratios are typical examples.

Evaluating machine learning models: Accuracy, AUC, F1 and other test-set metrics are routinely bootstrapped to give CIs on model performance and to test whether two models differ significantly. The ".632 bootstrap" is an established alternative to cross-validation for estimating generalization error on small data sets.
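
As a small illustration of the first use, the sketch below bootstraps a test-set accuracy by resampling per-example correctness flags. The test set (200 examples, 170 correct) is hypothetical, and the percentile interval is an assumption carried over from the rest of this page.

```python
import random

def accuracy_ci(correct, B=2000, seed=42):
    """95% percentile bootstrap CI for accuracy from per-example 0/1 correctness flags."""
    rng = random.Random(seed)
    n = len(correct)
    # Each resample is a pseudo test set; its accuracy is the mean of the flags.
    accs = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(B))
    k = int(0.025 * B)
    return accs[k], accs[B - 1 - k]

# Hypothetical test set: 200 examples, 170 classified correctly (accuracy 0.85).
correct = [1] * 170 + [0] * 30
lo, hi = accuracy_ci(correct)
print(f"accuracy = {sum(correct) / len(correct):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The same pattern works for AUC or F1 by resampling (label, prediction) pairs and recomputing the metric on each resample.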

Risk evaluation in finance: Pseudo future scenarios are generated by resampling from historical return series, yielding CIs on Value-at-Risk and Expected Shortfall. Because no normal assumption is required, the heavy tails seen in real markets are not smoothed away by a Gaussian model, although resampling cannot produce losses more extreme than those in the historical record.

Extreme-event analysis (earthquakes, weather): When estimating the once-in-100-years event from limited observations, bootstrap CIs are placed on the parameters of extreme-value distributions. Even for complex statistics with no closed-form SE, the same bootstrap recipe still yields interval estimates.

Common Misconceptions and Cautions

The most common mistake is to believe that "increasing B will shrink the CI". Raising the number of bootstrap iterations smooths the resample distribution and stabilizes the CI estimate, but the width is set by the information in the original data N. Keep $N=100$ in this tool and vary $B$ from 100 to 5000: the green histogram becomes smoother while the red dashed CI bounds barely move. Increasing $N$ from 100 to 400, on the other hand, halves the CI width because $\mathrm{SE}\propto 1/\sqrt{N}$.
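
The halving follows directly from the SE formula, assuming the CI width is proportional to the standard error and σ is unchanged:

$$\frac{\text{width}(N=400)}{\text{width}(N=100)} \approx \frac{\sigma/\sqrt{400}}{\sigma/\sqrt{100}} = \sqrt{\frac{100}{400}} = \frac{1}{2}$$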

The next common mistake is interpreting "95% CI" as "there is a 95% probability the true value is inside this particular interval". Strictly speaking, the frequentist statement is "if I repeated the entire procedure many times, 95% of the resulting intervals would contain the true value". Saying "the true value is in this specific interval with 95% probability" belongs to Bayesian credible intervals, which are conceptually different. In casual usage both feel similar, but technical reports should respect the distinction.

Finally, do not assume that the bootstrap "always works like magic". With very small N (below 20), with extreme-value statistics (sample max or min) and with the variance of heavy-tailed distributions, the bootstrap distribution can drift far from the true sampling distribution. Switch this tool's data to lognormal and lower N to 20: the mean's CI becomes visibly asymmetric and swings substantially with the seed. In such hard cases, more advanced approaches such as BCa or the parametric bootstrap should be considered.
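
The sample maximum shows the failure clearly. A resampled maximum can never exceed the observed maximum, and it equals the observed maximum whenever that point is drawn at least once, which happens with probability $1-(1-1/N)^N \approx 0.63$. The bootstrap distribution is therefore a spike at one value rather than a smooth curve. The sketch below checks this on Gaussian data; all names and parameters are illustrative.

```python
import random

def max_tie_fraction(n=100, B=5000, seed=42):
    """Fraction of bootstrap resamples whose maximum equals the original sample maximum."""
    rng = random.Random(seed)
    data = [rng.gauss(10.0, 2.0) for _ in range(n)]
    top = max(data)
    ties = sum(max(rng.choices(data, k=n)) == top for _ in range(B))
    return ties / B

print(f"simulated P(resample max == sample max): {max_tie_fraction():.3f}")
print(f"theory 1 - (1 - 1/N)^N for N=100       : {1 - (1 - 1 / 100) ** 100:.3f}")
```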
