
Latin Hypercube Sampling — LHS vs Plain Monte Carlo

Side-by-side 2D Latin Hypercube and plain Monte Carlo sampling. See how placing exactly one point per row and column changes the integration accuracy at the same sample size, in real time.

Parameters
Sample size N (points)
Dimensions d (fixed at 2)

Fixed to 2D for visualization. LHS itself extends to arbitrary dimension d.

Random seed (LCG)
Integrand f(x,y)

0: Gaussian peak (centered) / 1: sin(πx)sin(πy) ridge / 2: high frequency sin(20πx)sin(20πy)

Random numbers are generated deterministically with a linear congruential generator (LCG). The same seed and N always produce the same samples.
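The page does not say which LCG constants the simulator uses. As a minimal sketch, assuming the widely used Numerical Recipes constants (multiplier 1664525, increment 1013904223, modulus 2^32), a seeded generator that reproduces the same stream on every run could look like this. The names `LCG` and `draw` are hypothetical.

```python
class LCG:
    """Minimal linear congruential generator.

    The constants are an assumption (Numerical Recipes values),
    not necessarily the ones used by the simulator.
    """

    A = 1664525        # multiplier
    C = 1013904223     # increment
    M = 2 ** 32        # modulus

    def __init__(self, seed):
        self.state = seed % self.M

    def random(self):
        """Advance the state and return a float in [0, 1)."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M


def draw(seed, n):
    """Return the first n uniforms produced from the given seed."""
    gen = LCG(seed)
    return [gen.random() for _ in range(n)]


# Same seed and same n give an identical stream.
print(draw(42, 3) == draw(42, 3))
```

Because the whole state is a single integer, fixing the seed and N fixes every sample, which is what makes the side-by-side comparison repeatable.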

Results
True value ∫∫f dxdy
LHS estimate
Plain MC estimate
|MC error| / |LHS error|
Sample Scatter (left: LHS / right: plain MC)

Background grid lines mark the LHS N×N cell partition. LHS places one point per row and column (stratified); plain MC has no stratification.

Theory & Key Formulas

For each coordinate $k = 1,\dots,d$, Latin Hypercube Sampling draws an independent random permutation $\pi_k$ of $\{0,1,\dots,N-1\}$ and sets the $k$-th coordinate of the $i$-th sample to (with independent $u_{i,k}\sim U(0,1)$):

$$x_{i,k} = \frac{\pi_k(i) + u_{i,k}}{N}$$
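The construction above can be sketched in a few lines. This is a minimal illustration that uses Python's standard `random` module in place of the simulator's LCG; the function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n, dims, rng):
    """Return n points in [0, 1)^dims with one point per stratum on each axis."""
    columns = []
    for _ in range(dims):
        perm = list(range(n))   # stratum indices 0 .. n-1
        rng.shuffle(perm)       # independent permutation for this axis
        # Jitter each point uniformly inside its assigned stratum.
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    # Transpose the per-axis columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n)]


points = latin_hypercube(8, 2, random.Random(1))
for x, y in points:
    print(f"{x:.3f}  {y:.3f}")
```

Multiplying any coordinate by N and taking the integer part recovers the stratum index, and on every axis those indices form a permutation of 0 to N-1.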

The integral of $f$ is estimated by the sample mean:

$$\hat{I} = \frac{1}{N}\sum_{i=1}^{N} f(\mathbf{x}_i)$$
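As a rough sketch, the same sample-mean estimator can be applied to an LHS point set and to a plain MC point set of equal size. The example uses the $\sin(\pi x)\sin(\pi y)$ test function and Python's `random` module instead of the simulator's LCG; all function names are hypothetical.

```python
import math
import random


def lhs_points(n, rng):
    """2D Latin hypercube: one point per row and per column."""
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return [((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
            for i in range(n)]


def mc_points(n, rng):
    """Plain Monte Carlo: n independent uniform points."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def f1(x, y):
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def estimate(points, f):
    """Sample mean of f over the points."""
    return sum(f(x, y) for x, y in points) / len(points)


TRUE_VALUE = 4 / math.pi ** 2
rng = random.Random(0)
lhs_est = estimate(lhs_points(200, rng), f1)
mc_est = estimate(mc_points(200, rng), f1)
print(f"true={TRUE_VALUE:.4f}  LHS={lhs_est:.4f}  MC={mc_est:.4f}")
```

Only the point placement differs between the two estimates; the averaging step is identical.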

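For integrands that are monotone in each coordinate, the variance of the LHS estimate is guaranteed not to exceed that of plain MC, whose variance decays as $1/N$. For smooth integrands the reduction is typically large, so the practical error is smaller: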

$$\mathrm{Var}(\hat{I}_\text{LHS}) \le \mathrm{Var}(\hat{I}_\text{MC})$$
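One way to check the inequality empirically is to repeat both estimators many times and compare the spread of the results. The sketch below does this for the $\sin(\pi x)\sin(\pi y)$ test function at three sample sizes. The repeat count and seed are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random
import statistics


def f1(x, y):
    """Smooth test integrand sin(pi x) sin(pi y)."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def lhs_estimate(n, rng):
    """Sample mean of f1 over one 2D Latin hypercube design."""
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    total = sum(f1((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
                for i in range(n))
    return total / n


def mc_estimate(n, rng):
    """Sample mean of f1 over n independent uniform points."""
    return sum(f1(rng.random(), rng.random()) for _ in range(n)) / n


def empirical_variance(estimator, n, repeats=1000, seed=0):
    """Variance of the estimator over many independent replications."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(n, rng) for _ in range(repeats)])


results = {}
for n in (10, 20, 40):
    results[n] = (empirical_variance(lhs_estimate, n),
                  empirical_variance(mc_estimate, n))
    var_lhs, var_mc = results[n]
    print(f"N={n:3d}  Var LHS={var_lhs:.2e}  Var MC={var_mc:.2e}")
```

The comparison is between estimator variances at equal N, so any gap comes from the sampling pattern alone.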

The test function $f_1(x,y)=\sin(\pi x)\sin(\pi y)$ has the exact integral $4/\pi^2 \approx 0.4053$. This tool uses that value as the reference to compare LHS and MC errors.
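The reference value can be derived by hand, assuming the integration domain is the unit square $[0,1]^2$ implied by the sampling formula. Because the integrand is a product of one-dimensional factors, the double integral factorizes:

$$\int_0^1\!\!\int_0^1 \sin(\pi x)\sin(\pi y)\,dx\,dy = \left(\int_0^1 \sin(\pi x)\,dx\right)^2 = \left(\left[-\frac{\cos(\pi x)}{\pi}\right]_0^1\right)^2 = \left(\frac{2}{\pi}\right)^2 = \frac{4}{\pi^2}$$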

What is Latin Hypercube Sampling?

🙋
I keep seeing "LHS" mentioned around CAE. How is it different from regular Monte Carlo? We are still just throwing random points, right?
🎓
Great question. Both throw random points, but the placement rule is different. LHS partitions each dimension into N equal cells and adds the constraint "exactly one point per row and per column." Look at the left side of the simulator above — you can see one blue point in every column horizontally and every row vertically. Plain MC on the right (orange) has no such constraint, so by chance empty strips or clusters appear.
🙋
Got it! So "one per row and column" is basically a Sudoku rule. Why does that make the integration more accurate?
🎓
Nice analogy. Because the samples cover the space evenly, the estimate of the function average fluctuates less from one sample set to the next. Imagine a function with a big value in the upper right; if plain MC happens not to put any sample there, the estimate comes out too low. LHS guarantees exactly one sample in every row and every column, so large empty strips cannot occur. Try N=20 with integrand 1 in the simulator and change the seed a few times: the LHS estimate (blue) stays close to the true 0.4053, while MC (orange) swings widely between 0.3 and 0.5.
🙋
The error-ratio card shows "2.5×" or "5×" — does that mean LHS has a smaller error than plain MC?
🎓
Exactly. It is |MC error|/|LHS error|, so any value above 1 means LHS wins. The practical point is that this is at the same sample size. If you can afford 100 CFD runs, simply choosing LHS placement instead of plain MC gives a clearly better average or response surface — at no extra cost, just by changing the sampling strategy.
🙋
When I switch the integrand to "high frequency" (mode 2), the error ratio collapses to around 1 and the difference almost disappears. Why?
🎓
Sharp observation. LHS helps most when the integrand is smooth. A function like $\sin(20\pi x)\sin(20\pi y)$ completes ten full periods along each axis, so at small N it swings through most of its range inside a single cell, and the stratification benefit washes out. In CAE language, LHS is useless against "uncorrelated high-frequency noise." In practice you either trust that the response is smooth, or you fit a smooth surrogate first and then apply LHS on top of that.
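The contrast between the smooth and the high-frequency integrand can be reproduced offline. The simulator reports the ratio for a single seed; the sketch below instead averages over many seeds and reports the ratio of root-mean-square errors, which is less noisy. It covers only modes 1 and 2, because the page does not give the parameters of the Gaussian peak. The exact integral of the high-frequency integrand is 0, since $\sin(20\pi x)$ completes whole periods on $[0,1]$. All helper names are hypothetical.

```python
import math
import random


def lhs_estimate(n, rng, f):
    """Sample mean of f over one 2D Latin hypercube design."""
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return sum(f((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
               for i in range(n)) / n


def mc_estimate(n, rng, f):
    """Sample mean of f over n independent uniform points."""
    return sum(f(rng.random(), rng.random()) for _ in range(n)) / n


# mode -> (integrand, exact integral over the unit square)
MODES = {
    1: (lambda x, y: math.sin(math.pi * x) * math.sin(math.pi * y),
        4 / math.pi ** 2),
    2: (lambda x, y: math.sin(20 * math.pi * x) * math.sin(20 * math.pi * y),
        0.0),
}


def rms_error(estimator, n, f, true_value, repeats, rng):
    """Root-mean-square error of the estimator over many replications."""
    squared = [(estimator(n, rng, f) - true_value) ** 2 for _ in range(repeats)]
    return math.sqrt(sum(squared) / repeats)


def error_ratio(mode, n=20, repeats=2000, seed=0):
    """RMS error of plain MC divided by RMS error of LHS."""
    f, true_value = MODES[mode]
    rng = random.Random(seed)
    rms_mc = rms_error(mc_estimate, n, f, true_value, repeats, rng)
    rms_lhs = rms_error(lhs_estimate, n, f, true_value, repeats, rng)
    return rms_mc / rms_lhs


ratios = {mode: error_ratio(mode) for mode in MODES}
for mode, ratio in ratios.items():
    print(f"mode {mode}: RMS(MC) / RMS(LHS) = {ratio:.2f}")
```

A ratio well above 1 means LHS is paying off; a ratio near 1 means the stratification buys nothing for that integrand.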

Frequently Asked Questions

LHS and quasi-random sequences such as Sobol are both variance-reduction techniques that cover the space more uniformly than plain Monte Carlo, but their philosophies differ. LHS is a stratified scheme that guarantees a uniform marginal in every dimension and is generated in one batch for a fixed N. Quasi-random sequences such as Sobol can be extended incrementally but offer less control over the correlation between dimensions. CAE design exploration with a fixed budget tends to favor LHS, while numerical integration of smooth functions favors Sobol.
LHS keeps its marginal-uniformity property in any dimension, but the volume of the space grows exponentially with d (the curse of dimensionality), so projections onto a 2D subspace can still look full of holes for small N. To improve high-dimensional LHS, practitioners use Maximin LHS that maximizes the minimum pairwise distance, or correlation-minimized (optimized) LHS designs.
Standard LHS fixes N up front and generates the whole batch at once; adding a single point afterwards breaks the "one point per row and column" property. When extension is needed, Nested LHS (growing N to k*N) or Refinable LHS constructions are used. They are more complex to implement, so if possible it is simpler to start with a sufficiently large N.
The guarantee concerns the variance of the estimate, not every single run. The variance of the LHS estimate is guaranteed to be no larger than that of plain MC when the integrand is monotone in each dimension (McKay 1979), and smooth integrands usually show a clear gain as well. For highly oscillatory functions, or when N is extremely small, the gap shrinks. The "high frequency" mode of this simulator (mode 2) is exactly such a case where the error ratio hovers around 1.
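The maximin idea mentioned above can be illustrated with the simplest possible strategy: generate several standard LHS designs and keep the one whose closest pair of points is farthest apart. This is only a sketch under that assumption. Dedicated maximin constructions use proper optimizers, and the helper names here are hypothetical.

```python
import itertools
import math
import random


def latin_hypercube(n, dims, rng):
    """Standard LHS: one point per stratum on each axis."""
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of the design."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def best_of_k_maximin(n, dims, candidates, rng):
    """Keep the candidate LHS design with the largest minimum distance."""
    designs = [latin_hypercube(n, dims, rng) for _ in range(candidates)]
    return max(designs, key=min_pairwise_distance)


rng = random.Random(3)
design = best_of_k_maximin(20, 2, 200, rng)
print(f"min pairwise distance: {min_pairwise_distance(design):.3f}")
```

Every candidate is itself a valid Latin hypercube, so the selected design keeps the one-point-per-stratum property while avoiding the most clustered layouts.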

Real-World Applications

Training data for response surfaces and surrogate models: When one FEM/CFD case takes hours, running hundreds of cases over the design space is not affordable. Placing 30 to 100 LHS cases uniformly and fitting polynomial regression, kriging or neural network surrogates to the results is a standard CAE workflow to accelerate design exploration and optimization.

Uncertainty quantification (UQ) and reliability analysis: LHS is used in probabilistic analysis that propagates the variability of material constants, plate thickness or loads into product performance. Combined with Monte Carlo filtering or Saltelli's sensitivity methods, it ranks the importance of each input variable with a limited number of runs.

Design of experiments (DOE): Beyond computer experiments, LHS is used in physical experiments in chemical processes or production lines when continuous variables and many factors need to be covered uniformly with a limited number of runs. Unlike classical orthogonal-array DOE, it handles continuous variables naturally.

Hyperparameter search in machine learning: LHS is used as an improved random search to explore hyperparameter spaces such as learning rate, regularization coefficient and hidden-layer size. It covers a wider space with fewer trials than grid search, and is also a popular initial sampling for Bayesian optimization.
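For hyperparameter search, the unit-cube LHS points have to be mapped onto the actual parameter ranges. A minimal sketch follows. The ranges, the log-uniform scaling, and the names `learning_rate`, `weight_decay` and `hidden_units` are illustrative assumptions, not values taken from this page.

```python
import random


def latin_hypercube(n, dims, rng):
    """Standard LHS: one point per stratum on each axis."""
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


def scale_log(u, low, high):
    """Map u in [0, 1) to a log-uniform value between low and high."""
    return low * (high / low) ** u


def scale_int(u, low, high):
    """Map u in [0, 1) to an integer in [low, high], inclusive."""
    return low + int(u * (high - low + 1))


def sample_configs(n, seed=0):
    """Draw n hyperparameter configurations from a 3D Latin hypercube."""
    rng = random.Random(seed)
    configs = []
    for u_lr, u_reg, u_hidden in latin_hypercube(n, 3, rng):
        configs.append({
            "learning_rate": scale_log(u_lr, 1e-5, 1e-1),   # assumed range
            "weight_decay": scale_log(u_reg, 1e-6, 1e-2),   # assumed range
            "hidden_units": scale_int(u_hidden, 16, 512),   # assumed range
        })
    return configs


for config in sample_configs(5):
    print(config)
```

Scaling on a log axis keeps the stratification meaningful for parameters that span several orders of magnitude: each stratum then covers an equal share of the exponent range.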

Common Misconceptions and Cautions

The most common pitfall is to read LHS as a promise that "you can use a smaller sample size without losing accuracy". LHS reduces the variance of the estimator; it does not improve the local resolution of the function itself. For a sharply peaked response, a 20-sample LHS still cannot pin down the peak location precisely. In the simulator, combining the Gaussian-peak integrand (mode 0) with N=20 still shows a noticeable error in the LHS estimate. Remember: "the samples become uniform" is a different statement from "the peak is detected."

The next pitfall is to confuse "uniform marginals in each dimension" with "the joint sample is uniform". LHS only guarantees the marginals; the joint 2D pattern can still be uneven. If you change the seed a few times in the simulator, you will sometimes see points lining up along a diagonal, which is a residual correlation between the two coordinates. To avoid this, advanced LHS designs add a maximin-distance or minimum-correlation criterion. This tool implements standard LHS, so depending on the seed you can observe such patterns.
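The residual correlation can be quantified. As a rough sketch, the code below computes the Pearson correlation between the x and y coordinates of a standard 2D LHS design for a range of seeds and reports the most correlated one. The helper names are hypothetical and Python's `random` module stands in for the simulator's LCG.

```python
import math
import random


def lhs_2d(n, rng):
    """Standard 2D LHS: one point per row and per column."""
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return [((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
            for i in range(n)]


def pearson(points):
    """Pearson correlation coefficient between the two coordinates."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)


correlations = [pearson(lhs_2d(20, random.Random(seed))) for seed in range(200)]
worst = max(correlations, key=abs)
print(f"largest |r| over 200 seeds: {abs(worst):.2f}")
```

A correlation-minimized design would reject or repair the designs with large |r|, in the same spirit as the maximin criterion rejects clustered ones.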

Finally, the advantage of LHS depends strongly on the integrand. Mode 1 ($\sin(\pi x)\sin(\pi y)$) is exactly the smooth low-frequency case that LHS handles best; the error ratio can range from a few times to more than tenfold. The high-frequency mode 2 erases the benefit and even lets MC win occasionally by luck. In practice it is safer to probe the integrand with a test function first, or to evaluate the variance across multiple seeds. LHS is not a universal cure but an "accelerator for smooth functions."