What is Latin Hypercube Sampling (LHS)?

Latin Hypercube Sampling is a stratified sampling scheme that partitions each dimension into N equal cells and places N points so that, in 2D, each row and each column contains exactly one point (in general, each of the N intervals along every axis contains exactly one point). Unlike plain Monte Carlo, samples can neither pile up in nor miss any interval along an axis, so the same sample size covers each input range more evenly, and integration or sensitivity studies typically converge faster.

How does LHS compare with plain Monte Carlo?

Plain Monte Carlo draws each coordinate independently and uniformly, so for small N part of the space may be left empty by chance. LHS guarantees that every dimension is evenly stratified (one sample in each of the N intervals), which makes the variance of the estimate substantially smaller for smooth functions. In many cases the integration error of LHS is several times smaller than that of plain MC at the same N.

How is LHS used in CAE and sensitivity analysis?

LHS is standard for probabilistic analysis that handles the uncertainty of design variables such as thicknesses, material constants and geometric dimensions, and for building training data for response surfaces and surrogate models. Because it covers the space uniformly with few trials, it is widely used in practice as the sampling plan for expensive FEM and CFD analyses.

How large should N be?

A common rule of thumb is N >= 10*d, where d is the number of dimensions. For the 2D case in this tool, N in the range 20 to 50 already shows clear uniformity. The advantage of LHS is visible even for small N as long as the integrand is smooth; the errors of both methods shrink as N grows, but the LHS error tends to drop faster. For highly oscillatory functions the advantage shrinks.

Latin Hypercube Sampling (LHS) Simulator — Free Online Calculator

Parameters

Sample size N: number of sample points.

Dimensions d: fixed to 2D for visualization. LHS itself extends to arbitrary dimension d.

Random seed (LCG): random numbers are generated deterministically with a linear congruential generator (LCG), so the same seed and N always produce the same samples.

Integrand f(x,y): 0 = Gaussian peak (centered); 1 = sin(πx)sin(πy) ridge; 2 = high-frequency sin(20πx)sin(20πy).
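To illustrate what "deterministic with an LCG" means, here is a minimal sketch of a linear congruential generator. The constants are the well-known Numerical Recipes values and are only an assumption: the page does not state which constants the simulator uses, and the names `LCG` and `draw` are hypothetical.

```python
class LCG:
    """Minimal linear congruential generator: state <- (a*state + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        # Assumed constants (Numerical Recipes); the simulator's own
        # constants are not documented on this page.
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def random(self):
        """Advance the state and return a float in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def draw(seed, n):
    """Draw n floats from a freshly seeded generator."""
    rng = LCG(seed)
    return [rng.random() for _ in range(n)]


# Same seed and same n give the same sequence; a different seed does not.
print(draw(42, 3) == draw(42, 3))
print(draw(42, 3) == draw(43, 3))
```

Because the whole sequence is a function of the seed, any sample set built from it can be reproduced exactly by reusing the seed and N.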

Results

True value ∫∫f dxdy

LHS estimate

Plain MC estimate

|MC error| / |LHS error|

Sample Scatter (left: LHS / right: plain MC)

Background grid lines mark the LHS N×N cell partition. LHS places one point per row and column (stratified); plain MC has no stratification.
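The "one point per row and column" property can be checked mechanically. The sketch below uses a hypothetical helper `is_latin` (not part of the tool) that bins each coordinate into N equal cells and confirms that every cell index appears exactly once on every axis.

```python
def is_latin(points, n):
    """Return True if each of the n equal-width cells along every axis
    contains exactly one of the given points (coordinates in [0, 1))."""
    dims = len(points[0])
    for j in range(dims):
        # Cell index of every point along axis j, sorted.
        cells = sorted(min(int(p[j] * n), n - 1) for p in points)
        if cells != list(range(n)):
            return False
    return True


stratified = [(0.1, 0.6), (0.4, 0.2), (0.9, 0.8)]   # one point per row and column
clustered = [(0.1, 0.1), (0.2, 0.5), (0.9, 0.9)]    # two points share the first column
print(is_latin(stratified, 3), is_latin(clustered, 3))
```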

Theory & Key Formulas

Let $d$ be the number of dimensions. For each dimension $j = 1,\dots,d$, Latin Hypercube Sampling builds an independent random permutation $\pi_j$ of $\{0,1,\dots,N-1\}$ and sets the $j$-th coordinate of the $i$-th sample by (with $u_{i,j}\sim U(0,1)$ drawn independently):

$$x_{i,j} = \frac{\pi_j(i) + u_{i,j}}{N}$$
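The construction above translates almost line by line into code. This is a minimal sketch using Python's standard `random` module rather than the simulator's LCG; the function name `latin_hypercube` is an assumption made for illustration.

```python
import random


def latin_hypercube(n, dims, rng):
    """Return n points in [0, 1)^dims with one point per cell on every axis."""
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)  # independent random permutation for this axis
        # Coordinate = (cell index + uniform jitter inside the cell) / n
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    # Transpose so that points[i] is the i-th sample.
    return [tuple(col[i] for col in columns) for i in range(n)]


points = latin_hypercube(10, 2, random.Random(1))
for x, y in points[:3]:
    print(f"{x:.3f} {y:.3f}")
```

Shuffling a separate permutation for each axis is what makes the pairing between rows and columns random; reusing one permutation for every axis would put all points on the diagonal.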

The integral of $f$ is estimated by the sample mean:

$$\hat{I} = \frac{1}{N}\sum_{i=1}^{N} f(\mathbf{x}_i)$$
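As a sketch of how the sample-mean estimator is applied, the following compares one LHS run and one plain MC run on the tool's $\sin(\pi x)\sin(\pi y)$ integrand at N=20. It uses Python's `random` module, not the simulator's LCG, so the numbers will not match the tool for any given seed; all function names are illustrative.

```python
import math
import random


def f1(x, y):
    """Test integrand; its exact integral over the unit square is 4/pi^2."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def lhs_2d(n, rng):
    """n Latin hypercube points in the unit square."""
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return [((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
            for i in range(n)]


def mc_2d(n, rng):
    """n independent uniform points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def estimate(f, points):
    """Sample mean of f over the points."""
    return sum(f(x, y) for x, y in points) / len(points)


TRUE_VALUE = 4 / math.pi ** 2
rng = random.Random(0)  # fixed seed so the run is repeatable
lhs_est = estimate(f1, lhs_2d(20, rng))
mc_est = estimate(f1, mc_2d(20, rng))
print(f"true={TRUE_VALUE:.4f}  LHS={lhs_est:.4f}  MC={mc_est:.4f}")
```

A single run only shows one draw of each estimator; which of the two lands closer for a particular seed is partly luck, which is why the comparison is usually made over many seeds.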

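For smooth integrands, stratifying each axis removes the part of the variance that each input explains on its own (the additive main effects), so the variance of the LHS estimate falls below the $\mathrm{Var}(f)/N$ of plain MC and the practical error is smaller. For integrands that are monotone in each variable, and to leading order in $1/N$ for general integrands:

$$\mathrm{Var}(\hat{I}_\text{LHS}) \le \mathrm{Var}(\hat{I}_\text{MC})$$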
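The variance comparison can be checked empirically by repeating both estimators many times and measuring the spread of the results. The sketch below does this for the $\sin(\pi x)\sin(\pi y)$ integrand with N=20 and 500 repetitions; the repetition count, seed and helper names are arbitrary choices for illustration.

```python
import math
import random
import statistics


def f1(x, y):
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def lhs_2d(n, rng):
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return [((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
            for i in range(n)]


def mc_2d(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]


def estimator_variance(sampler, n, replicates, rng):
    """Empirical variance of the sample-mean estimate over many repetitions."""
    estimates = []
    for _ in range(replicates):
        pts = sampler(n, rng)
        estimates.append(sum(f1(x, y) for x, y in pts) / n)
    return statistics.pvariance(estimates)


rng = random.Random(2024)
var_lhs = estimator_variance(lhs_2d, 20, 500, rng)
var_mc = estimator_variance(mc_2d, 20, 500, rng)
print(f"Var LHS = {var_lhs:.6f}, Var MC = {var_mc:.6f}")
```

For this smooth integrand the LHS variance should come out clearly smaller than the MC variance, in line with the inequality above.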

The test function $f_1(x,y)=\sin(\pi x)\sin(\pi y)$ has the exact integral $4/\pi^2 \approx 0.4053$. This tool uses that value as the reference to compare LHS and MC errors.
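The reference value follows because the integrand is a product of a function of $x$ and a function of $y$, so the double integral factorizes:

$$\int_0^1\!\!\int_0^1 \sin(\pi x)\sin(\pi y)\,dx\,dy = \left(\int_0^1 \sin(\pi x)\,dx\right)^2 = \left(\left[-\frac{\cos(\pi x)}{\pi}\right]_0^1\right)^2 = \left(\frac{2}{\pi}\right)^2 = \frac{4}{\pi^2}$$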

What is Latin Hypercube Sampling?

🙋

I keep seeing "LHS" mentioned around CAE. How is it different from regular Monte Carlo? We are still just throwing random points, right?

🎓

Great question. Both throw random points, but the placement rule is different. LHS partitions each dimension into N equal cells and adds the constraint "exactly one point per row and per column." Look at the left side of the simulator above — you can see one blue point in every column horizontally and every row vertically. Plain MC on the right (orange) has no such constraint, so by chance empty strips or clusters appear.

🙋

Got it! So "one per row and column" is basically a Sudoku rule. Why does that make the integration more accurate?

🎓

Nice analogy. Because each axis is covered evenly, the estimate of the function average fluctuates less from seed to seed. Imagine a function with a big value in the upper right; if plain MC happens not to put any sample there, the result is an underestimate. LHS guarantees exactly one sample in every row and every column, so an entire strip can never be left empty. Try N=20 in the simulator and jiggle the seed a few times: the LHS estimate (blue) stays close to the true 0.4053, while MC (orange) swings widely, roughly between 0.3 and 0.5.

🙋

The error-ratio card shows "2.5×" or "5×" — does that mean LHS has a smaller error than plain MC?

🎓

Exactly. It is |MC error|/|LHS error|, so any value above 1 means LHS wins. The practical point is that this is at the same sample size. If you can afford 100 CFD runs, simply choosing LHS placement instead of plain MC gives a clearly better average or response surface — at no extra cost, just by changing the sampling strategy.

🙋

When I switch the integrand to "high frequency" (mode 2), the error ratio collapses to around 1 and the difference almost disappears. Why?

🎓

Sharp observation. LHS helps only when the integrand is smooth. A function like $\sin(20\pi x)\sin(20\pi y)$ changes sign every 0.05 along each axis, and its average over $x$ alone or over $y$ alone is zero, so evening out each axis separately removes none of its variance and the stratification benefit washes out. In CAE language, LHS is useless against "uncorrelated high-frequency noise." In practice you either trust that the response is smooth, or you fit a smooth surrogate first and then apply LHS on top of that.
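The collapse of the error ratio can be reproduced outside the simulator. The sketch below averages over many seeds and reports the ratio of root-mean-square errors (MC over LHS) for the smooth mode-1 integrand and the high-frequency mode-2 integrand at N=20. It uses Python's `random` module, so individual numbers will differ from the tool; the helper names are illustrative. The exact integral of the mode-2 integrand is 0, because $\sin(20\pi x)$ integrates to zero over $[0,1]$.

```python
import math
import random


def smooth(x, y):
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def high_freq(x, y):
    return math.sin(20 * math.pi * x) * math.sin(20 * math.pi * y)


def lhs_2d(n, rng):
    px, py = list(range(n)), list(range(n))
    rng.shuffle(px)
    rng.shuffle(py)
    return [((px[i] + rng.random()) / n, (py[i] + rng.random()) / n)
            for i in range(n)]


def mc_2d(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]


def rms_error(f, true_value, sampler, n, replicates, rng):
    """Root-mean-square error of the sample-mean estimate over many seeds."""
    total = 0.0
    for _ in range(replicates):
        pts = sampler(n, rng)
        est = sum(f(x, y) for x, y in pts) / n
        total += (est - true_value) ** 2
    return math.sqrt(total / replicates)


def error_ratio(f, true_value, n=20, replicates=500, seed=0):
    rng = random.Random(seed)
    lhs = rms_error(f, true_value, lhs_2d, n, replicates, rng)
    mc = rms_error(f, true_value, mc_2d, n, replicates, rng)
    return mc / lhs


ratio_smooth = error_ratio(smooth, 4 / math.pi ** 2)
ratio_high = error_ratio(high_freq, 0.0)
print(f"smooth: {ratio_smooth:.2f}   high frequency: {ratio_high:.2f}")
```

A ratio well above 1 for the smooth integrand and close to 1 for the high-frequency one is the averaged counterpart of what the error-ratio card shows for single seeds.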

Frequently Asked Questions

LHS and quasi-random (low-discrepancy) sequences such as Sobol are both variance-reduction techniques that cover the space more uniformly than plain Monte Carlo, but their philosophies differ. LHS is a stratified scheme that guarantees a uniform marginal in every dimension and is generated in one batch for a fixed N. Quasi-random sequences such as Sobol can be extended incrementally but offer less control over the correlation between dimensions. CAE design exploration with a fixed budget tends to favor LHS, while numerical integration of smooth functions favors Sobol.

The marginal-uniformity property of LHS holds in any dimension, but the volume to be covered grows exponentially with d (the curse of dimensionality), so projections onto a 2D subspace can still look full of holes for small N. To improve high-dimensional LHS, practitioners use maximin LHS, which maximizes the minimum pairwise distance between points, or correlation-minimized (optimized) LHS designs.
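A crude way to approximate a maximin design is to generate many standard LHS candidates and keep the one whose closest pair of points is farthest apart. The sketch below does exactly that; it is a random-search illustration of the maximin criterion, not the optimization algorithms used by dedicated DOE libraries, and the function names and candidate count are assumptions.

```python
import itertools
import math
import random


def latin_hypercube(n, dims, rng):
    """Standard LHS: one point per cell on every axis."""
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of the design."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def maximin_lhs(n, dims, rng, candidates=200):
    """Keep the candidate LHS with the largest minimum pairwise distance."""
    designs = (latin_hypercube(n, dims, rng) for _ in range(candidates))
    return max(designs, key=min_pairwise_distance)


plain = latin_hypercube(10, 3, random.Random(3))
best = maximin_lhs(10, 3, random.Random(3))
print(f"{min_pairwise_distance(plain):.3f} -> {min_pairwise_distance(best):.3f}")
```

Every candidate is still a valid Latin hypercube, so the selection improves the spread in the full space without giving up the per-axis stratification.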

Standard LHS fixes N up front and generates the whole batch at once; adding a single point afterwards breaks the "one point per row and column" property. When extension is needed, Nested LHS (growing N to k*N) or Refinable LHS constructions are used. They are more complex to implement, so if possible it is simpler to start with a sufficiently large N.
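To make the extension idea concrete, here is a simplified sketch of the doubling case (k = 2). When each of the N cells is split in half, every existing point occupies one half and leaves the other empty, so N new points can be placed in the empty halves and the combined 2N points again satisfy the one-per-cell property. This is an illustration of the principle under that assumption, not a reproduction of any published Nested or Refinable LHS algorithm; `extend_double` is a hypothetical name.

```python
import random


def latin_hypercube(n, dims, rng):
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


def extend_double(points, rng):
    """Grow an n-point LHS to a 2n-point LHS while keeping the old points."""
    n, dims = len(points), len(points[0])
    m = 2 * n
    new_columns = []
    for j in range(dims):
        # Cells of the refined 2n grid already occupied along axis j.
        occupied = {min(int(p[j] * m), m - 1) for p in points}
        empty = [c for c in range(m) if c not in occupied]
        rng.shuffle(empty)  # random pairing of empty cells across axes
        new_columns.append([(c + rng.random()) / m for c in empty])
    new_points = [tuple(col[i] for col in new_columns) for i in range(n)]
    return list(points) + new_points


rng = random.Random(11)
base = latin_hypercube(5, 2, rng)
extended = extend_double(base, rng)
print(len(base), len(extended))
```

The expensive results already computed for the first N points stay valid, which is the practical motivation for nested constructions.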

The variance of the LHS estimate is guaranteed to be no larger than that of plain MC when the integrand is monotone in each dimension (McKay et al. 1979), and for smooth integrands it is typically much smaller in practice. For highly oscillatory functions, or when N is extremely small, the gap shrinks or disappears. The "high frequency" mode of this simulator (mode 2) is exactly such a case where the error ratio hovers around 1.

Real-World Applications

Training data for response surfaces and surrogate models: When one FEM/CFD case takes hours, running hundreds of cases over the design space is not affordable. Placing 30 to 100 LHS cases uniformly and fitting polynomial regression, kriging or neural network surrogates to the results is a standard CAE workflow to accelerate design exploration and optimization.

Uncertainty quantification (UQ) and reliability analysis: LHS is used in probabilistic analysis that propagates the variability of material constants, plate thickness or loads into product performance. Combined with Monte Carlo filtering or Saltelli's sensitivity methods, it ranks the importance of each input variable with a limited number of runs.
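In UQ studies the unit-cube LHS points are usually mapped to the physical input distributions through each variable's inverse CDF, which preserves the stratification in probability. The sketch below assumes two hypothetical inputs, a normally distributed plate thickness (mean 2.0 mm, standard deviation 0.05 mm) and a Young's modulus uniform between 200 and 210 GPa; these values are invented for illustration only.

```python
import random
from statistics import NormalDist


def latin_hypercube(n, dims, rng):
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


# Hypothetical input distributions (not taken from the source page).
thickness_mm = NormalDist(mu=2.0, sigma=0.05)
E_MIN_GPA, E_MAX_GPA = 200.0, 210.0

unit = latin_hypercube(30, 2, random.Random(7))
cases = [
    (
        thickness_mm.inv_cdf(u0),                # normal via inverse CDF
        E_MIN_GPA + (E_MAX_GPA - E_MIN_GPA) * u1,  # uniform via linear scaling
    )
    for u0, u1 in unit
]

for t, e in cases[:3]:
    print(f"thickness = {t:.4f} mm, E = {e:.2f} GPa")
```

With N=30, each tenth of the thickness distribution receives exactly three cases, so the tails are sampled as reliably as the center. Note that `inv_cdf` requires an argument strictly between 0 and 1, which holds here except for the vanishingly unlikely draw of exactly 0.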

Design of experiments (DOE): Beyond computer experiments, LHS is used in physical experiments in chemical processes or production lines when continuous variables and many factors need to be covered uniformly with a limited number of runs. Unlike classical orthogonal-array DOE, it handles continuous variables naturally.

Hyperparameter search in machine learning: LHS is used as an improved random search to explore hyperparameter spaces such as learning rate, regularization coefficient and hidden-layer size. It covers a wider space with fewer trials than grid search, and is also a popular initial sampling for Bayesian optimization.
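For hyperparameters, the unit-cube coordinates are typically rescaled per parameter: logarithmically for rates and coefficients, and rounded for integer sizes. The sketch below assumes a hypothetical search space (learning rate from 1e-5 to 1e-1, regularization coefficient from 1e-6 to 1e-2, hidden-layer size from 16 to 512); the ranges and the name `to_config` are illustrative, not a recommendation.

```python
import random


def latin_hypercube(n, dims, rng):
    columns = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in columns) for i in range(n)]


def to_config(u):
    """Map one unit-cube point to a hypothetical hyperparameter setting."""
    return {
        "learning_rate": 10 ** (-5 + 4 * u[0]),   # log-uniform, 1e-5 to 1e-1
        "weight_decay": 10 ** (-6 + 4 * u[1]),    # log-uniform, 1e-6 to 1e-2
        "hidden_size": 16 + int(u[2] * 497),      # integer, 16 to 512
    }


configs = [to_config(u) for u in latin_hypercube(16, 3, random.Random(5))]
for cfg in configs[:3]:
    print(cfg)
```

With 16 trials and four decades of learning rate, each decade receives exactly four trials, something plain random search cannot promise.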

Common Misconceptions and Cautions

The most common pitfall is to overestimate LHS as "you can use a smaller sample size without losing accuracy". LHS reduces the variance of the estimator; it does not increase the local resolution with which the function itself is sampled. For a sharply peaked response, a 20-sample LHS still cannot pin down the peak location precisely. In the simulator, combining the Gaussian-peak integrand (mode 0) with N=20 still shows a noticeable error in the LHS estimate. Remember: "the samples become uniform" is a different statement from "the peak is detected."

The next pitfall is to confuse "uniform marginals in each dimension" with "the joint sample is uniform". LHS only guarantees the marginals; the joint 2D pattern can still be uneven. If you change the seed a few times in the simulator, you will sometimes see points lining up along a diagonal, which is a residual correlation between the coordinates. To avoid this, advanced LHS designs add a maximin-distance or minimum-correlation criterion. This tool implements standard LHS, so depending on the seed you can also observe such patterns.
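The extreme case makes the point: a design with every point on the diagonal satisfies the one-per-row-and-column rule perfectly, yet its two coordinates are completely correlated. The sketch below builds that design and measures the Pearson correlation; `pearson` is written out by hand so the example runs on any recent Python version.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


n = 10
# Both designs place exactly one point in every row and every column.
diagonal = [((i + 0.5) / n, (i + 0.5) / n) for i in range(n)]
anti_diagonal = [((i + 0.5) / n, (n - 1 - i + 0.5) / n) for i in range(n)]

r_diag = pearson(*zip(*diagonal))
r_anti = pearson(*zip(*anti_diagonal))
print(f"diagonal r = {r_diag:.3f}, anti-diagonal r = {r_anti:.3f}")
```

Checking this coefficient for a generated design is a cheap diagnostic: values far from zero indicate that the sample covers the marginals well but the joint space poorly.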

Finally, the advantage of LHS depends strongly on the integrand. Mode 1 ($\sin(\pi x)\sin(\pi y)$) is exactly the smooth low-frequency case that LHS handles best; depending on the seed, the error ratio can range from a few to more than ten. The high-frequency mode 2 erases the benefit and even lets MC win occasionally by luck. In practice it is safer to probe the integrand with a test function first, or to evaluate the variance across multiple seeds. LHS is not a universal cure but an "accelerator for smooth functions."
