Latin Hypercube Sampling (LHS)
Theory and Physics
Basic Principles of LHS
Professor, what makes Latin Hypercube Sampling better than ordinary random sampling? Just hearing the name makes it sound incredibly difficult...
The name sounds fancy, but what it does is simple. It evenly divides the range of each input variable into $N$ intervals and selects one point from each interval. With random sampling, points can coincidentally cluster in certain regions—a phenomenon called clumping—but LHS structurally prevents this.
Huh, does just dividing evenly have that much effect?
It makes a big difference. Compared to the Monte Carlo method with the same sample size, variance is often reduced by 30-50%. This means you can obtain accurate statistical information with fewer simulation runs. In CAE, a single analysis can sometimes take hours, so being able to reduce the sample count is a huge computational cost advantage.
I see! For example, in what situations is it used?
There are three typical use cases.
- Uncertainty Quantification (UQ): How variability in material properties affects stress distribution
- Sensitivity Analysis: Identifying which input parameter has the greatest effect on the output
- Initial Point Placement for DOE (Design of Experiments): Determining training data points when building surrogate models (e.g., Kriging)
For example, in automotive crash analysis, when there is variability in three variables—sheet thickness, yield stress, and friction coefficient—placing 50 sample points using LHS can estimate the distribution of the Head Injury Criterion (HIC) with accuracy comparable to 100 Monte Carlo points.
Mathematics of Stratified Sampling
How would you write "dividing evenly" in mathematical terms?
Consider generating $N$ samples for $d$-dimensional input variables $\mathbf{X} = (X_1, X_2, \ldots, X_d)$. Let $F_j$ be the cumulative distribution function (CDF) of each variable $X_j$. The core of LHS is stratifying the marginal distribution of each variable into $N$ equiprobable strata.
The $k$-th stratum ($k = 1, 2, \ldots, N$) of the $j$-th variable is defined by the interval:

$$\mathcal{S}_{j,k} = \left( F_j^{-1}\!\left(\frac{k-1}{N}\right),\; F_j^{-1}\!\left(\frac{k}{N}\right) \right]$$

Because the strata are cut at equal quantiles, each one carries the same probability mass, which is why taking exactly one point per stratum preserves the marginal distribution:

$$P\!\left(X_j \in \mathcal{S}_{j,k}\right) = \frac{k}{N} - \frac{k-1}{N} = \frac{1}{N}$$

Specifically, the position within the stratum is determined by a uniform random number $U_{j,k} \sim \text{Uniform}(0,1)$, and the $k$-th sample value is generated by:

$$x_{j,k} = F_j^{-1}\!\left( \frac{\pi_j(k) - 1 + U_{j,k}}{N} \right)$$
Here, $\pi_j$ is a random permutation of $\{1, 2, \ldots, N\}$, generated independently for each variable. This permutation randomizes the combination of strata across variables.
So it's like creating "even shelves" and then randomly picking a point within each shelf?
That image is perfect. It's easiest to see in two dimensions: picture an $N \times N$ chessboard and place exactly one point in each row and each column; this is the Latin square condition. The name comes from the Latin squares studied by the mathematician Leonhard Euler, arrangements in which each symbol appears exactly once in every row and every column.
Proof of Variance Reduction Effect
Can you explain mathematically why variance decreases with LHS?
Let's compare the variance of the expected-value estimator $\bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N} g(\mathbf{x}^{(i)})$ for the output $Y = g(\mathbf{X})$. According to the classical result of McKay, Beckman & Conover (1979), with the asymptotic form made precise by Stein (1987):

$$\operatorname{Var}\!\left(\bar{Y}_N^{\mathrm{LHS}}\right) = \frac{\sigma^2}{N} \;-\; \frac{1}{N}\sum_{j=1}^{d}\operatorname{Var}\!\left[g_j(X_j)\right] \;+\; o\!\left(\frac{1}{N}\right)$$

where $\sigma^2 = \operatorname{Var}[g(\mathbf{X})]$ and $g_j$ is the additive (main-effect) component of $g$ with respect to $X_j$. The first term $\sigma^2 / N$ is exactly the variance of simple Monte Carlo. The second term is the variance reduction due to LHS; it becomes larger when $g$ has a large additive component, which is typically the case when $g$ is monotonic in each input variable. In other words:

$$\operatorname{Var}\!\left(\bar{Y}_N^{\mathrm{LHS}}\right) \;\leq\; \operatorname{Var}\!\left(\bar{Y}_N^{\mathrm{MC}}\right) = \frac{\sigma^2}{N} \quad \text{(asymptotically)}$$
Equality holds only when $g$ is completely non-additive (purely interactive) with respect to all input variables. In actual CAE problems, an additive component almost always exists, so LHS is always more advantageous than Monte Carlo.
I see, so the more "well-behaved" the input-output relationship is, the greater the benefit of LHS.
Exactly. Stress responses in structural analysis are generally monotonic with respect to variables, so the variance reduction effect is extremely effective. On the other hand, for chaotic systems (like instantaneous values in turbulence), the benefit may be smaller. However, theoretically, LHS never performs worse than Monte Carlo, so using LHS by default is a safe bet.
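A quick numerical check of this effect can be sketched with SciPy's LatinHypercube sampler (introduced in the implementation section below); the test function, dimensions, and repetition count are illustrative assumptions, not values from a real CAE model:

```python
import numpy as np
from scipy.stats.qmc import LatinHypercube

# Mostly additive test function on [0,1]^3 (an illustrative choice)
def g(x):
    return x[:, 0] + 2.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 2]

d, N, reps = 3, 50, 500
rng = np.random.default_rng(0)
mc_means, lhs_means = [], []
for r in range(reps):
    # Plain Monte Carlo estimate of the mean
    mc_means.append(g(rng.random((N, d))).mean())
    # Basic LHS estimate of the mean with the same sample size
    lhs_means.append(g(LatinHypercube(d=d, seed=r).random(N)).mean())

print("Var of MC  mean estimator:", np.var(mc_means))
print("Var of LHS mean estimator:", np.var(lhs_means))
# The LHS estimator variance is typically several times smaller here
```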
Space-Filling Criteria
Can it happen that even with LHS sampling, the points feel somewhat clustered?
Yes, it can. Basic LHS guarantees stratification of marginal distributions but does not guarantee uniformity in the multidimensional space. For example, with two variables, it's possible to have points concentrated in the upper right and lower left, leaving the middle sparse. That's why we use space-filling criteria to optimize the placement.
Representative space-filling criteria are the following three:
1. Maximin Distance Criterion
Maximizes the minimum distance among all sample point pairs:

$$\max_{\text{design}} \;\; \min_{1 \le i < k \le N} \left\| \mathbf{x}^{(i)} - \mathbf{x}^{(k)} \right\|_p$$
Here, $\| \cdot \|_p$ is the $L_p$ norm (usually Euclidean distance with $p=2$). Intuitively, it seeks a configuration that "spreads points as far apart as possible."
2. $\phi_p$ Criterion based on Maximin Distance
A continuous relaxation by Morris & Mitchell (1995):

$$\phi_p = \left( \sum_{1 \le i < k \le N} \left\| \mathbf{x}^{(i)} - \mathbf{x}^{(k)} \right\|^{-p} \right)^{1/p} \;\rightarrow\; \min$$

As $p$ increases, minimizing $\phi_p$ asymptotically approaches the maximin criterion; values of $p$ between 15 and 50 are practical. The advantage is that $\phi_p$ is differentiable, allowing gradient-based optimization.
3. Discrepancy Criterion
Measures the deviation of the point set from a uniform distribution:

$$D_N^{*} = \sup_{\mathbf{t} \in [0,1]^d} \left| \frac{\#\{\, i : \mathbf{x}^{(i)} \in [\mathbf{0}, \mathbf{t}) \,\}}{N} - \prod_{j=1}^{d} t_j \right|$$
The smaller this star discrepancy $D_N^{*}$ is, the closer the configuration is to a uniform distribution. It is directly linked to the error bound of numerical integration via the Koksma-Hlawka inequality.
In practice, which should I use, maximin distance or discrepancy?
For building surrogate models (Kriging, RBF, etc.), maximin distance is the standard because prediction accuracy depends on the distance between the prediction point and its nearest training point. On the other hand, if the goal is numerical integration (estimation of expected value), discrepancy is theoretically more correct. If in doubt, choosing maximin distance won't lead you far astray.
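Both criteria are easy to evaluate for a given design; the following sketch uses SciPy's discrepancy function and a pairwise-distance call (the design size and seed are arbitrary illustrative choices):

```python
from scipy.spatial.distance import pdist
from scipy.stats.qmc import LatinHypercube, discrepancy

# An example 30-point LHS design in 3 dimensions
design = LatinHypercube(d=3, seed=1).random(30)

# Maximin criterion: smallest pairwise Euclidean distance (larger is better)
min_dist = pdist(design).min()

# Centered L2 discrepancy (smaller is better); the sup-norm star discrepancy
# D_N^* is expensive to evaluate, so an L2-type discrepancy is the usual
# practical substitute available in SciPy
cd = discrepancy(design, method="CD")

print(f"min pairwise distance   = {min_dist:.3f}")
print(f"centered L2 discrepancy = {cd:.5f}")
```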
Correlation Control
Can unintended correlations arise between variables generated by LHS?
Sharp question. Since the random permutations are applied independently to each variable, spurious correlations can arise by chance between variables that should be independent. When $N$ is small, sample correlations of a few tenths can easily appear. The method to prevent this is the Iman & Conover (1982) correlation control method.
The core of the algorithm is as follows:
- Set the target correlation matrix $\mathbf{C}_{\text{target}}$ (if independent, use the identity matrix $\mathbf{I}$)
- For the rank matrix $\mathbf{R}$ from LHS, calculate the current rank correlation matrix $\mathbf{C}_R$
- Perform Cholesky decomposition $\mathbf{C}_{\text{target}} = \mathbf{L}\mathbf{L}^T$
- Rearrange the ranks using the correction matrix $\mathbf{Q} = \mathbf{L} \cdot \mathbf{C}_R^{-1/2}$
The rank correlation matrix after correction approximately matches the target correlation matrix:

$$\mathbf{C}_{\text{corrected}} \approx \mathbf{C}_{\text{target}}$$
So it can also be used when input variables are correlated (e.g., Young's modulus and Poisson's ratio are correlated)!
Exactly. Being able to sample while reflecting correlations in material properties is extremely important in practice. It's often used in combination with Nataf transformation or Copulas.
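As a minimal sketch of the Iman-Conover idea (the function name and the use of van der Waerden scores are illustration choices, not prescribed by the step list above):

```python
import numpy as np
from scipy.stats import norm

def iman_conover(samples, target_corr, seed=None):
    """Reorder each column of `samples` (N x d) so that its rank correlation
    approximately matches `target_corr` (after Iman & Conover, 1982)."""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    # Reference score matrix: van der Waerden scores, shuffled per column
    scores = norm.ppf(np.arange(1, n + 1) / (n + 1))
    R = np.column_stack([rng.permutation(scores) for _ in range(d)])
    # Cholesky factors of the current and target correlation matrices
    Q = np.linalg.cholesky(np.corrcoef(R, rowvar=False))
    L = np.linalg.cholesky(target_corr)
    # Transform the scores so that their correlation becomes ~ target_corr
    R_star = R @ np.linalg.inv(Q).T @ L.T
    # Reorder each sample column to follow the ranks of the transformed scores
    out = np.empty_like(samples)
    for j in range(d):
        ranks = np.argsort(np.argsort(R_star[:, j]))  # 0-based ranks
        out[:, j] = np.sort(samples[:, j])[ranks]
    return out

# Example: impose a 0.7 rank correlation between the first two of three inputs
# target = np.array([[1.0, 0.7, 0.0], [0.7, 1.0, 0.0], [0.0, 0.0, 1.0]])
# correlated = iman_conover(lhs_samples, target)
```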
Algorithms and Implementation
Basic LHS Generation Algorithm
What specific steps should I follow to actually generate LHS in a program?
It's simple. For $N$ samples, $d$ variables:
- For each variable $j = 1, \ldots, d$, generate a random permutation $\pi_j$ of $\{1, 2, \ldots, N\}$
- For each sample $i$ and each variable $j$, generate a uniform random number $u_{ij} \sim U(0,1)$
- Calculate coordinates on the unit hypercube: $s_{ij} = \frac{\pi_j(i) - 1 + u_{ij}}{N}$
- Transform to the desired distribution via inverse CDF: $x_{ij} = F_j^{-1}(s_{ij})$
Step 3 is the core of LHS, where $\pi_j(i) - 1$ determines the stratum and $u_{ij}$ randomly selects the position within the stratum.
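A minimal NumPy sketch of these four steps (the function name lhs_basic is illustrative; the ready-made SciPy generator is shown in the implementation example below):

```python
import numpy as np
from scipy.stats import norm

def lhs_basic(N, d, seed=None):
    """Basic LHS on the unit hypercube [0, 1]^d (steps 1-3 above)."""
    rng = np.random.default_rng(seed)
    # Step 1: one independent random permutation per variable
    # (0-based strata, i.e. this already encodes pi_j(i) - 1)
    strata = np.column_stack([rng.permutation(N) for _ in range(d)])
    # Step 2: uniform random offsets within each stratum
    u = rng.random((N, d))
    # Step 3: coordinates on the unit hypercube
    return (strata + u) / N

# Step 4: inverse-CDF transform to the desired marginal, e.g. standard normal
s = lhs_basic(N=50, d=3, seed=42)
x1 = norm.ppf(s[:, 0])
```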
It's simpler than I thought! What's the computational complexity?
Generating basic LHS is $O(Nd)$, the same as Monte Carlo. Optimized LHS (maximin distance criterion) adds an optimization loop, so it becomes $O(N^2 d \cdot T)$ (where $T$ is the number of iterations), but the cost of sample generation is negligible compared to the cost of a single CAE analysis run.
Optimized LHS
How do you maximize the maximin distance? Exhaustive search is impossible, right?
Exhaustive search has $(N!)^d$ combinations, so of course it's impossible. Practical methods include:
- ESE Method (Enhanced Stochastic Evolutionary): Improves the design by exchanging pairs of elements within a column, which preserves the LHS structure. Proposed by Jin et al. (2005) and relatively easy to implement.
- Simulated Annealing: Proposed by Morris & Mitchell (1995). Can escape local minima.
- Genetic Algorithm: Effective for large-scale problems but computationally expensive.
- Column Pairwise Exchange: Fast algorithm by Viana et al. (2009). Optimizes two variables at a time.
In practice, the ESE method or simulated annealing are standard; see the sketch below for the basic element-exchange idea. Ready-made optimized LHS generators are also available in libraries such as pyDOE and SciPy.
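The element-exchange idea behind these methods can be illustrated with a simple greedy local search (the function name and loop settings are illustrative assumptions; this is far simpler than the full ESE or annealing algorithms):

```python
import numpy as np
from scipy.spatial.distance import pdist

def maximin_swap_improve(X, n_iter=2000, seed=None):
    """Improve an LHS design X (N x d) by random within-column swaps that
    do not decrease the minimum pairwise distance. Swapping two entries of
    the same column preserves the LHS (one-point-per-stratum) property."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    N, d = X.shape
    best = pdist(X).min()
    for _ in range(n_iter):
        j = rng.integers(d)
        i1, i2 = rng.choice(N, size=2, replace=False)
        X[[i1, i2], j] = X[[i2, i1], j]        # try swapping two entries
        dmin = pdist(X).min()
        if dmin >= best:
            best = dmin                        # keep the non-worsening swap
        else:
            X[[i1, i2], j] = X[[i2, i1], j]    # revert
    return X
```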
Python Implementation Example
How do you write it in Python?
Using SciPy's qmc.LatinHypercube is the current standard. An optimized LHS can also be generated in one line (note that SciPy's built-in options optimize a discrepancy measure or apply Lloyd's algorithm rather than the maximin criterion).
```python
import numpy as np
from scipy.stats.qmc import LatinHypercube
from scipy.stats import norm, uniform

# --- Basic LHS (5 variables, 50 samples) ---
d, N = 5, 50
sampler = LatinHypercube(d=d, seed=42, optimization="random-cd")
unit_samples = sampler.random(n=N)  # LHS on [0,1]^d

# --- Transform to target distribution via inverse CDF ---
# Variable 1: Normal distribution N(210e3, 5e3) ← Young's modulus [MPa]
# Variable 2: Normal distribution N(0.3, 0.02) ← Poisson's ratio
# Variable 3: Uniform distribution U(1.5, 2.5) ← Sheet thickness [mm]
x1 = norm.ppf(unit_samples[:, 0], loc=210e3, scale=5e3)
x2 = norm.ppf(unit_samples[:, 1], loc=0.3, scale=0.02)
x3 = uniform.ppf(unit_samples[:, 2], loc=1.5, scale=1.0)
# ... and so on
```
What is optimization="random-cd"?
It's an optimization option that lowers the centered $L_2$ discrepancy through random coordinate swaps. There is also "lloyd", which spreads the points using Lloyd's algorithm. The default (None) gives basic LHS without optimization. In practice, always include an optimization option because raw LHS can sometimes have poor space-filling properties.
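The effect of the option can be checked directly by comparing the discrepancy of an unoptimized and an optimized design of the same size (an illustrative check; the sizes and seed are arbitrary):

```python
from scipy.stats.qmc import LatinHypercube, discrepancy

plain = LatinHypercube(d=5, seed=0).random(50)
opt = LatinHypercube(d=5, seed=0, optimization="random-cd").random(50)
# The optimized design should show a lower centered discrepancy
print(discrepancy(plain), discrepancy(opt))
```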
Convergence and Determining Sample Size
Is the convergence rate of LHS faster than Monte Carlo?
In terms of order, both are $O(1/\sqrt{N})$, but the constant term differs. LHS has a smaller constant, so it requires fewer samples to reach the same error level. Let's summarize specific guidelines in a table.
| Application | Dimension $d$ | Recommended Sample Count $N$ | Notes |
|---|---|---|---|
| Mean/Standard Deviation Estimation | Any | $N \geq 10d$ | Bare-minimum guideline |
| Probability Distribution Estimation (Histogram) | $\leq 10$ | $N \geq 50d$ | Ensuring tail accuracy |
| Surrogate Model Construction (Kriging) | $\leq 20$ | $N = 10d$ to $50d$ | Optimize with space-filling criteria |
| Sobol Sensitivity Index Estimation | Any | $N \geq 100(d+1)$ | Used in conjunction with Saltelli method |
Practical Guide
LHS Application Flow in CAE Analysis
Please explain the overall flow of using LHS in an actual CAE project.
The typical flow has 5 steps:
- Selection of Input Variables: Decide which parameters to vary (material properties, geometric dimensions, load conditions, etc.). Variables with low sensitivity are screened out beforehand.