What is the difference between Polynomial Chaos Expansion (PCE) and the Monte Carlo method?

PCE is a method that expands input uncertainties using an orthogonal polynomial basis to construct a response surface. For problems with 5 to 10 or fewer input dimensions, it can achieve accuracy comparable to 10,000 Monte Carlo simulations with only about 100 FEM executions.

How are the coefficients in PCE determined?

There are two primary methods: (1) The Galerkin projection method, which determines coefficients analytically from the definition of an inner product, and (2) The collocation (or regression) method, which runs simulations at specific sample points and estimates the coefficients using least squares regression.

What is the curse of dimensionality in PCE?

When the number of input variables is 'd' and the polynomial degree is 'p', the number of expansion terms increases exponentially as (d+p)!/(d!p!). For example, with d=20 and p=3, this results in 1771 terms, leading to a massive number of required samples. This is addressed using techniques like sparse PCE, adaptive methods, and dimensionality reduction.

Can Sobol sensitivity indices be derived from a PCE?

Yes. Sobol indices can be calculated analytically directly from the PCE coefficients. This is a major advantage of PCE, as it provides total-order sensitivity indices without requiring any additional simulations.

Polynomial Chaos Expansion (PCE) for Uncertainty Quantification

Category: V&V・不確かさ定量化 | 更新 2026-04-12

Polynomial Chaos Expansion basis functions and UQ response surface

多項式カオス展開（PCE）：直交多項式基底による応答曲面の構築と不確かさ定量化

Theory and Physics

Overview of PCE

🧑‍🎓

Professor, I heard that Polynomial Chaos Expansion (PCE) is more efficient than the Monte Carlo method, is that true?

🎓

If the input dimension is low—roughly 5 to 10 or fewer—it's overwhelmingly more efficient. For example, consider a car crash analysis where the Young's modulus of the material, plate thickness, and friction coefficient are treated as three uncertain parameters. With the Monte Carlo method, you'd need to run the FEM about 10,000 times to stabilize the statistics. But with PCE, you can achieve equal or better accuracy with only 50 to 100 FEM runs.

🧑‍🎓

What? You mean it can be done with 1/100th the computational cost? How does that work?

🎓

The key idea is "approximating the response surface with polynomials." We represent the uncertainty of the input parameters as random variables $\boldsymbol{\xi}$ and expand the output quantity $Y$ as a linear combination of orthogonal polynomials like Hermite or Legendre. Once this approximation (surrogate model) is built, we can compute the mean, variance, probability density function, and sensitivity indices analytically from it. No additional simulations are needed.

🧑‍🎓

I see! So while Monte Carlo is like "rolling dice endlessly," PCE is about "creating a smart function approximation"!

🎓

Exactly. However, there are caveats. When the number of input parameters exceeds 20, the number of polynomial terms explodes and efficiency drops. This is called the "curse of dimensionality." We'll discuss this in detail later, but to overcome this weakness, sparse PCE and adaptive methods are being researched.

Selection of Orthogonal Polynomial Basis

🧑‍🎓

You mentioned several polynomial names like Hermite and Legendre. How do you choose which one to use?

🎓

This is actually a crucial point related to the foundation of PCE. The optimal orthogonal polynomial family is determined according to the probability distribution of the input variable—this is called the "Wiener-Askey correspondence."

Input Probability Distribution	Optimal Orthogonal Polynomial	Domain	Typical Application
Normal (Gaussian) Distribution	Hermite Polynomial	$(-\infty, +\infty)$	Variation in material properties
Uniform Distribution	Legendre Polynomial	$[-1, 1]$	Design parameter range specification
Beta Distribution	Jacobi Polynomial	$[-1, 1]$	Constrained parameters
Gamma Distribution	Laguerre Polynomial	$[0, +\infty)$	Positive-valued load distributions
Poisson Distribution	Charlier Polynomial	$\{0, 1, 2, \ldots\}$	Discrete event counts

🧑‍🎓

So there's a "compatible polynomial" determined for each distribution. What happens if you use the wrong combination?

🎓

Convergence slows down, or in the worst case, it diverges. For example, using Hermite polynomials for a uniform distribution input might require 10th order or more for a problem that would normally be sufficient with 3rd order. In practice, we combine appropriate polynomials for each parameter via tensor products, like "Hermite for Young's modulus because it's normally distributed" and "Legendre for plate thickness tolerance because it's uniformly distributed."

Governing Equations and Expansion Formula

🧑‍🎓

Can you show me the basic PCE formula?

🎓

The PCE expansion for an output $Y$ with respect to $d$ random variables $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_d)$ is as follows:

$$ Y(\boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} c_{\boldsymbol{\alpha}} \, \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) $$

🎓

Here, the meaning of each symbol is:

$c_{\boldsymbol{\alpha}}$ — PCE coefficients (the unknowns we want to find)
$\Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi})$ — Multivariate orthogonal polynomial basis
$\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ — Multi-index (polynomial degree for each variable)
$\mathcal{A}$ — Index set after truncation

🧑‍🎓

How do you actually construct multivariate orthogonal polynomials?

🎓

They are constructed as the tensor product of univariate orthogonal polynomials for each variable:

$$ \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) = \prod_{i=1}^{d} \psi_{\alpha_i}^{(i)}(\xi_i) $$

🎓

For example, if $d=2$, $\xi_1$ is normally distributed (→Hermite $He_k$), and $\xi_2$ is uniformly distributed (→Legendre $P_l$), then $\Psi_{(2,1)}(\boldsymbol{\xi}) = He_2(\xi_1) \cdot P_1(\xi_2)$. Orthogonality means the inner product is zero:

$$ \langle \Psi_{\boldsymbol{\alpha}}, \Psi_{\boldsymbol{\beta}} \rangle = \mathbb{E}[\Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \, \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi})] = \|\Psi_{\boldsymbol{\alpha}}\|^2 \, \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}} $$

🧑‍🎓

Because of the orthogonality, you can determine each coefficient independently, just like Fourier series!

🎓

Exactly right. Thinking of the analogy with Fourier series makes it easier to understand. Fourier series expands periodic functions with trigonometric functions (orthogonal basis). PCE expands stochastic responses with orthogonal polynomials. The principle is exactly the same.

Deriving Statistics from Coefficients

🧑‍🎓

Once you have the PCE coefficients, how do you calculate the mean and variance?

🎓

This is the most beautiful part of PCE. Thanks to orthogonality, statistics can be obtained directly from the coefficients:

Mean (the zeroth-order coefficient is the mean itself):

$$ \mu_Y = \mathbb{E}[Y] = c_{\mathbf{0}} $$

Variance (sum of squares of coefficients of order 1 and above):

$$ \sigma_Y^2 = \text{Var}(Y) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}, \, |\boldsymbol{\alpha}|>0} c_{\boldsymbol{\alpha}}^2 \, \|\Psi_{\boldsymbol{\alpha}}\|^2 $$

🧑‍🎓

What? You get the variance just by summing the squares of the coefficients? You don't need to sample tens of thousands of times like Monte Carlo!

🎓

Furthermore, skewness, kurtosis, and even the probability density function can be obtained analytically. For example, to get the probability density function, you apply Monte Carlo sampling to the PCE approximation $Y(\boldsymbol{\xi})$, but this is polynomial evaluation, not FEM, so it finishes in an instant. Even 100,000 samples are on the order of milliseconds.

Derivation of Sobol Sensitivity Indices

🧑‍🎓

I heard you can also do sensitivity analysis from PCE?

🎓

This is arguably PCE's killer feature. Sobol sensitivity indices can be calculated directly from the PCE coefficients. The first-order Sobol index for the $i$-th variable is:

$$ S_i = \frac{\displaystyle\sum_{\boldsymbol{\alpha} \in \mathcal{A}_i} c_{\boldsymbol{\alpha}}^2 \, \|\Psi_{\boldsymbol{\alpha}}\|^2}{\sigma_Y^2} $$

🎓

Here, $\mathcal{A}_i$ is the set of "terms that depend only on $\xi_i$" (i.e., $\alpha_j = 0 \; \forall j \neq i$ and $\alpha_i > 0$). The total Sobol index $S_i^T$ is similarly obtained from the coefficients of "all terms involving $\xi_i$."

🧑‍🎓

Can you give me a concrete example? For a 3-parameter problem, for instance?

🎓

Good question. For example, in predicting residual stress in a weld, let's say the input parameters are heat input $Q$, welding speed $v$, and preheat temperature $T_0$. After performing PCE expansion and obtaining the coefficients, if the results are $S_Q = 0.65$, $S_v = 0.22$, $S_{T_0} = 0.08$, then you know that 65% of the variation in residual stress is due to uncertainty in heat input. This directly leads to design decisions like "heat input management should be the top priority."

Physical Meaning of Each Term

PCE coefficient $c_{\boldsymbol{\alpha}}$: Represents the contribution of each polynomial basis to the output variation. Terms with large coefficients dominate the output uncertainty.
Orthogonal polynomial basis $\Psi_{\boldsymbol{\alpha}}$: Corresponds to "basis vectors" in the input probability space. Orthogonality allows quantifying the contribution of each basis independently.
Truncation order $p$: A parameter controlling the accuracy of the expansion. Higher order increases accuracy but also increases the number of terms and computational cost. In practice, $p=2\sim5$ is common.

Calculation of Number of Expansion Terms

When there are $d$ input variables and a total order truncation $p$, the number of expansion terms $P$ is:

$$ P = \binom{d+p}{p} = \frac{(d+p)!}{d! \, p!} $$

Input Dimension $d$	Order $p=2$	Order $p=3$	Order $p=4$
3	10	20	35
5	21	56	126
10	66	286	1,001
20	231	1,771	10,626

Numerical Methods and Implementation

Galerkin Projection Method (Intrusive PCE)

🧑‍🎓

What methods are there to find the PCE coefficients?

🎓

There are two main methods. First, let's explain the Galerkin projection method (intrusive PCE). Using orthogonality, the PCE coefficients $c_{\boldsymbol{\alpha}}$ are found via inner product:

$$ c_{\boldsymbol{\alpha}} = \frac{\langle Y, \Psi_{\boldsymbol{\alpha}} \rangle}{\|\Psi_{\boldsymbol{\alpha}}\|^2} = \frac{\mathbb{E}[Y(\boldsymbol{\xi}) \, \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi})]}{\mathbb{E}[\Psi_{\boldsymbol{\alpha}}^2(\boldsymbol{\xi})]} $$

🧑‍🎓

It's elegant in theory, but how do you actually compute it? You need FEM results to compute the expectation, right?

🎓

Sharp observation. In the Galerkin projection method, you need to substitute the PCE expansion into the governing equations and modify the solver itself. For example, for the structural analysis stiffness equation $[K(\boldsymbol{\xi})]\{u(\boldsymbol{\xi})\} = \{F(\boldsymbol{\xi})\}$, you expand $K$, $u$, and $F$ with PCE and derive a system of equations via projection onto each basis. Theoretically, this is the most accurate, but it requires rewriting the internals of existing FEM solvers, making it difficult to use in practice. This is why it's called "intrusive."

Collocation/Regression Method (Non-Intrusive PCE)

🧑‍🎓

Isn't there a method that doesn't require modifying the solver?

🎓

Yes, there is. The overwhelmingly used method in practice is Non-Intrusive PCE. It treats existing FEM solvers as black boxes. The procedure is:

Generate $N$ sample points $\{\boldsymbol{\xi}^{(1)}, \ldots, \boldsymbol{\xi}^{(N)}\}$
Run FEM at each sample point and obtain outputs $\{Y^{(1)}, \ldots, Y^{(N)}\}$
Estimate coefficients via least squares

$$ \hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \sum_{k=1}^{N} \left( Y^{(k)} - \sum_{\boldsymbol{\alpha}} c_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}^{(k)}) \right)^2 $$

🎓

In matrix form, it's $\hat{\mathbf{c}} = (\boldsymbol{\Psi}^T \boldsymbol{\Psi})^{-1} \boldsymbol{\Psi}^T \mathbf{Y}$. Here, $\boldsymbol{\Psi}$ is the $N \times P$ design matrix ($\Psi_{k,\boldsymbol{\alpha}} = \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}^{(k)})$). This works with any solver output, whether it's Abaqus, Ansys, or OpenFOAM.

🧑‍🎓

What's the minimum $N$ required?

🎓

The minimum condition is $N \geq P$ (samples greater than or equal to the number of terms), but in practice, an oversampling ratio of 2 to 3 times is recommended. That is, $N = 2P \sim 3P$. For example, if $d=5$, $p=3$, then $P=56$ terms, so $N = 112 \sim 168$ FEM runs are needed.

Selection of Numerical Quadrature Rules

🧑‍🎓

Can you choose sample points randomly? Or are there specific rules?

🎓

Good question. The choice of sample points (design of experiments) directly affects PCE accuracy. Let's compare the main methods:

Sampling Method	Characteristics	Required Sample Count	Recommended Scenario
Tensor Product Quadrature	Place Gauss points independently on each axis	$(p+1)^d$ (exponential increase)	Low dimension $d \leq 3$
Smolyak Sparse Grid	"Thinned-out" version of tensor product	Significantly reduced	$d = 4 \sim 15$
Latin Hypercube	Space-filling quasi-random sampling	Freely settable	Regression-based PCE
Sobol Sequence	Quasi-random numbers via low-discrepancy sequence	Freely settable	High-dimensional regression

🧑‍🎓

With tensor product, for 5 dimensions $(p+1)^5$ ... for order 4, that's $5^5 = 3125$ FEM runs needed. That's not realistic!

🎓