Kriging Surrogate Model (Gaussian Process Regression) Simulator

Replace expensive CFD/FEM evaluations with a Kriging (Gaussian Process Regression) surrogate built from just a few training points. Switch between SE / Matérn kernels and tune length scale ℓ and nugget σ_n² to watch the predictive mean, 95% credible interval, log marginal likelihood and RMSE respond in real time.

Parameters

Kernel function

Prior assumption about smoothness of the true response

Training samples N

Number of expensive evaluations (CFD/FEM/experiment)

Length scale ℓ

Kernel width. Larger ℓ assumes a smoother function

Signal variance σ_f²

Amplitude scale of the function

Nugget σ_n²

Observation-noise variance. 1e-6 to 1e-3 for deterministic CAE

True-function amplitude A

Amplitude of the test function f(x) = A·sin(2πx)

Results

—

Training samples N

—

Average sample spacing

—

Predictive std σ_pred

—

95% credible interval width

—

RMSE prediction error

—

Log marginal likelihood

—

Kriging interpolation — true / training / mean ± 95% CI

White: true function A·sin(2πx). Red dots: training observations (with noise jitter). Blue line: Kriging predictive mean. Blue band: 95% credible interval.

Kriging prediction + credible interval (Chart.js)

Log marginal likelihood vs length scale ℓ

Theory & Key Formulas

$$\hat\mu(x_*) = k_*^T K^{-1} y,\qquad \hat\sigma^2(x_*) = k(x_*,x_*) - k_*^T K^{-1} k_*$$

K is the N×N kernel matrix (K_{ij}=k(x_i,x_j)+σ_n²δ_{ij}); k_* is the vector of covariances between the prediction point and the training points. The GP returns posterior mean μ̂ and variance σ̂² together.

$$k_{SE}(r) = \sigma_f^2 \exp\!\left(-\frac{r^2}{2\ell^2}\right)$$

Squared exponential (RBF) kernel. r=|x−x'|, ℓ = length scale, σ_f² = signal variance. Assumes an infinitely differentiable, very smooth family of functions.

$$k_{M5/2}(r) = \sigma_f^2\left(1+\frac{\sqrt{5}\,r}{\ell}+\frac{5r^2}{3\ell^2}\right)\exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right)$$

Matérn 5/2 kernel — the Bayesian-Optimization default. Twice differentiable, captures sharper variations than SE.

$$\log p(y\mid X,\theta) = -\tfrac12 y^T K^{-1} y - \tfrac12 \log|K| - \tfrac{N}{2}\log(2\pi)$$

Log marginal likelihood. Term 1 = data fit, term 2 = model-complexity penalty. Maximise over θ=(ℓ,σ_f²,σ_n²) to estimate the hyperparameters.

Kriging Surrogate Model (Gaussian Process Regression)

🙋

A "surrogate model" replaces the real CFD run with something cheap, right? But if it's just regression, why not use a polynomial? What makes Kriging special?

🎓

Good question. The decisive feature of Kriging is that it returns not only a predicted value but also the predictive uncertainty (variance) at every point. Polynomials or RBF interpolants give a single "best guess"; Kriging gives a mean μ̂(x_*) and a standard deviation σ̂(x_*) together. So you can tell quantitatively "here the prediction is confident because training points are nearby" or "here it's a guess because the design space is empty". That uncertainty estimate is exactly what the Expected Improvement acquisition function in Bayesian Optimization is built on.

🙋

OK, so how do I pick "length scale ℓ" and "nugget σ_n²" on the left? When I move them around, the RMSE swings wildly and I can't tell what is right.

🎓

That's the heart of Kriging — and the answer is "pick the ℓ that maximises the log marginal likelihood (logML)." The bottom-right chart in this tool plots logML as a function of ℓ; just read off the peak. In real workflows BFGS or L-BFGS is used to optimise numerically, and because local optima are common with few observations, a multi-start from several initial values (ℓ=0.05, 0.5, 5) is standard. Libraries such as GPyOpt, scikit-learn GaussianProcessRegressor, SMT and Dakota all do this optimisation internally.

🙋

Switching the kernel between SE and Matérn changes the prediction curve quite a lot. SE looks cleanest — when would I actually want Matérn?

🎓

SE assumes infinite differentiability, so it shines on genuinely smooth phenomena — fitting an airfoil's lift coefficient versus angle of attack, for instance, SE is plenty. But for physics with sharp transitions — buckling, contact, phase change — SE tries to stay too smooth and ends up oscillating between observations. Matérn 3/2 (once-differentiable) and Matérn 5/2 (twice-differentiable) are more honest there. In Bayesian Optimization Matérn 5/2 is the de-facto default, so if in doubt, start with 5/2.

🙋

With the defaults at N=8 the RMSE is 1.07 and the verdict is NG — does it just need more samples?

🎓

Yes, push N up to about 20 and the RMSE for sin(2πx) drops below 0.2. A rough rule of thumb is 10-20 points in 1D, 30-50 in 2D, 100-200 in 5D — required samples scale roughly as 10^d, which is the famous curse of dimensionality. When the dimension exceeds 10 in practice you start replacing plain Kriging with Polynomial Chaos Expansion or Deep Kernel Learning.

🙋

One more — for deterministic outputs like CFD with no measurement noise, should I still set a nugget σ_n²?

🎓

Yes, for numerical reasons. In theory σ_n²=0 is fine, but then K becomes nearly singular and K⁻¹ explodes. So you always add a small "jitter" (1e-6 to 1e-3) that plays the same role as a nugget — a regularisation term for numerical stability. For experimental data you put in the actual measurement variance. Multi-fidelity Kriging mixing CFD and experiments uses different nuggets per dataset by design.

Frequently Asked Questions

They are essentially the same method. Kriging is the geostatistics term introduced in the 1950s by Danie Krige, a South African mining engineer, to interpolate ore-grade values between boreholes; in the machine-learning community the identical formulas are called Gaussian Process Regression (GPR). Both are linear Bayesian estimators that build a predictor as a weighted sum over observations and also return a predictive variance, with smoothness controlled by a kernel k(x,x'). The only convention-level difference is that Kriging is usually written with the variogram γ(h) while GPR uses the covariance k(r). CAE and optimisation literature tends to say Kriging; ML and UQ literature tends to say GPR.

The squared-exponential (SE / RBF) kernel assumes an infinitely differentiable, very smooth function — a good first choice for aerodynamic or thermal responses. Matérn 3/2 represents functions that are only once-differentiable and copes better with sharp features such as buckling or contact responses. Matérn 5/2 sits in the middle and is the de-facto default in Bayesian Optimization. If unsure, start with Matérn 5/2 and compare against SE through residual plots or cross-validation.

Hyperparameters are typically estimated by maximising the log marginal likelihood (logML). The second chart in this tool plots logML as a function of ℓ; its peak is the maximum-likelihood length scale. In practice BFGS or L-BFGS is used for the numerical search, but with few observations local optima are common, so multi-start from several initial values (for example ℓ=0.05, 0.5, 5) is standard. The nugget σ_n² is the observation-noise variance — set a tiny value (1e-6 to 1e-3) for deterministic CAE outputs and the measurement variance for experimental data.

Standard GPR costs O(N³) time and O(N²) memory, so training sets larger than about 2,000 points become painful on a workstation. Input dimension can be pushed to about twenty by learning per-axis length scales (Automatic Relevance Determination, ARD), but beyond that the curse of dimensionality bites. Higher-dimensional or larger-scale problems move to Sparse GP (Inducing Points), SVGP, Deep Kernel Learning, Polynomial Chaos Expansion or neural networks. For typical CAE optimisation (5-10 dimensions, 100-500 evaluations) plain Kriging is hard to beat.

Real-World Applications

Bayesian Optimization: For design optimisation where one evaluation takes hours of CFD or FEM, a Kriging surrogate combined with the Expected Improvement acquisition function recommends "where to evaluate next for the biggest expected improvement". It is the workhorse behind aircraft-wing shape optimisation, rocket-nozzle design and chemical-plant operating-point optimisation, typically reaching the global optimum within tens to hundreds of expensive evaluations. Google Vizier, Facebook Ax and scikit-optimize provide widely used open-source implementations.

CAE surrogates for robust design: Automotive crash analysis, electromagnetic-field simulation and fluid-thermal coupling — each case taking minutes to days — are replaced by Kriging surrogates, then a Monte Carlo sweep of dimensional, material and load uncertainties (Uncertainty Quantification, UQ) runs through ten-thousand samples overnight. Running CFD ten-thousand times directly is impossible; surrogates at 0.1 s per sample make it routine. Dakota, SMT and UQLab are the standard toolboxes.

Geostatistics and resource exploration: The original home of Kriging. From dozens of borehole assays it builds a 3D map of ore grade in a deposit — used across oil and gas exploration, rare-earth reserve estimation and groundwater-contamination forecasting. Variogram analysis to identify spatial correlation structure is performed together as the industry-standard recipe.

Uncertainty quantification in machine learning: In autonomous-driving obstacle detection and medical-image diagnosis — where you want a warning when the model may be wrong — GPR serves as an auxiliary uncertainty estimator. Deep Kernel Learning replaces the final layer of a neural network with a GP and provides predictive variance even for high-dimensional inputs like images. NASA spacecraft navigation, Toyota driver assistance and Genentech drug-screening pipelines have adopted similar ideas.

Common Misconceptions and Pitfalls

The biggest pitfall is trusting the Kriging credible interval absolutely. The posterior variance σ̂² is "the variance given that the assumed kernel and the data are correct"; if the kernel does not match the true function family, the intervals are systematically biased. Fit a truly step-like function with an SE kernel and the model misses the discontinuity badly while still reporting "high confidence". In practice always run a calibration test by cross-validation: check whether observed values fall inside the 95% credible interval at roughly the nominal rate (95%).

Next, freezing the hyperparameters once and never updating them. The ℓ estimated during early exploration is not guaranteed to be appropriate in other regions of the design space. Re-optimise the hyperparameters every time a new evaluation is added. If on the other hand they jump around wildly every iteration, that is a sign of an underestimated noise level or a poor initial sampling plan (regular grid instead of Latin Hypercube or Sobol sequence). Latin Hypercube Sampling combined with a D-optimal design is the standard initial-point recipe.

Finally, assuming Kriging extrapolates well. The opposite is true: outside the bounding box of training data, the predictive mean reverts rapidly to the prior mean (usually zero) and the variance blows up to σ_f². Honest behaviour ("we don't know out here") but if an optimiser keeps proposing points beyond the boundary the result is wasted evaluations. Bayesian-Optimization workflows enforce design-space bounds hard, and sometimes damp Expected Improvement as you approach the boundary. PC-Kriging — combining a global polynomial basis (Polynomial Chaos) with a Kriging residual — is another strong remedy.

Kriging Surrogate Model (Gaussian Process Regression) Simulator