ASME V&V 20 Validation Metrics

Category: V&V / ASME V&V | Integrated 2026-04-12
Figure: Conceptual diagram of the ASME V&V 20 validation metric framework, showing the relationship between the comparison error E and the validation uncertainty u_val.

Theory and Physics

What are Validation Metrics?

🧑‍🎓

Professor, what exactly do validation metrics calculate? If we're just checking if the simulation and experiment match, wouldn't it be enough to simply calculate the error percentage?

🎓

Good question. It's true that many people just look at "the percentage difference between the simulation and experimental values." But that alone is insufficient. Why? Because both experimental data and simulations have uncertainties.

For example, let's say the stress measured in an experiment is 100 MPa and the simulation gives 105 MPa, a 5% error. Is that unacceptable? But what if the experimental measurement accuracy is $\pm8$ MPa, and the simulation's mesh dependency is $\pm3$ MPa? That 5 MPa difference might lie entirely within the range of uncertainty, so it could actually be considered acceptable.

🧑‍🎓

I see... So we need a mechanism not just to look at the magnitude of the error, but to judge whether that error is a "meaningful difference."

🎓

Exactly. ASME V&V 20 (Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer) is precisely the standardized framework for that judgment. Validation metrics are indicators that quantitatively evaluate the agreement between simulation results and experimental data, taking into account the uncertainties of both.

Definition of Comparison Error E

🧑‍🎓

Specifically, what formula is used for the calculation?

🎓

The starting point is the Comparison Error E. It's simple:

$$ E = S - D $$

Here, $S$ is the Simulation result and $D$ is the experimental Data (the measured value).

🎓

Since it's a signed difference, $E > 0$ when the simulation overpredicts the experiment and $E < 0$ when it underpredicts. If you want a relative error you can use $E/D$, but the basic ASME V&V 20 framework works with the dimensional error $E$ itself, using its magnitude $|E|$ for the judgment described below.

🧑‍🎓

I understand up to this point. But based on what you said earlier, looking at this $E$ alone is meaningless, right?

🎓

Yes. With just $E$, you can't tell "whether this difference is significant." That's why we next calculate the validation uncertainty $u_\text{val}$ and compare it with $E$.

Validation Uncertainty $u_\text{val}$

🧑‍🎓

What specific uncertainties are combined to get the validation uncertainty?

🎓

ASME V&V 20 groups the uncertainty sources into three broad categories:

  • $u_\text{num}$ (Numerical uncertainty): Uncertainty arising from discretization, such as mesh dependency, time step dependency, iterative convergence residuals. Often evaluated using GCI (Grid Convergence Index).
  • $u_\text{input}$ (Input parameter uncertainty): Measurement accuracy or variability of material properties, boundary conditions, initial conditions. For example, if Young's modulus is $210 \pm 5$ GPa, that $\pm5$ GPa is the input uncertainty.
  • $u_\text{exp}$ (Experimental uncertainty): Sensor accuracy, measurement location deviation, environmental fluctuations, etc. — uncertainty on the experimental data side.

If these can be assumed to be mutually independent, they are combined using RSS (Root Sum of Squares):

$$ u_\text{val} = \sqrt{u_\text{num}^2 + u_\text{input}^2 + u_\text{exp}^2} $$
🧑‍🎓

The square root of the sum of squares... that's the same concept used in error propagation.

🎓

Exactly, it's the same concept as GUM (Guide to the Expression of Uncertainty in Measurement). However, there are two points to note. First, if the uncertainty sources are correlated, correlation terms need to be added. Second, the uncertainty here is generally treated as standard uncertainty ($k=1$, approximately 68% confidence interval), but sometimes it's treated with a 95% confidence interval ($k=2$), so you must always specify which one you are using in your report.

Specific Evaluation Methods for Each Uncertainty Component
  • Evaluation of $u_\text{num}$: Obtain solutions on three or more mesh levels and apply Richardson extrapolation. GCI = $F_s \cdot |(\hat{f}_\text{fine} - \hat{f}_\text{coarse})/\hat{f}_\text{fine}| / (r^p - 1)$. The safety factor $F_s$ is typically 1.25 (for three-level extrapolation) or 3.0 (for two-level).
  • Evaluation of $u_\text{input}$: Evaluate using the sensitivity coefficient method ($u_\text{input}^2 = \sum_i (\partial S/\partial x_i)^2 u_{x_i}^2$) or the Monte Carlo method; a short sketch of the sensitivity-coefficient approach follows this list. For many parameters, Latin Hypercube Sampling is efficient.
  • Evaluation of $u_\text{exp}$: Combine Type A (statistical evaluation: standard deviation of repeated measurements) and Type B (systematic evaluation: calibration certificates, sensor specifications) according to GUM.
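As a concrete illustration of the sensitivity-coefficient evaluation of $u_\text{input}$ mentioned above, here is a minimal Python sketch. The model function, nominal values, and uncertainties are hypothetical placeholders; in a real study the finite-difference perturbations would wrap the actual FE/CFD solver.

```python
import numpy as np

def simulate(x):
    """Placeholder response model S(x); in practice this is the FE/CFD run.
    Hypothetical example: stress of a bar, S = F / A with x = [F, A]."""
    F, A = x
    return F / A

x0 = np.array([1.0e4, 1.0e-4])      # nominal inputs: force [N], area [m^2]
u_x = np.array([2.0e2, 2.0e-6])     # standard uncertainties of the inputs

# Central finite differences approximate the sensitivity coefficients dS/dx_i
u_input_sq = 0.0
for i in range(len(x0)):
    dx = 1e-3 * abs(x0[i])
    xp, xm = x0.copy(), x0.copy()
    xp[i] += dx
    xm[i] -= dx
    dSdx = (simulate(xp) - simulate(xm)) / (2 * dx)
    u_input_sq += (dSdx * u_x[i]) ** 2

u_input = np.sqrt(u_input_sq)
print(f"u_input = {u_input:.3g} Pa")
```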

Judgment Criteria: $|E|$ vs $u_\text{val}$

🧑‍🎓

Once you have $E$ and $u_\text{val}$, how do you decide "pass/fail"?

🎓

The judgment is simple. The basic idea of ASME V&V 20 is this:

  • If $|E| \leq u_\text{val}$: The comparison error is within the range of uncertainty. Under this condition, the difference between simulation and experiment cannot be attributed to model error rather than uncertainty, and the validation is considered successful at the level of $u_\text{val}$.
  • If $|E| > u_\text{val}$: There is a difference that cannot be explained by uncertainty. That means a model form error exists. Model improvement is needed.
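To make the judgment concrete, here is a minimal Python sketch using the stress example from earlier in the discussion. The $\pm8$ MPa and $\pm3$ MPa values are treated as standard uncertainties ($k=1$), and $u_\text{input}$ is assumed to be zero purely for simplicity.

```python
import math

# Values from the stress example above (treated as standard uncertainties, k=1)
S, D = 105.0, 100.0                       # simulation result and experiment [MPa]
u_num, u_input, u_exp = 3.0, 0.0, 8.0     # uncertainty components [MPa]

E = S - D                                             # comparison error
u_val = math.sqrt(u_num**2 + u_input**2 + u_exp**2)   # RSS combination

if abs(E) <= u_val:
    print(f"|E| = {abs(E):.1f} <= u_val = {u_val:.1f} MPa: "
          "model error cannot be distinguished from uncertainty")
else:
    print(f"|E| = {abs(E):.1f} > u_val = {u_val:.1f} MPa: "
          "a model form error is indicated")
```

With these numbers $u_\text{val} \approx 8.5$ MPa, so the 5 MPa difference falls inside the uncertainty band.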
🧑‍🎓

Wait a minute. Doesn't $|E| \leq u_\text{val}$ mean that if the uncertainty is large, anything can "pass"?

🎓

Sharp observation! That's exactly right, and it's a key point in the V&V 20 philosophy. $|E| \leq u_\text{val}$ is not proof that "the simulation is correct"; it means "with the current level of knowledge (the magnitude of uncertainty), the existence of model error cannot be detected".

Therefore, the quality of validation is determined by the smallness of $u_\text{val}$. To reduce uncertainty, constant effort is required to use more precise experiments, finer meshes, and more accurate material properties.

💡 Intuitive Understanding: Using a shooting target as an analogy, $E$ is "how far the bullet is from the center," and $u_\text{val}$ is the "size of the target." If the bullet hits the target ($|E| \leq u_\text{val}$), the shooter passes. But if the target is 1 meter in diameter, anyone can hit it. To demonstrate skill (model accuracy), you need to hit a small target.

Area Metric

🧑‍🎓

The comparison of $E$ and $u_\text{val}$ is a point-by-point evaluation, right? What about when the response has a distribution? For example, the agreement of an entire temperature distribution.

🎓

Good point. The method used in such cases is the Area Validation Metric. Proposed by Ferson et al. (2008), it compares the cumulative distribution functions (CDFs) of the simulation results and experimental data.

$$ d_\text{area} = \int_{-\infty}^{\infty} |F_S(y) - F_D(y)| \, dy $$

Here, $F_S(y)$ is the CDF of the simulation results, and $F_D(y)$ is the CDF of the experimental data.

🎓

$d_\text{area}$ takes a value greater than or equal to 0, and if the two CDFs match perfectly, $d_\text{area} = 0$. The advantages of this metric are:

  • It reflects the shape of the distribution (not just the mean, but also spread and skewness).
  • Multiple response quantities can be summarized into a single number.
  • It's easy to set a threshold (e.g., pass if $d_\text{area} < 0.1$).

For example, when comparing acceleration time histories in a vehicle crash test, you can treat values at each time point as samples, create a CDF, and evaluate them collectively using the area metric.
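A minimal sketch of the area metric using empirical CDFs follows; the sample data are hypothetical (randomly generated) and simply stand in for simulation and experimental populations.

```python
import numpy as np

def ecdf(samples, y):
    """Empirical CDF of `samples` evaluated at the points y."""
    samples = np.sort(samples)
    return np.searchsorted(samples, y, side="right") / len(samples)

def area_metric(sim, exp):
    """Area between the two empirical CDFs (area validation metric)."""
    y = np.sort(np.concatenate([sim, exp]))        # merged support points
    diff = np.abs(ecdf(sim, y) - ecdf(exp, y))     # |F_S - F_D| on each step
    # Both ECDFs are piecewise constant between consecutive support points,
    # so the integral reduces to a sum of rectangle areas.
    return np.sum(diff[:-1] * np.diff(y))

# Hypothetical samples: simulated vs measured peak temperatures [deg C]
rng = np.random.default_rng(0)
sim = rng.normal(302.0, 4.0, size=200)
exp = rng.normal(300.0, 5.0, size=30)
print(f"d_area = {area_metric(sim, exp):.2f}")
```

Because the metric integrates over the whole CDF, it carries the same units as the response quantity, which makes thresholds such as "pass if $d_\text{area} < 0.1$" meaningful only relative to the scale of the response.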

🧑‍🎓

The area difference between CDFs, I see. That gives a better overall picture than point-by-point comparison.

u-pooling Method

🧑‍🎓

What about when there are multiple validation points—for example, 10 measurement locations or experiments under 5 conditions? How do you evaluate them comprehensively?

🎓

The method to address that is the u-pooling method (Ferson & Oberkampf, 2009). The idea is this:

  1. For each validation point $i$, construct the CDF $F_{D_i}$ of the experimental data $D_i$.
  2. Determine where the simulation value $S_i$ lies on that CDF: $u_i = F_{D_i}(S_i)$.
  3. If the model were perfect, $u_i$ should follow a uniform distribution on $[0, 1]$.
  4. Compare the CDF of the collected $\{u_1, u_2, \ldots, u_n\}$ with the CDF of the standard uniform distribution $U[0,1]$.
$$ u_i = F_{D_i}(S_i), \quad i = 1, 2, \ldots, n $$
🎓

If this $u_i$ deviates significantly from a uniform distribution, it's evidence of a systematic bias in the model. For example, if $u_i$ is consistently biased towards 0, it indicates "the simulation is systematically underestimating"; if biased towards 1, it indicates "systematic overestimation."

In practice, statistical tests like the Kolmogorov-Smirnov test (KS test) or Anderson-Darling test are often used to test the degree of deviation from the uniform distribution.
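A minimal u-pooling sketch under assumed (synthetic) data: each validation point has a handful of repeated measurements, $u_i$ is read off the empirical CDF of those replicates, and the pooled values are tested against $U[0,1]$ with a KS test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: at each of n validation points there are 8 repeated
# measurements D_i (rows) and one simulation prediction S_i.
n = 10
D = rng.normal(loc=np.linspace(50, 95, n)[:, None], scale=2.0, size=(n, 8))
S = np.linspace(50, 95, n) + rng.normal(1.0, 2.0, n)   # slight positive bias

# Step 2: u_i = F_{D_i}(S_i) using the empirical CDF of the replicates
u = np.array([np.mean(D[i] <= S[i]) for i in range(n)])

# Step 4: compare the pooled u_i against the standard uniform distribution
ks_stat, p_value = stats.kstest(u, "uniform")
print(f"pooled u_i: {np.round(u, 2)}")
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```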

🧑‍🎓

So, you can combine results from disparate conditions and say "how is this model overall?" That's an incredibly useful technique.

Reliability Metric

As another form of validation metric, there is the Reliability Metric:

$$ r = P(|S - D| \leq \delta_\text{req}) $$

Here, $\delta_\text{req}$ is the engineering requirement (allowable error), and $r$ represents the "probability that the simulation meets the requirement." This allows for reliability-based judgment when either or both $S$ and $D$ have probability distributions. It is effective in fields like aerospace where there are strict requirements such as "failure probability below $10^{-6}$."
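A minimal Monte Carlo sketch of the reliability metric, assuming (hypothetically) that both $S$ and $D$ are normally distributed:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical distributions: S and D each carry their own uncertainty
S = rng.normal(105.0, 3.0, size=100_000)   # simulation with numerical/input scatter [MPa]
D = rng.normal(100.0, 8.0, size=100_000)   # experiment with measurement scatter [MPa]
delta_req = 10.0                           # engineering requirement [MPa]

r = np.mean(np.abs(S - D) <= delta_req)    # P(|S - D| <= delta_req)
print(f"reliability metric r = {r:.3f}")
```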

Numerical Methods and Implementation

Uncertainty Quantification Methods

🧑‍🎓

I understand the theory. But in practice, how do you actually get numerical values for $u_\text{num}$ and $u_\text{input}$? Is there a proper procedure?

🎓

For $u_\text{num}$, the most widely used method is Richardson extrapolation + GCI. Let's outline the procedure:

  1. Solve the same problem on three or more mesh densities (coarse, medium, fine). Let the representative mesh sizes be $h_1 > h_2 > h_3$.
  2. Set the refinement ratio $r = h_1 / h_2$ (typically $r \approx 1.3$ to $2.0$); the formula below assumes the same ratio between successive mesh levels.
  3. Determine the apparent order of convergence $p$:
$$ p = \frac{\ln\!\left(\frac{f_1 - f_2}{f_2 - f_3}\right)}{\ln(r)} $$
🎓

Here, $f_1, f_2, f_3$ are the solution values on the coarse, medium, and fine meshes, respectively. Then calculate the GCI (Grid Convergence Index):

$$ \text{GCI}_\text{fine} = \frac{F_s}{r^p - 1} \left| \frac{f_2 - f_3}{f_3} \right| $$

The safety factor $F_s = 1.25$ (for three-level extrapolation). This $\text{GCI}_\text{fine}$ becomes the estimate for the numerical uncertainty $u_\text{num}$.
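A minimal sketch of the GCI calculation; the three solution values and the refinement ratio are hypothetical, and, following the text above, the fine-grid GCI is taken directly as the estimate of $u_\text{num}$.

```python
import math

def gci_fine(f1, f2, f3, r, Fs=1.25):
    """Observed order p and fine-grid GCI from three systematically refined
    meshes. f1, f2, f3 are the solutions on the coarse, medium, and fine
    meshes; r is the (constant) refinement ratio."""
    p = math.log((f1 - f2) / (f2 - f3)) / math.log(r)
    gci = Fs / (r**p - 1) * abs((f2 - f3) / f3)
    return p, gci

# Hypothetical monotonically converging results, e.g. peak stress [MPa]
f_coarse, f_medium, f_fine = 96.0, 103.0, 104.8
p, gci = gci_fine(f_coarse, f_medium, f_fine, r=2.0)
u_num = gci * f_fine   # GCI is relative; convert to the units of the solution
print(f"observed order p = {p:.2f}, GCI = {gci:.3%}, u_num = {u_num:.2f} MPa")
```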

🧑‍🎓

So you solve on three meshes and estimate "how far from the true solution" based on the convergence behavior.

Combination with Monte Carlo Method

🧑‍🎓

What about $u_\text{input}$? If there are 10 or 20 input parameters, calculating all the sensitivity coefficients seems tough...

🎓

Exactly, the sensitivity coefficient method becomes computationally expensive with many parameters. That's where the Monte Carlo method comes in. The procedure is:

  1. Set probability distributions (normal, uniform, etc.) for each input parameter.
  2. Generate $N$ sample sets (e.g., $N = 100$ to $10{,}000$) using methods like Latin Hypercube Sampling (LHS).
  3. Run the simulation for each sample to obtain the distribution of the response.
  4. The standard deviation of the response becomes $u_\text{input}$.

However, running $N=1000$ simulations is not realistic for CFD where a single simulation takes hours. Therefore, a practical approach is to insert a surrogate model (response surface method, Kriging, PCE: Polynomial Chaos Expansion) to reduce computational cost.
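A minimal LHS-based sketch; the two input parameters, their distributions, and the cheap placeholder model are hypothetical stand-ins for the real solver or a surrogate.

```python
import numpy as np
from scipy.stats import qmc, norm

# Hypothetical input uncertainties: Young's modulus [GPa] and load [kN],
# both treated as normal with the stated standard deviations.
means = np.array([210.0, 12.0])
stds  = np.array([5.0, 0.4])

def simulate(E, F):
    """Placeholder for the expensive solver: a cheap surrogate-like response."""
    return 1.0e3 * F / E          # e.g. a displacement-like quantity

# Latin Hypercube Sampling in the unit hypercube, mapped through the
# inverse normal CDF to the physical parameter space.
sampler = qmc.LatinHypercube(d=2, seed=0)
u01 = sampler.random(n=200)
x = norm.ppf(u01) * stds + means

responses = simulate(x[:, 0], x[:, 1])
u_input = np.std(responses, ddof=1)
print(f"u_input = {u_input:.3f} (same units as the response)")
```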

Contribution Decomposition via Sensitivity Analysis

🧑‍🎓

When $u_\text{val}$ is too large to be useful, I'd want to know "which uncertainty to reduce for the best effect."

🎓

That's where uncertainty contribution decomposition (Uncertainty Budget) becomes important. Look at the ratio of each term in $u_\text{val}^2 = u_\text{num}^2 + u_\text{input}^2 + u_\text{exp}^2$.

For example, in an actual project where $u_\text{val} = 12.5$ MPa:

| Uncertainty Component | Value [MPa] | Contribution to $u_\text{val}^2$ | Countermeasure |
| --- | --- | --- | --- |
| $u_\text{num}$ | 3.2 | 6.6% | Mesh refinement |
| $u_\text{input}$ | 11.0 | 77.4% | Improve material test accuracy |
| $u_\text{exp}$ | 5.0 | 16.0% | Calibrate measurement system |
| $u_\text{val}$ (combined) | 12.5 | 100% | |
🎓

In this example, $u_\text{input}$ is dominant (77.4%), so refining the mesh would hardly reduce the overall uncertainty. You can see that improving the accuracy of material property values is the most cost-effective. Being able to make such judgments is a strength of the V&V 20 framework.
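The budget above can be reproduced in a few lines; the component values are taken directly from the table.

```python
import numpy as np

# Uncertainty components from the table above [MPa]
components = {"u_num": 3.2, "u_input": 11.0, "u_exp": 5.0}

u_val = np.sqrt(sum(v**2 for v in components.values()))
print(f"u_val = {u_val:.1f} MPa")
for name, v in components.items():
    print(f"{name:8s}: {v:5.1f} MPa, contribution {v**2 / u_val**2:6.1%}")
```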

Practical Guide

How to Plan Validation

🧑‍🎓

Professor, when actually doing validation, what should I do first? Should I just start by calculating $E = S - D$?

🎓

That's a common beginner mistake. First, you must create a validation plan. ASME V&V 20 recommends using a "PIRT (Phenomena Identification and Ranking Table)" and proceeding in the following order:

  1. Decide on the SRQ (System Response Quantity) for validation: What will you compare? Maximum stress? Temperature distribution? Flow velocity profile? Vague "does the result match?" is not enough. Be specific about the response quantity.
  2. Set the required accuracy criteria: Determine $\delta_\text{req}$ (allowable error). Without this, you cannot judge "pass."
  3. Develop an experimental plan: Over what parameter range, how many measurements? Choose measurement methods that can sufficiently reduce uncertainty.
  4. Complete verification first: Finish code verification and solution verification before starting validation. Validation is meaningless if numerical bugs remain.