CAE Simulation Regression Testing — Quality Assurance Strategy for Solver Updates and Model Changes

Category: V&V / Best Practices | Integrated 2026-04-12
[Figure: CAE regression testing pipeline showing baseline comparison and automated validation workflow — an overview from baseline management to the CI/CD pipeline]

Theory and Background

What is CAE Regression Testing?

🧑‍🎓

Professor, regression testing is a term from software development, right? Is it also used in the CAE world?

🎓

Good question. In software development, regression testing is "checking if other functions are broken after modifying code," but the exact same concept is necessary in CAE.

🎓

Regression testing in CAE refers to the process of automatically comparing new results with past results certified as "correct" (baseline) and verifying whether the differences are within an acceptable range, when any changes are made to the analysis environment, such as solver version upgrades, model modifications, mesh regeneration, or OS/compiler updates.

🧑‍🎓

Huh, so it's systematically confirming that "even though we made changes, we still get the same results as before," right?

🎓

Exactly. For example, imagine you're doing crash analysis at an automotive company. When upgrading LS-DYNA from version R12 to R13, you need to check if the acceleration peak values or intrusion amounts of previously certified models have changed—it's unrealistic to check this manually for every single model when you have 100 or 200 of them.

🧑‍🎓

Indeed. If you tried to do it all manually, a week probably wouldn't be enough...

🎓

That's why we automate it. Automatically run a suite of benchmark problems, automatically compare results, and issue an alert if differences exceed a threshold—that's the overall picture of CAE regression testing.
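The loop the professor describes — run a suite of benchmark cases, compare each result against its baseline, and raise an alert when a difference exceeds a threshold — can be sketched in a few lines. This is a minimal illustration, not an actual product integration: `run_case` is a hypothetical stand-in for launching the solver and extracting a scalar result, and the case names are made up.

```python
def run_case(case_name):
    """Hypothetical stand-in for launching the solver and extracting a scalar.

    In a real pipeline this would call subprocess.run([...solver...]) and
    parse the result file (.f06, .odb, etc.).
    """
    demo_results = {"cantilever": 5.231, "plate_freq": 141.89}
    return demo_results[case_name]

def regression_suite(baselines, tol_pct=1.0):
    """Run every benchmark, compare against its baseline, collect failures.

    baselines: {case_name: baseline_value}; tol_pct: relative tolerance in %.
    Returns a list of (case, baseline, new_value, relative_error_pct).
    """
    failures = []
    for case, base in baselines.items():
        new = run_case(case)
        e_rel = abs(new - base) / abs(base) * 100.0
        if e_rel >= tol_pct:
            failures.append((case, base, new, e_rel))
    return failures

if __name__ == "__main__":
    baselines = {"cantilever": 5.234, "plate_freq": 142.35}
    for case, base, new, err in regression_suite(baselines):
        print(f"ALERT {case}: {base} -> {new} ({err:.2f}%)")
```

In practice this loop is wrapped in a CI job (Jenkins, GitLab CI, etc.) that runs nightly or on every solver-environment change.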

Why is Regression Testing Necessary?

🧑‍🎓

But solvers are commercial software, right? Do results really change just from a version upgrade?

🎓

They do change, that's the thing. Solver vendors perform bug fixes, algorithm improvements, and default setting changes during version upgrades. Each of these can potentially affect results.

🎓

Here are some actual examples:

  • MSC Nastran version change caused natural frequency variations of 0.3–0.5% (due to changes in stiffness matrix assembly algorithm).
  • Abaqus minor update changed the default contact detection, altering contact forces by a few percent.
  • OpenFOAM version change corrected the wall function implementation in the turbulence model, changing Cd values by over 2%.
  • Changing compiler optimization flags alone altered the order of floating-point rounding, causing bifurcation in the convergence path of a nonlinear analysis.
🧑‍🎓

What, results change just from compiler optimization flags!? That's a bit scary...

🎓

That's precisely why regression testing is essential. In fields involving safety certification like aerospace and nuclear power, running all certified benchmark suites every time the solver version changes is standard practice. In automotive crash safety, consistency with NCAP standards must always be verified.

Mathematical Framework for Judgment

🧑‍🎓

How do you specifically judge "whether results have changed"? Numerical values never match exactly, right?

🎓

Exactly. Due to floating-point arithmetic, we don't expect exact matches. Instead, we judge based on whether the relative error or absolute error is within an acceptable range. The basic formula is:

$$ e_{rel} = \frac{|Y_{new} - Y_{baseline}|}{|Y_{baseline}|} \times 100\% $$

Here, $Y_{baseline}$ is the baseline (reference) result, and $Y_{new}$ is the result from the new environment. The judgment criteria are:

$$ e_{rel} < \varepsilon_{tol} \implies \text{PASS},\quad e_{rel} \geq \varepsilon_{tol} \implies \text{FAIL} $$
🎓

When $Y_{baseline}$ is near zero, the relative error diverges, so we use absolute error in that case:

$$ e_{abs} = |Y_{new} - Y_{baseline}| < \varepsilon_{abs} $$
🧑‍🎓

So relative error can't be used when the baseline is near zero. Like displacement at a constrained point (theoretically zero), for example.

🎓

Exactly that kind of case. In practice, a combined judgment using both relative and absolute error is often used:

$$ \text{PASS} \iff (e_{rel} < \varepsilon_{rel}) \;\lor\; (e_{abs} < \varepsilon_{abs}) $$

Furthermore, the pass rate for the entire test suite is also managed:

$$ \text{Pass Rate} = \frac{N_{pass}}{N_{total}} \times 100\% $$
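The combined relative/absolute criterion and the suite pass rate above translate directly into code. A minimal sketch, with the default tolerances chosen only for illustration:

```python
def passes(y_new, y_base, eps_rel=1.0, eps_abs=1e-6):
    """PASS iff e_rel < eps_rel [%] OR e_abs < eps_abs (the combined criterion)."""
    e_abs = abs(y_new - y_base)
    if e_abs < eps_abs:        # handles baselines at or near zero
        return True
    if y_base == 0.0:          # relative error is undefined for a zero baseline
        return False
    e_rel = e_abs / abs(y_base) * 100.0
    return e_rel < eps_rel

def pass_rate(results):
    """results: list of (y_new, y_baseline) pairs -> suite pass rate in %."""
    n_pass = sum(passes(new, base) for new, base in results)
    return n_pass / len(results) * 100.0
```

Note that `passes(0.0, 0.0)` returns `True` via the absolute-error branch — exactly the constrained-point-displacement case the student raises.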
🧑‍🎓

Is there also a standard for the overall test suite pass rate?

🎓

Many organizations set a Pass Rate of 100% as a release condition. If even one case FAILs, they investigate the cause to determine if it's a change due to "intentional improvement" or a "bug." If it's an intentional change, they update the baseline.

Impact of Solver Version Changes

🧑‍🎓

Could you tell me more specifically what kind of impacts are likely to occur with version changes for each solver?

🎓
| Solver | Change Most Likely to Affect Results | Typical Magnitude of Difference | Points to Note |
|---|---|---|---|
| MSC Nastran | Numerical-integration accuracy of the stiffness matrix | Natural frequency 0.1–0.5% | Be particularly careful with SOL 103/111 |
| Abaqus | Default contact algorithm | Contact force 1–5% | Always read "Changed defaults" in the Release Notes |
| Ansys Mechanical | Behavior of internal APDL commands | Stress 0.5–2% | Often hidden inside Workbench |
| OpenFOAM | Turbulence-model / wall-function implementation fixes | Cd/Cl 1–3% | Differences between the .com and Foundation editions |
| LS-DYNA | Default hourglass control | Acceleration peak 2–10% | Major impact in crash analysis |
🧑‍🎓

LS-DYNA's acceleration peak changing by up to 10%... that's a huge problem for crash safety, isn't it?

🎓

It's a huge problem. That's exactly why in automotive CAE departments, it's common sense to run hundreds of crash models through regression testing during a solver major upgrade. Cutting corners is not allowed in areas involving human lives.

Scope and Limitations of Regression Testing
  • Scope: Verification of changes to known analysis settings, such as solver version changes, OS/compiler changes, mesh regeneration, material model corrections, minor boundary condition modifications.
  • Limitations: Regression testing only verifies "consistency with previous results" and does not guarantee the "correctness" of results. If the baseline itself is wrong, the regression test will pass even though the analysis results are inaccurate. Separate V&V (Verification and Validation) is required.
  • Coverage Limitations: Regression testing cannot detect issues related to physical phenomena or analysis conditions not covered by the test suite. Test case design determines quality.
Dimensional Analysis and Unit Systems
| Variable | SI Unit | Points to Note |
|---|---|---|
| Tolerance $\varepsilon_{rel}$ | — (dimensionless, %) | Set different values for each type of physical quantity |
| Tolerance $\varepsilon_{abs}$ | Unit of each physical quantity | Used for judging values near zero; set according to scale |
| Test pass rate | — (dimensionless, %) | Depends on the organization's release criteria (typically 100%) |

Specific Example of Regression Test Judgment

Example of regression test results when migrating from Nastran v2024 to v2025. Tolerances are set per case.

| Evaluation Item | v2024 (Baseline) | v2025 (New Version) | Relative Error [%] | Judgment |
|---|---|---|---|---|
| Cantilever beam max displacement | 5.234 mm | 5.231 mm | 0.06 | PASS |
| Plate 1st natural frequency | 142.35 Hz | 141.89 Hz | 0.32 | PASS |
| Bolt contact force | 12,450 N | 12,870 N | 3.37 | FAIL |
| Overall energy conservation | 99.97% | 99.96% | 0.01 | PASS |
| Max von Mises stress | 325.8 MPa | 326.1 MPa | 0.09 | PASS |

Judgment Criteria: Relative Error < 1%: Good, 1–3%: Warning, > 3%: FAIL (Cause investigation mandatory)
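The three-band criteria above (Good / Warning / FAIL) can be encoded as a small classifier. A sketch, using the thresholds stated in the table; the band labels are assumptions about how a report might name them:

```python
def classify(y_new, y_base):
    """Three-band judgment: <1% Good, 1-3% Warning, >3% FAIL."""
    e_rel = abs(y_new - y_base) / abs(y_base) * 100.0
    if e_rel < 1.0:
        return "PASS"   # Good
    if e_rel <= 3.0:
        return "WARN"   # Warning: review recommended
    return "FAIL"       # Cause investigation mandatory
```

Applied to the table, the bolt contact force (12,450 N → 12,870 N, 3.37%) lands in the FAIL band, while every other item classifies as PASS.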

Baseline Management and Tolerance Design

Baseline Result Management

🧑‍🎓

What exactly is "baseline" data, and how is it managed? Like saving it in Excel on a file server?

🎓

Absolutely do not manage it with Excel (laughs). The golden rule of baseline management is "store it in a version-controlled repository in a machine-readable format."

🎓

Specifically, structure it like this:

  • Store baseline JSON or CSV in a Git repository.
  • Attach metadata to each baseline: solver name/version, execution date/time, mesh size, OS/compiler information.
  • Result values are extracted scalar results (max displacement, max stress, natural frequency, total reaction force, etc.).
  • If necessary, also store field data (node displacement distribution, stress contours, etc.) in binary format.
🧑‍🎓

I see. So instead of saving the solver output files (.f06, .odb, etc.) as-is, you extract the necessary numerical values from them and save those.

🎓

Yes. Solver output files can be several GB in size. Saving extracted scalar values in JSON format allows for efficient Git diff display and easy mechanical comparison. For example, the format would look like this:

```json
{
  "model": "cantilever_beam_001",
  "solver": "MSC Nastran",
  "solver_version": "2024.1",
  "date": "2026-01-15",
  "os": "RHEL 8.9",
  "results": {
    "max_displacement_mm": 5.234,
    "max_vonmises_mpa": 325.8,
    "freq_mode1_hz": 142.35,
    "reaction_force_n": 1000.02,
    "energy_balance_pct": 99.97
  }
}
```
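A baseline file in this format is straightforward to load and compare against a new run. A minimal sketch — the per-key tolerance table and the 1% default are illustrative assumptions, not fixed conventions:

```python
import json

def load_baseline(path):
    """Load a baseline JSON file of the format shown above."""
    with open(path) as f:
        return json.load(f)

def compare(baseline, new_results, tolerances_pct):
    """Compare every scalar in the baseline against the new run.

    baseline: dict with a "results" section as in the JSON above.
    new_results: {key: value} extracted from the new run.
    tolerances_pct: per-key relative tolerance in % (assumed default: 1%).
    Returns {key: (relative_error_pct, "PASS" or "FAIL")}.
    """
    report = {}
    for key, base in baseline["results"].items():
        new = new_results[key]
        e_rel = abs(new - base) / abs(base) * 100.0
        tol = tolerances_pct.get(key, 1.0)
        report[key] = (e_rel, "PASS" if e_rel < tol else "FAIL")
    return report
```

Because the baseline lives in Git, `git diff` on the JSON immediately shows which reference values changed, when, and in which commit.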

Tolerance Criteria Design

🧑‍🎓

How do you set the tolerance? Wouldn't it be simpler to set everything uniformly to, say, 0.1%?

🎓

Setting it uniformly causes problems. That's because numerical sensitivity is completely different for each physical quantity. Tolerance needs to be varied based on the type of physical quantity and how its value affects design decisions.

🎓
| Physical Quantity | Recommended Tolerance $\varepsilon_{rel}$ | Reason |
|---|---|---|
| Displacement (global) | 0.1–0.5% | Stiffness-dominated and stable; fluctuations are small |
| Stress (maximum value) | 1–5% | High mesh dependency; fluctuates significantly near singular points |
| Natural frequency | 0.5–2% | Determined by mass/stiffness balance; relatively stable |
| Total reaction force | 0.01–0.1% | Determined by force equilibrium; very stable |
| Contact force | 2–10% | Strongly dependent on contact algorithm; large differences between solvers |
| CFD drag coefficient Cd | 1–3% | Dependent on turbulence model and mesh |
| Temperature (maximum value) | 0.5–2% | Heat conduction is relatively stable but fluctuates due to convection terms |
🧑‍🎓

Stress and contact force have wider tolerances. Indeed, stress values change a lot just by changing the mesh.

🎓

That intuition is correct. Another important point is that quantities directly linked to safety factors require stricter criteria. For example, stress values used for fatigue life evaluation need stricter criteria than other physical quantities.

Concept of Multi-Stage Judgment

🧑‍🎓

With a simple PASS/FAIL binary choice, doesn't everything fail if it exceeds the threshold even slightly?

🎓

Good observation. In practice, three-stage judgment is common: