CAE Simulation Regression Testing — Quality Assurance Strategy for Solver Updates and Model Changes
Theory and Background
What is CAE Regression Testing?
Professor, regression testing is a term from software development, right? Is it also used in the CAE world?
Good question. In software development, regression testing means checking that existing functionality still works after code is modified, and exactly the same concept is needed in CAE.
Regression testing in CAE refers to the process of automatically comparing new results against past results certified as "correct" (the baseline) whenever anything in the analysis environment changes, such as a solver version upgrade, a model modification, mesh regeneration, or an OS/compiler update, and verifying that the differences stay within an acceptable range.
Huh, so it's systematically confirming that "even though we made changes, we still get the same results as before," right?
Exactly. For example, imagine you're doing crash analysis at an automotive company. When upgrading LS-DYNA from version R12 to R13, you need to check if the acceleration peak values or intrusion amounts of previously certified models have changed—it's unrealistic to check this manually for every single model when you have 100 or 200 of them.
Indeed. If you tried to do it all manually, a week probably wouldn't be enough...
That's why we automate it. Automatically run a suite of benchmark problems, automatically compare results, and issue an alert if differences exceed a threshold—that's the overall picture of CAE regression testing.
Why is Regression Testing Necessary?
But solvers are commercial software, right? Do results really change just from a version upgrade?
They do change, that's the thing. Solver vendors perform bug fixes, algorithm improvements, and default setting changes during version upgrades. Each of these can potentially affect results.
Here are some actual examples:
- MSC Nastran version change caused natural frequency variations of 0.3–0.5% (due to changes in stiffness matrix assembly algorithm).
- Abaqus minor update changed the default contact detection, altering contact forces by a few percent.
- OpenFOAM version change corrected the wall function implementation in the turbulence model, changing Cd values by over 2%.
- Changing compiler optimization flags alone reordered floating-point operations (and hence their rounding), causing the convergence path of a nonlinear analysis to bifurcate.
What, results change just from compiler optimization flags!? That's a bit scary...
That's precisely why regression testing is essential. In fields involving safety certification like aerospace and nuclear power, running all certified benchmark suites every time the solver version changes is standard practice. In automotive crash safety, consistency with NCAP standards must always be verified.
Mathematical Framework for Judgment
How do you specifically judge "whether results have changed"? Numerical values never match exactly, right?
Exactly. Due to floating-point arithmetic, we don't expect exact matches. Instead, we judge whether the relative error or the absolute error is within an acceptable range. The basic formula is:

$$e_{rel} = \frac{|Y_{new} - Y_{baseline}|}{|Y_{baseline}|}$$

Here, $Y_{baseline}$ is the baseline (reference) result, and $Y_{new}$ is the result from the new environment; $e_{rel}$ is usually quoted as a percentage. The judgment criterion is:

$$e_{rel} < \varepsilon_{rel} \Rightarrow \text{PASS}, \qquad e_{rel} \geq \varepsilon_{rel} \Rightarrow \text{FAIL}$$

When $Y_{baseline}$ is near zero, the relative error diverges, so we use the absolute error in that case:

$$e_{abs} = |Y_{new} - Y_{baseline}| < \varepsilon_{abs}$$
So relative error can't be used when the baseline is near zero. Like displacement at a constrained point (theoretically zero), for example.
Exactly that kind of case. In practice, a combined judgment using both relative and absolute error is often used:

$$|Y_{new} - Y_{baseline}| \leq \max\!\left(\varepsilon_{rel} \cdot |Y_{baseline}|,\ \varepsilon_{abs}\right) \Rightarrow \text{PASS}$$

Furthermore, the pass rate for the entire test suite is also managed:

$$\text{Pass Rate} = \frac{N_{pass}}{N_{total}} \times 100\,\%$$

where $N_{pass}$ is the number of passing cases and $N_{total}$ is the total number of cases in the suite.
Is there also a standard for the overall test suite pass rate?
Many organizations set a Pass Rate of 100% as a release condition. If even one case FAILs, they investigate the cause to determine if it's a change due to "intentional improvement" or a "bug." If it's an intentional change, they update the baseline.
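As a minimal sketch, the combined judgment and the suite pass rate can be expressed in a few lines of Python. The function names, tolerance defaults, and example values here are illustrative, not taken from any particular tool:

```python
def passes(y_new: float, y_baseline: float,
           eps_rel: float = 0.01, eps_abs: float = 1e-9) -> bool:
    """Combined check: PASS if |y_new - y_baseline| is within
    max(eps_rel * |y_baseline|, eps_abs). The absolute tolerance
    takes over when the baseline value is near zero."""
    return abs(y_new - y_baseline) <= max(eps_rel * abs(y_baseline), eps_abs)

def pass_rate(pairs: list[tuple[float, float]]) -> float:
    """Pass rate [%] over a suite of (new, baseline) value pairs."""
    n_pass = sum(passes(y_new, y_base) for y_new, y_base in pairs)
    return 100.0 * n_pass / len(pairs)

# Values borrowed from the worked Nastran example later in this article:
suite = [(5.231, 5.234), (12_870.0, 12_450.0)]
print(pass_rate(suite))  # 50.0 -- the contact-force case exceeds 1%
```

Python's standard math.isclose implements the same max-of-tolerances idea, though it scales the relative tolerance by the larger of the two magnitudes rather than by the baseline alone.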
Impact of Solver Version Changes
Could you tell me more specifically what kind of impacts are likely to occur with version changes for each solver?
| Solver | Changes Most Likely to Have Impact | Typical Magnitude of Difference | Points to Note |
|---|---|---|---|
| MSC Nastran | Changes in numerical integration accuracy of stiffness matrix | Natural frequency 0.1–0.5% | Particularly careful with SOL 103/111 |
| Abaqus | Changes in default contact algorithm | Contact force 1–5% | Must read "Changed defaults" in Release Notes |
| Ansys Mechanical | Changes in behavior of internal APDL commands | Stress 0.5–2% | Often hidden in Workbench |
| OpenFOAM | Corrections to turbulence model/wall function implementation | Cd/Cl 1–3% | Differences between the openfoam.com (ESI) and Foundation (.org) release lines |
| LS-DYNA | Changes in default hourglass control | Acceleration peak 2–10% | Major impact in crash analysis |
LS-DYNA's acceleration peak changing by up to 10%... that's a huge problem for crash safety, isn't it?
It's a huge problem. That's exactly why in automotive CAE departments it's standard practice to run hundreds of crash models through regression testing during a major solver upgrade. Cutting corners is not allowed in areas involving human lives.
Scope and Limitations of Regression Testing
- Scope: Verification of changes to known analysis settings, such as solver version changes, OS/compiler changes, mesh regeneration, material model corrections, minor boundary condition modifications.
- Limitations: Regression testing only verifies "consistency with previous results" and does not guarantee the "correctness" of results. If the baseline itself is wrong, the regression test will pass even though the analysis results are inaccurate. Separate V&V (Verification and Validation) is required.
- Coverage Limitations: Regression testing cannot detect issues related to physical phenomena or analysis conditions not covered by the test suite. Test case design determines quality.
Dimensional Analysis and Unit Systems
| Variable | SI Unit | Points to Note |
|---|---|---|
| Tolerance $\varepsilon_{rel}$ | — (dimensionless, %) | Set different values for each type of physical quantity |
| Tolerance $\varepsilon_{abs}$ | Unit of each physical quantity | Used for judging values near zero. Set according to scale |
| Test Pass Rate | — (dimensionless, %) | Depends on organization's release criteria (typically 100%) |
Specific Example of Regression Test Judgment
Example of regression test results when migrating from Nastran v2024 to v2025. Tolerances are set per case.
| Evaluation Item | v2024 (Baseline) | v2025 (New Version) | Relative Error [%] | Judgment |
|---|---|---|---|---|
| Cantilever Beam Max Displacement | 5.234 mm | 5.231 mm | 0.06 | PASS |
| Plate 1st Natural Frequency | 142.35 Hz | 141.89 Hz | 0.32 | PASS |
| Bolt Contact Force | 12,450 N | 12,870 N | 3.37 | FAIL |
| Overall Energy Conservation | 99.97% | 99.96% | 0.01 | PASS |
| Max von Mises Stress | 325.8 MPa | 326.1 MPa | 0.09 | PASS |
Judgment Criteria: Relative Error < 1%: Good, 1–3%: Warning, > 3%: FAIL (cause investigation mandatory)
Baseline Management and Tolerance Design
Baseline Result Management
What exactly is "baseline" data, and how is it managed? Like saving it in Excel on a file server?
Absolutely do not manage it with Excel (laughs). The golden rule of baseline management is "store it in a version-controlled repository in a machine-readable format."
Specifically, structure it like this:
- Store baseline JSON or CSV in a Git repository.
- Attach metadata to each baseline: solver name/version, execution date/time, mesh size, OS/compiler information.
- Result values are extracted scalar results (max displacement, max stress, natural frequency, total reaction force, etc.).
- If necessary, also store field data (node displacement distribution, stress contours, etc.) in binary format.
I see. So instead of saving the solver output files (.f06, .odb, etc.) as-is, you extract the necessary numerical values from them and save those.
Yes. Solver output files can be several GB in size. Saving extracted scalar values in JSON format allows for efficient Git diff display and easy mechanical comparison. For example, the format would look like this:
```json
{
  "model": "cantilever_beam_001",
  "solver": "MSC Nastran",
  "solver_version": "2024.1",
  "date": "2026-01-15",
  "os": "RHEL 8.9",
  "results": {
    "max_displacement_mm": 5.234,
    "max_vonmises_mpa": 325.8,
    "freq_mode1_hz": 142.35,
    "reaction_force_n": 1000.02,
    "energy_balance_pct": 99.97
  }
}
```
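To make the comparison step concrete, here is a minimal sketch of how such a baseline file might be consumed. The file name, the tolerance defaults, and the new_results dictionary are assumptions for illustration; in practice the new values would be extracted from the solver output (.f06, .odb, etc.) by a post-processing script:

```python
import json

def compare_to_baseline(baseline_path: str, new_results: dict,
                        eps_rel: float = 0.01, eps_abs: float = 1e-9) -> dict:
    """Compare extracted scalar results against a baseline JSON file.

    Returns {quantity: (baseline, new, "PASS"/"FAIL")} for every
    quantity stored under the baseline's "results" key.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)["results"]
    report = {}
    for key, y_base in baseline.items():
        y_new = new_results[key]
        ok = abs(y_new - y_base) <= max(eps_rel * abs(y_base), eps_abs)
        report[key] = (y_base, y_new, "PASS" if ok else "FAIL")
    return report

# Hypothetical new-version values for the baseline shown above:
new_results = {"max_displacement_mm": 5.231, "max_vonmises_mpa": 326.1,
               "freq_mode1_hz": 141.89, "reaction_force_n": 1000.05,
               "energy_balance_pct": 99.96}
for key, (b, n, verdict) in compare_to_baseline(
        "cantilever_beam_001.json", new_results).items():
    print(f"{key}: {b} -> {n} [{verdict}]")
```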
Tolerance Criteria Design
How do you set the tolerance? Wouldn't it be simpler to set everything uniformly to, say, 0.1%?
Setting it uniformly causes problems. That's because numerical sensitivity is completely different for each physical quantity. Tolerance needs to be varied based on the type of physical quantity and how its value affects design decisions.
| Physical Quantity | Recommended Tolerance $\varepsilon_{rel}$ | Reason |
|---|---|---|
| Displacement (global) | 0.1–0.5% | Stiffness-dominated and stable. Fluctuations are small. |
| Stress (maximum value) | 1–5% | High mesh dependency. Fluctuates significantly near singular points. |
| Natural Frequency | 0.5–2% | Determined by mass/stiffness balance. Relatively stable. |
| Total Reaction Force | 0.01–0.1% | Determined by force equilibrium, very stable. |
| Contact Force | 2–10% | Strongly dependent on contact algorithm. Large differences between solvers. |
| CFD Drag Coefficient Cd | 1–3% | Dependent on turbulence model and mesh. |
| Temperature (maximum value) | 0.5–2% | Heat conduction is relatively stable but fluctuates due to convection terms. |
Stress and contact force have wider tolerances. Indeed, stress values change a lot just by changing the mesh.
That intuition is correct. Another important point is that quantities directly linked to safety factors require stricter criteria. For example, stress values used for fatigue life evaluation need stricter criteria than other physical quantities.
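One way to encode such per-quantity tolerances is a small configuration table keyed by physical quantity. The values below are a hypothetical sketch taken from the middle of the recommended ranges above; actual values must come from your organization's own release criteria:

```python
# (eps_rel, eps_abs) per physical quantity -- illustrative values only.
# eps_rel is a ratio (0.01 = 1%); eps_abs is in the unit of each quantity.
TOLERANCES = {
    "displacement_mm":      (0.003,  1e-4),
    "stress_max_mpa":       (0.03,   0.1),
    "natural_frequency_hz": (0.01,   0.01),
    "reaction_force_n":     (0.0005, 0.01),
    "contact_force_n":      (0.05,   1.0),
    "drag_coefficient":     (0.02,   1e-4),
    "temperature_max_k":    (0.01,   0.05),
}
```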
Concept of Multi-Stage Judgment
With a simple PASS/FAIL binary choice, doesn't everything fail if it exceeds the threshold even slightly?
Good observation. In practice, a three-stage judgment is common, as shown in the sketch after this list:
- GREEN (PASS): $e_{rel} < \varepsilon_1$ — No problem. Automatically approved.
- AMBER (WARNING): $\varepsilon_1 \leq e_{rel} < \varepsilon_2$ — Requires confirmation. Judged after an engineer's review.
- RED (FAIL): $e_{rel} \geq \varepsilon_2$ — Not acceptable. Cause investigation is mandatory before release.
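A minimal sketch of this three-stage classification, assuming the thresholds $\varepsilon_1 = 1\%$ and $\varepsilon_2 = 3\%$ from the example judgment criteria used earlier:

```python
def classify(e_rel: float, eps1: float = 0.01, eps2: float = 0.03) -> str:
    """Three-stage judgment on a relative error (given as a ratio, not %)."""
    if e_rel < eps1:
        return "GREEN"   # auto-approved
    if e_rel < eps2:
        return "AMBER"   # engineer review required
    return "RED"         # cause investigation mandatory

print(classify(0.0006))  # GREEN (cantilever displacement, 0.06%)
print(classify(0.0337))  # RED   (bolt contact force, 3.37%)
```

An AMBER result routes the case to a human reviewer instead of failing the suite outright, which keeps the criteria strict without blocking releases on every benign fluctuation.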