Data-Driven Multiphysics
Theory and Physics
Overview — Why Data-Driven is Needed
Professor, does data-driven multiphysics essentially mean solving physics with AI?
Half right, half wrong. The key point isn't "replacing the entire physics simulation with AI," but rather "letting AI approximate in situations where physics simulations are too slow."
Too slow? How slow specifically?
For example, a single automotive crash analysis case takes 8 to 12 hours. If you try to simultaneously optimize that with NVH (vibration/noise) and durability analysis, you need to couple three physics and run thousands of cases. Full FEM would take years to compute.
Years! That wouldn't meet product development deadlines...
Exactly. That's why the data-driven approach emerged. Broadly speaking, there are two main approaches:
- (1) Surrogate Model — Use full FEM/CFD calculation results as training data to approximate the input-output relationship with neural networks or Gaussian Processes (GP). After training, predictions are possible in seconds per case.
- (2) PINN (Physics-Informed Neural Network) — Incorporate physical laws (residuals of governing equations) into the loss function, enabling physically consistent predictions even with limited data.
Where is this actually being used?
In multi-objective optimization for automotive OEMs involving crash-NVH-durability, GP-based surrogates have been in practical use since around 2020. For thermal-structural coupling in aero-engine turbine blades, NASA and Rolls-Royce are using multi-fidelity surrogates.
Basics of Surrogate Models
So, a surrogate model is essentially an "approximation formula"? How is it different from Response Surface Methodology (RSM)?
Good question. RSM approximates with a 2nd-order polynomial, so its accuracy is insufficient when the input-output relationship is complex and nonlinear. "Surrogate model" is the general term for proxy models that approximate the input-output relationship of a full simulation with more flexible functions.
The general formulation of a surrogate model is as follows. For input parameters $\mathbf{x} \in \mathbb{R}^d$ (design variables, boundary conditions), we build a model $\hat{f}(\mathbf{x})$ that approximates the output $\mathbf{y} = f(\mathbf{x})$ of a high-fidelity simulation:

$$\hat{f}(\mathbf{x}) \approx f(\mathbf{x}), \qquad \hat{f} \text{ trained on } \{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}$$
Comparison of major surrogate modeling techniques:
| Method | Characteristics | Uncertainty Estimation | Training Data Amount | Application Scenarios |
|---|---|---|---|---|
| Gaussian Process Regression (GPR/Kriging) | Expresses correlation via kernel function | Naturally obtained | Small (50~500) | Design optimization, active learning |
| Neural Network (DNN) | Strong for high-dimensional input/output | MC dropout, etc. | Medium~Large (1000+) | Image-like field prediction |
| RBF (Radial Basis Function) | Easy to implement | None | Small~Medium | Smooth responses |
| Random Forest / XGBoost | Robust, interpretable | Ensemble variance | Medium | Mixed classification/regression problems |
Gaussian Process Regression (GPR)
The name "Gaussian Process" sounds difficult already... What is it intuitively?
Roughly speaking, it's an "infinite-dimensional normal distribution." It draws a smooth curve passing through known data points while telling you "I'm not confident here" with uncertainty in areas without data. This uncertainty is extremely useful in active learning.
In GPR, for observed data $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^n$, the predictive mean $\mu_*$ and variance $\sigma_*^2$ of the posterior distribution at a new point $\mathbf{x}_*$ are obtained in closed form:

$$\mu_* = \mathbf{k}_*^\top\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y}, \qquad \sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top\left(K + \sigma_n^2 I\right)^{-1}\mathbf{k}_*$$
Here, $K$ is the kernel matrix ($K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$), $\mathbf{k}_*$ is the kernel vector between a new input point and the training data, and $\sigma_n^2$ is the observation noise variance.
Which kernel function should I choose?
Since CAE responses are usually smooth, the Matern-5/2 kernel or RBF (squared exponential) kernel are standard choices. Using an anisotropic kernel (ARD: Automatic Relevance Determination) allows automatic estimation of the importance of each design variable.
Kernel Function Selection Guidelines
| Kernel | Formula | Smoothness | Use Case |
|---|---|---|---|
| RBF (SE) | $k(r) = \sigma_f^2 \exp\left(-\frac{r^2}{2l^2}\right)$ | Infinitely differentiable | Very smooth responses |
| Matern-5/2 | $k(r) = \sigma_f^2 \left(1 + \frac{\sqrt{5}r}{l} + \frac{5r^2}{3l^2}\right)\exp\left(-\frac{\sqrt{5}r}{l}\right)$ | Twice differentiable | Standard choice for CAE |
| Matern-3/2 | $k(r) = \sigma_f^2 \left(1 + \frac{\sqrt{3}r}{l}\right)\exp\left(-\frac{\sqrt{3}r}{l}\right)$ | Once differentiable | Slightly rough responses |
$r = \|\mathbf{x} - \mathbf{x}'\|$, $l$ is the length scale, $\sigma_f^2$ is the signal variance. Hyperparameters are determined by maximizing the log marginal likelihood.
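As a concrete reference, here is a minimal sketch of fitting a GPR surrogate with an anisotropic (ARD) Matern-5/2 kernel using scikit-learn. The two-variable test response, sample counts, and seed are illustrative placeholders, not taken from a real CAE model.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

def expensive_simulation(X):
    # Stand-in for a full FEM/CFD run: a smooth analytic response in 2 variables.
    return np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2

# Space-filling DOE in the unit square (roughly 15 points per dimension here)
sampler = qmc.LatinHypercube(d=2, seed=0)
X_train = sampler.random(n=30)
y_train = expensive_simulation(X_train)

# Anisotropic Matern-5/2 kernel: one length scale per design variable (ARD)
kernel = ConstantKernel(1.0) * Matern(length_scale=[1.0, 1.0], nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
gpr.fit(X_train, y_train)   # hyperparameters set by maximizing the log marginal likelihood

# Predictive mean and standard deviation at unseen design points
X_new = sampler.random(n=5)
mu, sigma = gpr.predict(X_new, return_std=True)
print(np.c_[mu, sigma])     # large sigma flags regions where the surrogate is unsure
```

After fitting, the optimized per-dimension length scales (`gpr.kernel_`) indicate how strongly each design variable drives the response, which is the ARD behavior mentioned above.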
PINN — Physics-Informed Neural Network
Is PINN a completely different approach from surrogates?
Fundamentally different. Surrogates learn from teacher data, namely simulation results. PINN incorporates the governing equations themselves into the loss function, so it can potentially produce physically correct solutions even with little teacher data, and in extreme cases with none at all.
The PINN loss function is a weighted sum of a data consistency term and physical law (PDE residual and boundary condition) terms:

$$\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda_r\,\mathcal{L}_{\text{PDE}} + \lambda_b\,\mathcal{L}_{\text{BC}}, \qquad \mathcal{L}_{\text{PDE}} = \frac{1}{N_r}\sum_{i=1}^{N_r}\big\|\mathcal{N}[\hat{u}](\mathbf{x}_i)\big\|^2, \quad \mathcal{L}_{\text{BC}} = \frac{1}{N_b}\sum_{j=1}^{N_b}\big\|\mathcal{B}[\hat{u}](\mathbf{x}_j)\big\|^2$$
Here, $\mathcal{N}[\cdot]$ is the differential operator of the governing equation, and $\mathcal{B}[\cdot]$ is the boundary condition operator. Automatic differentiation allows analytical computation of partial derivatives of the network output $\hat{u}$, so a mesh is unnecessary.
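To make the loss structure concrete, here is a minimal PINN sketch in PyTorch, assuming a 1-D steady heat equation $u''(x) + q(x) = 0$ on $[0,1]$ with $u(0)=u(1)=0$ as the governing equation; the network size, source term, and weights $\lambda_r$, $\lambda_b$ are illustrative choices, not from the text.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def pde_residual(x):
    # Automatic differentiation gives u_x and u_xx without any mesh.
    x = x.requires_grad_(True)
    u = net(x)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    q = torch.sin(torch.pi * x)           # assumed heat source term
    return u_xx + q                        # N[u] = 0 when the physics is satisfied

def pinn_loss(x_data, u_data, x_col, lambda_r=1.0, lambda_b=10.0):
    loss_data = torch.mean((net(x_data) - u_data) ** 2)     # data consistency
    loss_pde = torch.mean(pde_residual(x_col) ** 2)          # PDE residual at collocation points
    x_b = torch.tensor([[0.0], [1.0]])
    loss_bc = torch.mean(net(x_b) ** 2)                      # B[u] = u = 0 on the boundary
    return loss_data + lambda_r * loss_pde + lambda_b * loss_bc

# Usage sketch:
# x_col = torch.rand(100, 1)                 # collocation points for the PDE residual
# loss = pinn_loss(x_data, u_data, x_col)    # then optimize net's parameters as usual
```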
No mesh needed? That sounds very attractive... Are there no drawbacks?
To be honest, there are many drawbacks. First, tuning the weight coefficients $\lambda_r$ and $\lambda_b$ is critical; poor settings can lead to no convergence at all. In multiphysics, scales differ per physical field (temperature in hundreds of K, stress in hundreds of MPa), making balancing loss terms particularly difficult. Also, they often struggle with high-frequency vibrations or solutions with steep gradients, and in many cases, they haven't yet reached FEM accuracy.
So what are PINN's strong suits?
Inverse problems are its strength. "Identifying material parameters from observed data," "complementing missing experimental data with physical laws"—PINN excels in these scenarios. Also, it's effective for complex geometries where governing equations are known but mesh generation is difficult.
DeepONet — Operator Learning
I also hear the term DeepONet a lot lately. How is it different from PINN?
PINN learns "the solution to one specific problem." DeepONet learns "the mapping (operator) from an input function to an output function." For example, a mapping like "input any boundary condition, output the temperature field." Once learned, it can instantly predict the field for new boundary conditions.
The DeepONet structure is expressed as the product of a Branch Net (encodes the input function) and a Trunk Net (encodes the output location):

$$G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)$$
Here, $u$ is the input function (boundary/initial conditions), $y$ is the output evaluation point, $b_k$ is the Branch Net output, and $t_k$ is the Trunk Net output. Fourier Neural Operator (FNO) is also an operator learning method, but it uses Fourier transforms to efficiently capture spatial periodic structures.
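A minimal forward-pass sketch of the branch/trunk structure in PyTorch, assuming the input function is sampled at $m$ fixed sensor points; the layer widths, $m$, and latent dimension $p$ are illustrative.

```python
import torch
import torch.nn as nn

m, p = 50, 64   # number of sensor points, latent dimension

branch = nn.Sequential(nn.Linear(m, 128), nn.Tanh(), nn.Linear(128, p))
trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p))

def deeponet(u_sensors, y):
    """G(u)(y) ~ sum_k b_k(u) * t_k(y).
    u_sensors: (batch, m) input function sampled at the sensor points
    y:         (n_points, 1) output evaluation coordinates
    returns:   (batch, n_points) predicted output field
    """
    b = branch(u_sensors)   # (batch, p)
    t = trunk(y)            # (n_points, p)
    return b @ t.T          # inner product over the latent index k

# Example: predict the field for 8 new boundary-condition functions at 100 points
u_batch = torch.randn(8, m)
y_grid = torch.linspace(0, 1, 100).unsqueeze(1)
field = deeponet(u_batch, y_grid)   # shape (8, 100)
```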
The "Universal Surrogate" Does Not Exist — The Lesson of the No Free Lunch Theorem
In the world of data-driven modeling, the No Free Lunch theorem holds: no single method is optimal for every problem. GPR is strongest for low-dimensional (up to roughly 20 variables), smooth responses but struggles with scalability beyond about 100 dimensions. DNNs are strong in high dimensions but overfit with little training data. PINNs can embed physical laws but require painstaking hyperparameter tuning. In practice, the fundamental stance of data-driven multiphysics is to choose the method according to the nature of the problem.
Numerical Methods and Implementation
Surrogate Construction Workflow
Please tell me the steps to actually build a surrogate model.
Five steps:
- Problem Definition — Determine input variables (design parameters, material properties, load conditions) and output quantities (max stress, natural frequency, temperature, etc.)
- DOE (Design of Experiments) — Generate initial sample points using Latin Hypercube Sampling (LHS) or Sobol sequences. A guideline is $10d$ to $20d$ points for dimension $d$.
- Execute High-Fidelity Simulations — Run FEM/CFD calculations for all DOE points and acquire training data.
- Model Training — Construct the surrogate using GPR or DNN. Evaluate accuracy via cross-validation.
- Validation / Active Learning Loop — Check RMSE/R² with validation data; if accuracy is insufficient, acquire additional samples via active learning.
What is LHS in step 2? Isn't random sampling okay?
With random sampling, "samples might coincidentally cluster in similar areas." LHS (Latin Hypercube Sampling) divides each variable's range into $n$ equal intervals and ensures exactly one point is taken from each interval. As a result, it uniformly covers the design space. For expensive simulations, even a single wasted sample is costly, so this is an essential technique.
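As a sketch, generating an LHS design with scipy and scaling it to physical bounds might look like this; the three design variables and their ranges are hypothetical.

```python
import numpy as np
from scipy.stats import qmc

d, n = 3, 60                                   # 3 design variables, ~20d samples
sampler = qmc.LatinHypercube(d=d, seed=42)
unit_samples = sampler.random(n=n)             # stratified points in [0, 1]^d

# Scale to physical design-variable bounds (illustrative values)
lower = [1.0, 200.0, 0.5]    # e.g. thickness [mm], yield stress [MPa], radius [mm]
upper = [3.0, 600.0, 2.0]
X_doe = qmc.scale(unit_samples, lower, upper)  # (60, 3) DOE matrix for the FEM runs

# Space-filling quality (lower discrepancy is better) compared with plain random sampling
print(qmc.discrepancy(unit_samples))
print(qmc.discrepancy(np.random.default_rng(42).random((n, d))))
```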
Active Learning DOE
Does active learning mean AI tells us "where to compute next"?
Exactly. It uses the GPR predictive variance $\sigma_*^2(\mathbf{x})$ to automatically judge "this region has high uncertainty, so additional samples are needed." The Acquisition Function in Bayesian Optimization does precisely this.
Representative acquisition functions are shown below. For minimization, Expected Improvement has the closed form

$$\mathrm{EI}(\mathbf{x}) = \sigma_*(\mathbf{x})\left[\,z\,\Phi(z) + \phi(z)\,\right], \qquad z = \frac{f_{\min} - \mu_*(\mathbf{x})}{\sigma_*(\mathbf{x})}$$

Here, $f_{\min}$ is the current best observed value, and $\Phi$ and $\phi$ are the CDF and PDF of the standard normal distribution. EI (Expected Improvement) is the expected value of "how much the current best value can be improved," automatically balancing exploration and exploitation.
| Acquisition Function | Abbreviation | Characteristics | Use Case in Multiphysics |
|---|---|---|---|
| Expected Improvement | EI | Good balance of exploration/exploitation | Standard choice for single-objective optimization |
| Lower Confidence Bound | LCB | Exploration degree controlled by parameter $\kappa$ | Constrained optimization |
| Knowledge Gradient | KG | Maximizes information value | Simulations with large noise |
| Expected HyperVolume Improvement | EHVI | Improvement amount of Pareto front | Multi-objective (e.g., crash-NVH) |
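Under the minimization convention of the EI formula above, a sketch of evaluating the acquisition function on top of a fitted GPR surrogate could look like this; the `gpr` model, `y_train`, and the candidate set are assumed to come from the earlier GPR and LHS sketches.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(X_cand, gpr, f_min, xi=0.0):
    """EI(x) = sigma * (z * Phi(z) + phi(z)), with z = (f_min - mu - xi) / sigma."""
    mu, sigma = gpr.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)          # avoid division by zero far from data
    z = (f_min - mu - xi) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

# Active learning step (assumes gpr, y_train, sampler from the earlier sketches):
# X_cand = sampler.random(n=1000)
# ei = expected_improvement(X_cand, gpr, f_min=y_train.min())
# x_next = X_cand[np.argmax(ei)]              # next full FEM case to run
```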
How much can active learning reduce computational cost?
According to papers and practical reports, compared to random DOE, it can often reduce the number of samples needed to achieve the same accuracy by 50-80%. There are actual cases in automotive crash optimization where "a surrogate with sufficient accuracy was built with only 150 full FEM cases." That problem would have required over 500 cases with random DOE.
Transfer Learning
I have a strong image of transfer learning in image recognition. Can it be used in CAE too?
It's incredibly useful. The concept is simple: "use a model trained on a similar problem as the initial value for a new problem." For example, transfer a crash response surrogate learned for one car model's B-pillar to a new B-pillar after a minor change. Since the shapes are similar, a high-accuracy surrogate can be obtained with a small amount of fine-tuning.
Can it transfer even if the physics changes? Like from structural analysis to thermal analysis?
That's transfer between different physical fields, so it's a more advanced topic. But in the context of PINN, there is research on transferring the lower layers (feature extraction layers) of a network trained on a 2D problem to a 3D problem. The lower layers learn generic features like "spatial patterns of gradients," so parts of them can be useful even if the physics changes somewhat.
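As an illustration of the fine-tuning workflow for the B-pillar example above, here is a minimal PyTorch sketch that freezes the lower layers of a previously trained surrogate and retrains only the output head on a handful of new FEM cases; the network shape, checkpoint name, and sample counts are hypothetical.

```python
import torch
import torch.nn as nn

surrogate = nn.Sequential(
    nn.Linear(10, 64), nn.Tanh(),   # lower layers: generic feature extraction
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),               # head: problem-specific output (e.g. max stress)
)
# surrogate.load_state_dict(torch.load("bpillar_v1_surrogate.pt"))  # hypothetical checkpoint

for param in surrogate[:-1].parameters():    # freeze everything except the head
    param.requires_grad = False

opt = torch.optim.Adam(surrogate[-1].parameters(), lr=1e-3)
X_new, y_new = torch.randn(20, 10), torch.randn(20, 1)   # placeholder for ~20 new FEM cases
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(surrogate(X_new), y_new)
    loss.backward()
    opt.step()
```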
Multi-Fidelity Modeling
Does "multi-fidelity" mean combining coarse and fine meshes?
Exactly. It's an approach that uses many cheap but inaccurate calculations (Low-Fidelity: coarse-mesh FEM, simplified models, etc.) and a few expensive but accurate calculations (High-Fidelity: fine-mesh FEM), integrating both.
The basic model for Co-Kriging (multi-fidelity Gaussian Process) is:

$$f_{\text{HF}}(\mathbf{x}) = \rho\, f_{\text{LF}}(\mathbf{x}) + \delta(\mathbf{x})$$
Here, $f_{\text{HF}}$ is the high-fidelity output, $f_{\text{LF}}$ is the low-fidelity output, $\rho$ is a scaling coefficient, and $\delta(\mathbf{x})$ is a GP representing the difference. Even with little high-fidelity data, it enables accurate predictions by leveraging trends from low-fidelity data.
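A minimal two-stage sketch of this model with scikit-learn GPs: one GP for the low-fidelity trend and a second GP for the discrepancy $\delta(\mathbf{x})$, with $\rho$ estimated by least squares. The analytic LF/HF test functions are placeholders for coarse-mesh and fine-mesh FEM runs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f_lf(x):  # cheap, biased response (stand-in for coarse-mesh FEM)
    return np.sin(8.0 * x).ravel()

def f_hf(x):  # expensive, accurate response (stand-in for fine-mesh FEM)
    return 1.8 * np.sin(8.0 * x).ravel() + 0.3 * x.ravel()

X_lf = np.linspace(0, 1, 40).reshape(-1, 1)   # many cheap LF runs
X_hf = np.linspace(0, 1, 6).reshape(-1, 1)    # few expensive HF runs

gp_lf = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(X_lf, f_lf(X_lf))

# Estimate rho by least squares, then model the discrepancy delta(x) with a second GP
lf_at_hf = gp_lf.predict(X_hf)
rho = float(np.dot(lf_at_hf, f_hf(X_hf)) / np.dot(lf_at_hf, lf_at_hf))
gp_delta = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(
    X_hf, f_hf(X_hf) - rho * lf_at_hf
)

def predict_hf(X):
    # f_HF(x) ~ rho * f_LF(x) + delta(x)
    return rho * gp_lf.predict(X) + gp_delta.predict(X)

print(predict_hf(np.array([[0.37]])))
```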
What is concretely used as Low-Fidelity in practice?
Let me list a few common patterns:
- Coarse-mesh FEM