Theory Note
Backpropagation computes the gradient of loss L w.r.t. weight w via the chain rule:
∂L/∂w = ∂L/∂a · ∂a/∂z · ∂z/∂w
XOR is linearly non-separable and can only be solved by a network with at least one hidden layer.
Visualize neural network structure, forward propagation, and backpropagation in real time. Learn how neural nets work by training on the XOR problem.
The core of a neuron's operation is the weighted sum of its inputs, passed through a non-linear activation function (like sigmoid or ReLU). This is the forward propagation step for a single neuron:
$$a^{(l)}_j = \sigma\left( z^{(l)}_j \right) = \sigma\left( \sum_{k}w^{(l)}_{jk}a^{(l-1)}_k + b^{(l)}_j \right)$$

Here, $a^{(l)}_j$ is the activation of neuron $j$ in layer $l$, $\sigma$ is the activation function, $w^{(l)}_{jk}$ is the weight connecting neuron $k$ in layer $(l-1)$ to neuron $j$ in layer $l$, and $b^{(l)}_j$ is the bias. This calculation propagates from input to output.
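As a minimal sketch (using NumPy, not the simulator's own code), a single layer's forward step is just a matrix-vector product, a bias add, and an element-wise activation. The weights and inputs below are arbitrary illustration values:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(a_prev, W, b):
    """One layer of forward propagation: a = sigma(W @ a_prev + b).

    W[j, k] is the weight from neuron k in layer l-1 to neuron j in layer l.
    """
    z = W @ a_prev + b          # weighted sum plus bias, z^{(l)}
    return sigmoid(z)           # activation a^{(l)}

# Tiny 2-input, 2-hidden-neuron example (weights chosen arbitrarily)
a0 = np.array([1.0, 0.0])       # XOR input (1, 0)
W1 = np.array([[ 2.0, -1.0],
               [-1.0,  2.0]])
b1 = np.array([0.5, -0.5])
a1 = forward_layer(a0, W1, b1)  # hidden activations, each in (0, 1)
```

Stacking calls to `forward_layer` (one per layer, output of one as input to the next) gives the full forward pass.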
Learning is governed by backpropagation, which uses the chain rule to compute gradients of a loss function $L$ (e.g., MSE) with respect to every weight and bias. The key gradient for a weight is:
$$\frac{\partial L}{\partial w^{(l)}_{jk}}= \delta^{(l)}_j \cdot a^{(l-1)}_k$$

where $\delta^{(l)}_j = \frac{\partial L}{\partial z^{(l)}_j}$ is the "error" term for neuron $j$ in layer $l$. This error is propagated backwards from the output layer. The weight is then updated as $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate you control in the simulator.
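To make the chain rule concrete, here is a hedged sketch for a single sigmoid output neuron with MSE loss $L = \tfrac{1}{2}(a-y)^2$ (the function name and example values are illustrative, not from the tool):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads_single_neuron(a_prev, w, b, y):
    """Analytic gradients via the chain rule dL/dw = delta * a_prev."""
    z = w @ a_prev + b
    a = sigmoid(z)
    # delta = dL/dz = dL/da * da/dz, with sigma'(z) = a * (1 - a)
    delta = (a - y) * a * (1.0 - a)
    dL_dw = delta * a_prev          # dL/dw_k = delta * a^(l-1)_k
    dL_db = delta
    return dL_dw, dL_db

a_prev = np.array([0.9, 0.2])       # activations from the previous layer
w = np.array([0.3, -0.8])
b, y, eta = 0.1, 1.0, 0.5
dW, db = grads_single_neuron(a_prev, w, b, y)
w_new = w - eta * dW                # gradient-descent update w <- w - eta * dL/dw
```

A standard sanity check is to compare `dW` against a finite-difference estimate of the loss gradient; they should agree to several decimal places.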
Surrogate Models in CAE: Running high-fidelity Finite Element Analysis (FEA) or Computational Fluid Dynamics (CFD) simulations can take days. A neural network can be trained on a dataset of these simulation results to create a "surrogate" model that predicts outcomes in milliseconds, enabling rapid design exploration and optimization.
Physics-Informed Neural Networks (PINNs): This is a cutting-edge CAE application. Instead of just learning from data, PINNs embed the governing physical equations (like Navier-Stokes or heat equations) directly into the loss function. This guides the network to learn solutions that are physically plausible, even with sparse data, and can solve inverse problems.
Autonomous System Control: Neural networks process sensor data (camera, LiDAR) from vehicles or robots to make real-time decisions like steering, braking, or path planning. They learn complex mappings from high-dimensional inputs to control outputs that are difficult to program with traditional logic.
Material Property Prediction: In materials science and engineering, networks predict properties like strength, thermal conductivity, or fatigue life based on microstructural images or composition data. This accelerates the discovery of new alloys, composites, and polymers for specific engineering applications.
While experimenting with this tool, you might encounter a few easily misunderstood points. First, you might think "a larger learning rate η leads to faster learning." This is only half true. While increasing the value does enlarge the weight update steps, setting η to something like 0.5 or 1.0 can cause the loss curve to oscillate wildly, potentially preventing convergence to an optimal solution. This is akin to overshooting the valley floor, landing on the opposite slope, and overshooting again in a repeating cycle. In practice, the golden rule is to start with small values like 0.01 or 0.001 and adjust while monitoring progress.
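The overshooting cycle is easy to reproduce on a toy problem. The sketch below (not the simulator's code) runs plain gradient descent on $f(w) = w^2$, whose gradient is $2w$, so one update is $w \leftarrow w(1 - 2\eta)$:

```python
def descend(eta, steps=20, w=1.0):
    """Gradient descent on f(w) = w^2; returns |w| after the given steps."""
    for _ in range(steps):
        w -= eta * 2 * w   # w <- w * (1 - 2*eta)
    return abs(w)

small = descend(0.1)   # |w| shrinks steadily toward the minimum at 0
large = descend(1.5)   # each step overshoots past 0 and lands farther away
```

With $\eta = 0.1$ the iterate contracts by a factor of 0.8 per step; with $\eta = 1.5$ it flips sign and doubles in magnitude every step, which is exactly the wild oscillation visible in the loss curve.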
Next is the misconception that "more hidden layers and nodes always improve performance." For this XOR problem, one hidden layer with two nodes is sufficient. But try deliberately creating a huge network in the tool, say 5 layers with 10 nodes each. You'll see the loss seemingly approach zero, yet the decision boundary becomes overly complex because the network has simply memorized the four training points. This is overfitting. For real-world problems, aiming for a simple model robust to unseen data is crucial, and the depth and width of layers must be carefully decided as hyperparameters.
Finally, the idea that "the sigmoid function is a universal, all-purpose activation function." While historically significant, it has a major weakness in deep networks. The sigmoid saturates: its derivative peaks at 0.25 and approaches zero for large $|z|$, so the gradient signal shrinks multiplicatively as it propagates backward through layers. This is the vanishing gradient problem. If you feel learning slow down when deepening layers in this tool, that's a basic experience of vanishing gradients. This is one reason why functions like ReLU ($f(x)=\max(0, x)$), whose derivative is 1 for all positive inputs, have become mainstream in modern deep learning practice.
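A quick numerical illustration of why stacking sigmoids starves early layers of gradient (a sketch, not the tool's implementation): even in the best case where every pre-activation is exactly 0, each sigmoid layer multiplies the gradient by at most 0.25.

```python
import numpy as np

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid: s * (1 - s), maximal at z = 0."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

best_case = {depth: 0.25 ** depth for depth in (1, 5, 10)}
# 10 sigmoid layers shrink the gradient by at least a factor of ~1e-6

def relu_prime(z):
    """ReLU's derivative is exactly 1 for positive inputs: no attenuation."""
    return (np.asarray(z) > 0).astype(float)
```

In practice pre-activations drift away from 0 as training proceeds, so the attenuation per layer is usually even worse than 0.25.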
The concepts of "forward propagation" and "error backpropagation" you learned with this tool are fundamentally connected to the core engineering challenges of "modeling" and "parameter identification" found across various fields, beyond CAE.
For example, in Structural Health Monitoring (SHM), invisible damage (cracks, loose bolts) in bridges or structures is detected from vibration data collected by sensors. Here, normal condition data serves as the "input" and damage patterns as the "output" for training a neural network, formulating it as a "classification/regression problem" to estimate damage presence or location from new data. It's precisely like drawing the decision boundary that separated 0/1 in the XOR problem, but in a much higher-dimensional data space.
Furthermore, in the field of Materials Informatics, alloy composition ratios (e.g., Iron 90%, Chromium 8%, Carbon 2%) or heat treatment conditions are modeled as the "input," with resulting strength or corrosion resistance as the "output." The goal is to inversely search for the optimal material recipe from limited, high-cost data obtained through experiments or simulations. This process is essentially the same as adjusting weights and biases via "backpropagation" to approach a target output. Moreover, all engineering challenges involving unstructured data, such as image-based automated inspection and acoustic signal-based anomaly detection, are potential application areas for the multilayer perceptron concepts you touched on with this tool.
Once you've developed an intuitive feel for how neural networks work with this tool, it's recommended to delve deeper from both theoretical and implementation perspectives. As a learning path, first solidify your foundations in linear algebra and calculus. The chain rule, central to backpropagation, becomes much clearer when calculated all at once with vectors and matrices ($ \boldsymbol{\delta}^{(l)} = ( (\boldsymbol{W}^{(l+1)})^T \boldsymbol{\delta}^{(l+1)}) \odot \sigma'(\boldsymbol{z}^{(l)})$). Familiarizing yourself with vector dot products, matrix multiplication and transposition, and element-wise products (the Hadamard product $\odot$) is the next key step.
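The vectorized backward step above translates almost line-for-line into NumPy. The following is a minimal sketch (shapes and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_delta(W_next, delta_next, z):
    """Vectorized error propagation for one layer:
    delta^(l) = (W^(l+1).T @ delta^(l+1)) * sigma'(z^(l)),
    where * is the element-wise (Hadamard) product."""
    s = sigmoid(z)
    return (W_next.T @ delta_next) * (s * (1.0 - s))

# Shapes: layer l has 3 neurons, layer l+1 has 2 neurons.
W_next = np.array([[0.1, -0.4,  0.2],
                   [0.7,  0.3, -0.5]])   # (2, 3)
delta_next = np.array([0.05, -0.02])     # (2,)
z = np.array([0.3, -1.0, 0.0])           # (3,)
delta = backprop_delta(W_next, delta_next, z)  # (3,)
```

Note how the transpose $(\boldsymbol{W}^{(l+1)})^T$ routes each downstream error back through the same weights used in the forward pass, and the Hadamard product applies $\sigma'$ element-wise.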
Building on that, learning about activation functions other than "sigmoid" (like ReLU, tanh) and loss functions other than "squared error" (like cross-entropy) will help you understand why they are used. For instance, the softmax function used in the output layer for multi-class classification is like a multi-dimensional version of the sigmoid.
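The "multi-dimensional sigmoid" intuition can be verified directly: for two classes, softmax of $[z, 0]$ reduces exactly to the sigmoid of $z$. A small sketch (subtracting the max before exponentiating is the standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# p sums to 1 and preserves the ordering of the inputs;
# softmax([z, 0])[0] equals sigmoid(z) = 1 / (1 + exp(-z))
```

This is why the sigmoid output layer you used for XOR is the two-class special case of the softmax layers used in multi-class classifiers.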
Ultimately, aim to understand advanced architectures like Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). CNNs learn "filters (a type of weight)" to extract local features from images, directly applicable to analyzing CAE mesh data or field visualization images. The intuition of "stacking layers to transform features" cultivated with this tool should provide a solid foundation for understanding these more complex and powerful models.