
Neural Network Visualizer

Visualize neural network structure, forward propagation, and backpropagation in real time. Learn how neural nets work by training on the XOR problem.

Simulator controls: Network Architecture (input nodes: 2, hidden layers: 2, nodes per hidden layer: 4, output nodes: 1), activation function, learning rate (η = 0.10), and Training Controls with live readouts for epochs, loss, and accuracy.

Theory Note

Backpropagation computes the gradient of loss L w.r.t. weight w via the chain rule:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$

XOR is linearly non-separable and can only be solved by a network with at least one hidden layer.

CAE connection: Neural networks are used as surrogate models to approximate expensive FEM/CFD computations. Physics-Informed Neural Networks (PINNs) embed governing equations directly into the loss function.
[Network diagram legend: positive weight / negative weight / high activation / low activation]

[Charts: Loss Curve · Decision Boundary (XOR)]

What is a Neural Network?

🧑‍🎓
What exactly is a neural network trying to do? I see it's solving the XOR problem here, but what's the big picture?
🎓
Basically, it's a function approximator. It learns to map inputs (like the two numbers for XOR) to the correct outputs (0 or 1). The "learning" happens by adjusting its internal knobs—the weights and biases. Try changing the "Hidden layers" slider above. Adding more layers lets the network learn more complex patterns, but also makes training trickier.
🧑‍🎓
Wait, really? So the network just starts with random guesses? How does it know which way to adjust those "knobs"?
🎓
Exactly! It starts randomly, which is why the initial output is often wrong. It knows how to adjust via backpropagation. The network makes a prediction (forward pass), compares it to the truth using a loss function (like Mean Squared Error), and then calculates how much each weight contributed to the error, working backwards. That's the "Learning rate (η)" parameter—it controls how big a step it takes when adjusting weights based on that error.
🧑‍🎓
So backpropagation is just the network learning from its mistakes? Why is the XOR problem such a classic example for this?
🎓
In practice, yes! XOR is famous because a single neuron (perceptron) cannot solve it—it's not linearly separable. You need at least one hidden layer to create a combination of decision boundaries. Watch the simulator: with zero hidden layers, it will never learn. Add one hidden layer with a few nodes, and it can find a solution. This demonstrates why hidden layers, and the non-linearity they introduce, are essential in neural networks.
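The teacher's claim can be checked by brute force. This sketch (plain Python, grid values chosen purely for illustration) scans combinations of weights and a bias for a single linear threshold unit and confirms that none of them reproduces XOR:

```python
# Brute-force check: no single linear threshold unit w1*x1 + w2*x2 + b > 0
# can classify all four XOR points correctly.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """True if the linear unit classifies all four XOR points correctly."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in XOR.items())

# Scan a coarse grid of weights and biases; none will work.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 ... 4.0 in steps of 0.5
solutions = [(w1, w2, b) for w1 in grid for w2 in grid for b in grid
             if separates(w1, w2, b)]
print(len(solutions))  # 0: XOR is not linearly separable
```

The grid here is coarse, but the result holds for any weights: correctly classifying (0,1) and (1,0) forces w1 + b > 0 and w2 + b > 0, which together contradict the conditions needed for (0,0) and (1,1).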

Physical Model & Key Equations

The core of a neuron's operation is the weighted sum of its inputs, passed through a non-linear activation function (like sigmoid or ReLU). This is the forward propagation step for a single neuron:

$$a^{(l)}_j = \sigma\left( z^{(l)}_j \right) = \sigma\left( \sum_{k}w^{(l)}_{jk}a^{(l-1)}_k + b^{(l)}_j \right)$$

Here, $a^{(l)}_j$ is the activation of neuron $j$ in layer $l$, $\sigma$ is the activation function, $w^{(l)}_{jk}$ is the weight connecting neuron $k$ in layer $(l-1)$ to neuron $j$ in layer $l$, and $b^{(l)}_j$ is the bias. This calculation propagates from input to output.
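The forward-propagation equation above can be sketched in a few lines of plain Python. The weights, inputs, and bias below are illustrative numbers, not values from the simulator:

```python
import math

def sigmoid(z):
    """Logistic activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(weights, inputs, bias):
    """One neuron: a = sigma(sum_k w_k * a_prev_k + b), as in the equation above."""
    z = sum(w * a_prev for w, a_prev in zip(weights, inputs)) + bias
    return sigmoid(z)

# Two inputs, one neuron: z = 0.5*1.0 + (-0.25)*0.0 + 0.1 = 0.6
a = neuron_forward(weights=[0.5, -0.25], inputs=[1.0, 0.0], bias=0.1)
print(round(a, 4))  # sigma(0.6) ≈ 0.6457
```

A full layer is just this computation repeated for each neuron, with the resulting activations fed forward as the next layer's inputs.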

Learning is governed by backpropagation, which uses the chain rule to compute gradients of a loss function $L$ (e.g., MSE) with respect to every weight and bias. The key gradient for a weight is:

$$\frac{\partial L}{\partial w^{(l)}_{jk}}= \delta^{(l)}_j \cdot a^{(l-1)}_k$$

Where $\delta^{(l)}_j = \frac{\partial L}{\partial z^{(l)}_j}$ is the "error" term for neuron $j$ in layer $l$. This error is propagated backwards from the output layer. The weight is then updated as $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate you control in the simulator.

Real-World Applications

Surrogate Models in CAE: Running high-fidelity Finite Element Analysis (FEA) or Computational Fluid Dynamics (CFD) simulations can take days. A neural network can be trained on a dataset of these simulation results to create a "surrogate" model that predicts outcomes in milliseconds, enabling rapid design exploration and optimization.

Physics-Informed Neural Networks (PINNs): This is a cutting-edge CAE application. Instead of just learning from data, PINNs embed the governing physical equations (like Navier-Stokes or heat equations) directly into the loss function. This guides the network to learn solutions that are physically plausible, even with sparse data, and can solve inverse problems.

Autonomous System Control: Neural networks process sensor data (camera, LiDAR) from vehicles or robots to make real-time decisions like steering, braking, or path planning. They learn complex mappings from high-dimensional inputs to control outputs that are difficult to program with traditional logic.

Material Property Prediction: In materials science and engineering, networks predict properties like strength, thermal conductivity, or fatigue life based on microstructural images or composition data. This accelerates the discovery of new alloys, composites, and polymers for specific engineering applications.

Common Misconceptions and Points to Note

While experimenting with this tool, you might encounter a few easily misunderstood points. First, you might think "a larger learning rate η leads to faster learning." This is only half true. While increasing the value does enlarge the weight update steps, setting η to something like 0.5 or 1.0 can cause the loss curve to oscillate wildly, potentially preventing convergence to an optimal solution. This is akin to overshooting the valley floor, landing on the opposite slope, and overshooting again in a repeating cycle. In practice, the golden rule is to start with small values like 0.01 or 0.001 and adjust while monitoring progress.
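The overshooting behavior is easy to reproduce on a one-dimensional toy loss. This sketch runs gradient descent on $L(w) = w^2$ (gradient $2w$) with three illustrative step sizes:

```python
# Gradient descent on L(w) = w^2, starting from w = 1.0.
# Each step multiplies w by (1 - 2*eta), so the step size decides the outcome.
def descend(eta, steps=20, w=1.0):
    for _ in range(steps):
        w -= eta * 2 * w  # w <- w - eta * dL/dw
    return abs(w)

print(round(descend(0.1), 4))  # 0.0115: steps shrink w toward the minimum
print(descend(1.0))            # 1.0: w flips sign every step and never settles
print(round(descend(1.1), 2))  # 38.34: every step overshoots further; divergence
```

The middle case is exactly the "opposite slope" cycle described above: the iterate jumps across the valley floor to a point of equal loss, forever.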

Next is the misconception that "more hidden layers and nodes always improve performance." For this XOR problem, one hidden layer with two nodes is sufficient. But try deliberately creating a huge network in the tool, say 5 layers with 10 nodes each. You'll see the loss seemingly approach zero, but the decision boundary becomes overly complex: the network is merely memorizing the four training points. This is overfitting. For real-world problems, it is crucial to aim for a simple model that stays robust on unseen data, and the depth and width of the layers must be chosen carefully as hyperparameters.

Finally, the idea that "the sigmoid function is a universal, all-purpose activation function." While historically significant, it has a major weakness in deep networks. Because the sigmoid squashes its output into the range (0, 1), its derivative is at most 0.25, so gradients shrink multiplicatively as they propagate backward through the layers; this is the vanishing gradient problem. If learning seems to slow down when you deepen the network in this tool, you are seeing a basic case of vanishing gradients. This is one reason why functions like ReLU ($f(x)=\max(0, x)$) have become mainstream in modern deep learning practice.
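The shrinkage compounds per layer, which a few lines make concrete. Even in the best case, where every neuron sits at $z = 0$ and the sigmoid's derivative takes its maximum value of 0.25, backpropagating through $n$ layers scales the gradient by at most $0.25^n$:

```python
import math

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1 - s)

# Best case: sigma'(0) = 0.25. After n layers the gradient is scaled by <= 0.25**n.
for n in (1, 5, 10):
    print(n, sigmoid_prime(0.0) ** n)  # at n = 10 the factor is already below 1e-6
```

Away from $z = 0$ (a saturated neuron), the per-layer factor is even smaller, which is why deep sigmoid networks stall.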

Related Engineering Fields

The concepts of "forward propagation" and "error backpropagation" you learned with this tool are fundamentally connected to the core engineering challenges of "modeling" and "parameter identification" found across various fields, beyond CAE.

For example, in Structural Health Monitoring (SHM), invisible damage (cracks, loose bolts) in bridges or other structures is detected from vibration data collected by sensors. Here, the sensor data serves as the "input" and the damage state (presence or location) as the "output," so the task can be formulated as a classification/regression problem for a neural network. It is essentially the same exercise as drawing the decision boundary that separated 0 and 1 in the XOR problem, only in a much higher-dimensional data space.

Furthermore, in the field of Materials Informatics, alloy composition ratios (e.g., iron 90%, chromium 8%, carbon 2%) or heat-treatment conditions are modeled as the "input," with the resulting strength or corrosion resistance as the "output." The goal is to search backwards for an optimal material recipe from the limited, high-cost data obtained through experiments or simulations. That process is conceptually close to adjusting weights and biases via backpropagation to approach a target output. Moreover, engineering challenges involving unstructured data, such as image-based automated inspection and acoustic anomaly detection, are all potential application areas for the multilayer perceptron concepts you touched on with this tool.

For Further Learning

Once you've developed an intuitive feel for how neural networks work with this tool, it's recommended to delve deeper from both theoretical and implementation perspectives. As a learning path, first solidify your foundations in linear algebra and calculus. The chain rule, central to backpropagation, becomes much clearer when calculated all at once with vectors and matrices ($\boldsymbol{\delta}^{(l)} = ((\boldsymbol{W}^{(l+1)})^T \boldsymbol{\delta}^{(l+1)}) \odot \sigma'(\boldsymbol{z}^{(l)})$). Familiarizing yourself with vector dot products, matrix multiplication and transposition, and element-wise products (the Hadamard product $\odot$) is the next key step.
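A good habit when implementing any of this by hand is to verify the analytic gradient against a finite-difference approximation. The sketch below (illustrative numbers, single neuron, MSE loss) checks that the chain-rule gradient $\delta \cdot a^{(l-1)}$ matches a central-difference estimate:

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# One neuron, one weight: L = 0.5 * (sigma(w*x + b) - y)^2 (illustrative values)
x, y, w, b = 0.7, 1.0, 0.3, -0.1

def loss(w_):
    return 0.5 * (sigmoid(w_ * x + b) - y) ** 2

# Analytic gradient from the chain rule: delta = (a - y) * sigma'(z), dL/dw = delta * x
z = w * x + b
a = sigmoid(z)
delta = (a - y) * a * (1 - a)
analytic = delta * x

# Finite-difference check: (L(w + eps) - L(w - eps)) / (2 * eps)
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)  # True: the two gradients agree
```

This "gradient check" catches most backpropagation bugs, and generalizes directly to the vectorized form shown above.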

Building on that, learning about activation functions other than the sigmoid (like ReLU and tanh) and loss functions other than squared error (like cross-entropy) will help you understand why they are used. For instance, the softmax function used in the output layer for multi-class classification is a multi-class generalization of the sigmoid.
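The softmax itself is only a few lines. This sketch uses the standard numerically stable form (subtracting the maximum logit before exponentiating); the logits are illustrative:

```python
import math

def softmax(zs):
    """Numerically stable softmax: subtract max(z) before exponentiating."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]: sums to 1
```

With two classes and one logit fixed at 0, softmax reduces exactly to the sigmoid, which is the sense in which it generalizes it.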

Ultimately, aim to understand advanced architectures like Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). CNNs learn "filters (a type of weight)" to extract local features from images, directly applicable to analyzing CAE mesh data or field visualization images. The intuition of "stacking layers to transform features" cultivated with this tool should provide a solid foundation for understanding these more complex and powerful models.