Visualize neural network structure, forward propagation, and backpropagation in real time. Learn how neural nets work by training on the XOR problem.
Network Architecture
Input nodes
Hidden layers
Nodes per hidden layer
Output nodes
Activation Function
Applied to the hidden layers. The output layer is always Sigmoid (binary XOR classification).
Learning rate (η)
Training Controls
Results
Epochs, Loss, and Accuracy are reported here as training runs.
Positive weight
Negative weight
High activation
Low activation
Loss Curve
Decision Boundary (XOR)
Theory & Key Formulas
Backpropagation computes the gradient of loss L w.r.t. weight w via the chain rule:
∂L/∂w = ∂L/∂a · ∂a/∂z · ∂z/∂w
XOR is linearly non-separable and can only be solved by a network with at least one hidden layer.
What is a Neural Network?
🙋
What exactly is a neural network trying to do? I see it's solving the XOR problem here, but what's the big picture?
🎓
Basically, it's a function approximator. It learns to map inputs (like the two numbers for XOR) to the correct outputs (0 or 1). The "learning" happens by adjusting its internal knobs, the weights and biases: a handful in a demo like this one, millions in large networks. Try changing the "Hidden layers" slider above. Adding more layers lets it learn more complex patterns, but also makes training trickier.
🙋
Wait, really? So the network just starts with random guesses? How does it know which way to adjust those "knobs"?
🎓
Exactly! It starts randomly, which is why the initial output is often wrong. It knows how to adjust via backpropagation. The network makes a prediction (forward pass), compares it to the truth using a loss function (like Mean Squared Error), and then calculates how much each weight contributed to the error, working backwards. Each weight is then nudged in the direction that reduces the error. That's where the "Learning rate (η)" parameter comes in: it controls how big a step the network takes when adjusting weights based on that error.
🙋
So backpropagation is just the network learning from its mistakes? Why is the XOR problem such a classic example for this?
🎓
In practice, yes! XOR is famous because a single neuron (perceptron) cannot solve it: it's not linearly separable. You need at least one hidden layer to combine several decision boundaries. Watch the simulator: with zero hidden layers, it will never learn. Add one hidden layer with a few nodes, and it can find a solution. This demonstrates why neural networks need hidden layers with non-linear activations.
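To make the "not linearly separable" claim concrete, here is a minimal sketch that brute-forces a single threshold neuron over a coarse grid of weights. The grid range, the step size, and the helper names (`perceptron`, `find_separator`) are illustrative assumptions, not part of the simulator. A grid search is only a demonstration, not a proof, but the result matches the known fact that no choice of weights works for XOR.

```python
from itertools import product

# Truth tables: inputs (x1, x2) mapped to the target output.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def perceptron(w1, w2, b, x1, x2):
    """Single threshold neuron: outputs 1 when the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def find_separator(table, grid):
    """Return the first (w1, w2, b) on the grid that reproduces the table, or None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(perceptron(w1, w2, b, x1, x2) == y for (x1, x2), y in table.items()):
            return (w1, w2, b)
    return None

# Candidate weights from -2.0 to 2.0 in steps of 0.5 (an arbitrary, coarse grid).
grid = [i * 0.5 for i in range(-4, 5)]

and_solution = find_separator(AND, grid)
xor_solution = find_separator(XOR, grid)
print("AND separator:", and_solution)
print("XOR separator:", xor_solution)
```

AND has a separating line on this grid, while the search for XOR comes back empty. That gap is what the hidden layer closes.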
Physical Model & Key Equations
The core of a neuron's operation is the weighted sum of its inputs, passed through a non-linear activation function (like sigmoid or ReLU). This is the forward propagation step for a single neuron:
$$z^{(l)}_j = \sum_k w^{(l)}_{jk}\, a^{(l-1)}_k + b^{(l)}_j, \qquad a^{(l)}_j = \sigma\left(z^{(l)}_j\right)$$
Here, $a^{(l)}_j$ is the activation of neuron $j$ in layer $l$, $z^{(l)}_j$ is its weighted input before activation, $\sigma$ is the activation function, $w^{(l)}_{jk}$ is the weight connecting neuron $k$ in layer $(l-1)$ to neuron $j$ in layer $l$, and $b^{(l)}_j$ is the bias. This calculation propagates from input to output.
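The forward step described above can be sketched in a few lines of plain Python. The function name `layer_forward` and the hand-picked weights for a 2-2-1 network are hypothetical and chosen only so the numbers are easy to follow. They are not the simulator's values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, prev_activations, activation=sigmoid):
    """Compute one layer's activations.

    weights[j][k] connects neuron k of the previous layer to neuron j of this layer,
    matching the index order w_jk used in the text.
    """
    outputs = []
    for w_j, b_j in zip(weights, biases):
        z_j = sum(w_jk * a_k for w_jk, a_k in zip(w_j, prev_activations)) + b_j
        outputs.append(activation(z_j))
    return outputs

# Hypothetical hand-picked parameters for a 2-2-1 network.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 0.0]
hidden = layer_forward(hidden_w, hidden_b, x)        # input -> hidden
output = layer_forward(output_w, output_b, hidden)   # hidden -> output
print("hidden activations:", hidden)
print("network output:", output)
```

Chaining `layer_forward` once per layer is all that "propagating from input to output" means here.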
Learning is governed by backpropagation, which uses the chain rule to compute gradients of a loss function $L$ (e.g., MSE) with respect to every weight and bias. The key gradient for a weight is:
$$\frac{\partial L}{\partial w^{(l)}_{jk}} = \delta^{(l)}_j \, a^{(l-1)}_k$$
Where $\delta^{(l)}_j = \frac{\partial L}{\partial z^{(l)}_j}$ is the "error" term for neuron $j$ in layer $l$, and $z^{(l)}_j$ is that neuron's weighted input before activation. This error is propagated backwards from the output layer. The weight is then updated as $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate you control in the simulator.
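As a sanity check on the gradient formula, the sketch below computes $\delta$ and $\partial L/\partial w$ for one sigmoid neuron, compares the result with a finite-difference estimate, and applies a single update step. It assumes the loss $L = \tfrac{1}{2}(a - y)^2$ for one sample. The starting weights, the input, and $\eta = 0.5$ are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Half squared error of a single sigmoid neuron on one sample."""
    a = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
    return 0.5 * (a - y) ** 2

def gradients(w, b, x, y):
    """Chain rule: dL/dw_k = (dL/da) * (da/dz) * (dz/dw_k) = delta * x_k."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    a = sigmoid(z)
    delta = (a - y) * a * (1.0 - a)      # dL/dz, the neuron's "error" term
    grad_w = [delta * xk for xk in x]    # dz/dw_k is the incoming activation x_k
    grad_b = delta                       # dz/db is 1
    return grad_w, grad_b

# Arbitrary illustrative values.
w, b = [0.3, -0.2], 0.1
x, y = [1.0, 0.0], 1.0
eta = 0.5

grad_w, grad_b = gradients(w, b, x, y)

# Finite-difference estimate of dL/dw_0 for comparison.
eps = 1e-6
numeric = (loss([w[0] + eps, w[1]], b, x, y) - loss([w[0] - eps, w[1]], b, x, y)) / (2 * eps)

loss_before = loss(w, b, x, y)
w = [wk - eta * gk for wk, gk in zip(w, grad_w)]   # w <- w - eta * dL/dw
b = b - eta * grad_b
loss_after = loss(w, b, x, y)

print("analytic dL/dw0:", grad_w[0], "numeric:", numeric)
print("loss before:", loss_before, "loss after:", loss_after)
```

The weight attached to the zero input gets a zero gradient, which is why the update only moves weights whose inputs were active.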
Frequently Asked Questions
Simply click the 'Start Training' button on the screen, and the learning will begin automatically. As training progresses, the weights and biases of the neural network are updated, and you can observe in real time how the output layer values approach the correct XOR results (0,0→0, 0,1→1, 1,0→1, 1,1→0).
You can switch by clicking the 'Forward Propagation' and 'Backpropagation' tabs at the top of the screen. In the forward propagation tab, the flow of signals from input to output is displayed, while in the backpropagation tab, the propagation of errors from output to input is shown with color coding.
Yes. The formulas are for reference and are not essential for operation. The color intensity of each neuron represents the strength of activation, and the thickness of the lines represents the magnitude of weights, allowing for intuitive visual understanding. Detailed explanations of the formulas can be found by clicking the 'Explanation' button at the bottom of the tool.
Click the 'Reset' button on the screen to initialize the weights and biases, then start training again. Additionally, adjusting the learning rate slider (typically recommended between 0.1 and 0.5) may improve convergence. If it still does not converge, try changing the number of layers or nodes in the network.
Real-World Applications
Surrogate Models in CAE: Running high-fidelity Finite Element Analysis (FEA) or Computational Fluid Dynamics (CFD) simulations can take days. A neural network can be trained on a dataset of these simulation results to create a "surrogate" model that predicts outcomes in milliseconds, enabling rapid design exploration and optimization.
Physics-Informed Neural Networks (PINNs): This is a cutting-edge CAE application. Instead of just learning from data, PINNs embed the governing physical equations (like Navier-Stokes or heat equations) directly into the loss function. This guides the network to learn solutions that are physically plausible, even with sparse data, and can solve inverse problems.
Autonomous System Control: Neural networks process sensor data (camera, LiDAR) from vehicles or robots to make real-time decisions like steering, braking, or path planning. They learn complex mappings from high-dimensional inputs to control outputs that are difficult to program with traditional logic.
Material Property Prediction: In materials science and engineering, networks predict properties like strength, thermal conductivity, or fatigue life based on microstructural images or composition data. This accelerates the discovery of new alloys, composites, and polymers for specific engineering applications.
Common Misconceptions and Points to Note
While experimenting with this tool, you might encounter a few easily misunderstood points. First, you might think "a larger learning rate η leads to faster learning." This is only half true. Increasing the value does enlarge the weight update steps, but once η is too large for the problem, the loss curve can oscillate wildly and may never settle on a good solution. This is akin to overshooting the valley floor, landing on the opposite slope, and overshooting again in a repeating cycle. For this small XOR network, values between 0.1 and 0.5 usually train well; for larger real-world networks, a common rule of thumb is to start with small values like 0.01 or 0.001 and adjust while monitoring progress.
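The overshooting picture is easy to reproduce on a toy problem. The sketch below runs gradient descent on $f(x) = x^2$, which is an assumption made purely for illustration. The η values at which it converges, oscillates, or diverges depend on this function's curvature and do not carry over to the simulator's slider.

```python
def descend(eta, x0=1.0, steps=20):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * x   # each step multiplies x by (1 - 2*eta)
    return x

for eta in (0.1, 0.5, 1.0, 1.1):
    print(f"eta={eta}: x after 20 steps = {descend(eta)}")
```

With η = 0.1 the iterate shrinks steadily toward the minimum at 0. With η = 1.0 it jumps back and forth between the two slopes without getting closer. With η = 1.1 every jump lands farther away than the last.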
Next is the misconception that "more hidden layers and nodes always improve performance." For this XOR problem, one hidden layer with two nodes is sufficient. But try deliberately creating a huge network in the tool, say 5 layers with 10 nodes each. You'll see the loss seemingly approach zero, but the decision boundary becomes overly complex: the network is merely memorizing the four training data points. This is overfitting. For real-world problems, aiming for a simple model that is robust to unseen data is crucial, and the depth and width of the layers must be chosen carefully as hyperparameters.
Finally, there is the idea that "the sigmoid function is a universal, all-purpose activation function." While historically significant, it has a major weakness in deep networks. Because the sigmoid saturates for large positive or negative inputs and its derivative never exceeds 0.25, it's prone to the vanishing gradient problem, where gradients become increasingly smaller as they propagate backward through layers. If you feel learning slow down when deepening layers in this tool, that's a basic experience of vanishing gradients. This is one reason why functions like ReLU ($f(x)=\max(0, x)$) have become mainstream in modern deep learning practice.
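A rough feel for how fast gradients can shrink: the sigmoid's derivative peaks at 0.25, so the activation-derivative factor alone is cut by at least 4x per sigmoid layer. The sketch below multiplies that best-case factor across layers. It deliberately ignores the weights, which also scale the gradient, so treat it as an illustration of the trend rather than a model of any specific network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_prime(z):
    return 1.0 if z > 0 else 0.0

peak = sigmoid_prime(0.0)   # largest possible sigmoid derivative
for depth in (1, 3, 5, 10):
    print(f"{depth} sigmoid layers: derivative factor at most {peak ** depth:.2e}")

# For comparison, an active ReLU unit passes the gradient through unchanged.
print("10 active ReLU layers:", relu_prime(2.0) ** 10)
```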
Set input layer nodes (typically 2 for XOR classification) using the inNodesNum slider, then configure hidden layer architecture with numHiddenNum (1-3 layers) and hidNodesNum (4-16 neurons per layer)
Define output nodes (1 for binary classification, 4 for multi-class) via outNodesNum, then click Initialize Network to create weight matrices
Load training data (XOR dataset: [0,0]→0, [0,1]→1, [1,0]→1, [1,1]→0), adjust learning rate 0.1-0.5, and run training; monitor Epochs counter, Loss curve trending toward 0.01, and Accuracy percentage reaching 95-100%
Worked Example
XOR problem with 2 input nodes, 1 hidden layer of 8 neurons, 1 output node. The initial random weights are W1 (2×8 matrix) and W2 (8×1 matrix). Forward propagation on input x = [1,0] computes the hidden activation sigmoid(W1ᵀ·x + b1), which produces intermediate values of about 0.45-0.65. Backpropagation then calculates gradients and updates the weights over 500 epochs. In this run, final Loss = 0.009 and Accuracy = 100%: the output neuron fires 0.98 for [1,0] input and 0.02 for [0,0], correctly solving XOR classification.
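The 2-8-1 setup above can be sketched in plain Python as follows. This is not the simulator's code: it assumes sigmoid activations everywhere, a half-squared-error loss, full-batch gradient descent, weights drawn uniformly from [-1, 1], and a fixed seed. The loss and outputs it produces will differ from the figures quoted above, and a run may need more than 500 epochs before all four points are classified correctly.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
N_IN, N_HID = 2, 8

def forward(W1, b1, W2, b2, x):
    """W1 is 2x8 (W1[k][j]: input k -> hidden j); W2 is 8x1 stored as a flat list."""
    h = [sigmoid(sum(W1[k][j] * x[k] for k in range(N_IN)) + b1[j]) for j in range(N_HID)]
    out = sigmoid(sum(W2[j] * h[j] for j in range(N_HID)) + b2)
    return h, out

def train(epochs=500, eta=0.5, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(N_HID)] for _ in range(N_IN)]
    b1 = [0.0] * N_HID
    W2 = [rng.uniform(-1.0, 1.0) for _ in range(N_HID)]
    b2 = 0.0
    history = []  # mean loss measured at the start of each epoch
    for _ in range(epochs):
        gW1 = [[0.0] * N_HID for _ in range(N_IN)]
        gb1 = [0.0] * N_HID
        gW2 = [0.0] * N_HID
        gb2 = 0.0
        total = 0.0
        for x, y in DATA:
            h, out = forward(W1, b1, W2, b2, x)
            total += 0.5 * (out - y) ** 2
            d_out = (out - y) * out * (1.0 - out)            # output error term
            gb2 += d_out
            for j in range(N_HID):
                d_hid = d_out * W2[j] * h[j] * (1.0 - h[j])  # error pushed back to hidden j
                gW2[j] += d_out * h[j]
                gb1[j] += d_hid
                for k in range(N_IN):
                    gW1[k][j] += d_hid * x[k]
        history.append(total / len(DATA))
        scale = eta / len(DATA)  # average the gradient over the four samples
        b2 -= scale * gb2
        for j in range(N_HID):
            W2[j] -= scale * gW2[j]
            b1[j] -= scale * gb1[j]
            for k in range(N_IN):
                W1[k][j] -= scale * gW1[k][j]
    return (W1, b1, W2, b2), history

params, history = train()
outputs = [forward(*params, x)[1] for x, _ in DATA]
accuracy = sum((o > 0.5) == (y > 0.5) for o, (_, y) in zip(outputs, DATA)) / len(DATA)
print("loss at first epoch:", history[0], "loss at last epoch:", history[-1])
print("outputs:", outputs, "accuracy:", accuracy)
```

Raising `epochs`, or changing `eta` and `seed`, is a quick way to mimic what the sliders and the Reset button do in the tool.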
Practical Notes
XOR can be solved in principle with as few as 2 hidden neurons, but such small networks sometimes stall depending on the random initial weights; 4-8 hidden neurons usually converge more reliably
Learning rates well above 0.5 can make the Loss oscillate instead of converging; 0.1-0.5 is a reasonable starting range, and lowering it toward 0.1-0.2 stabilizes training for most architectures
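XOR has only four possible inputs, so a train/test split is not meaningful here; on real datasets, validate on a separate test set (for example, 20% of the samples) to detect overfitting, and if training Accuracy=100% but test Accuracy is clearly lower (say 75%), reduce hidden layer size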
Batch size affects gradient updates: smaller batches (size 1) add noise, useful for escaping local minima; larger batches (4 samples) provide stable convergence
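The batch-size note can be illustrated with a toy one-parameter model. The model $\hat{y} = w x$, the data, and the starting weight below are made up for illustration. The point is only that each single-sample gradient points the same way here but with a very different magnitude, while the full-batch gradient is their average.

```python
# Toy model y_hat = w * x with per-sample loss 0.5 * (y_hat - y)**2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]
w = 1.0

def sample_gradient(w, x, y):
    """d/dw of 0.5 * (w*x - y)**2 for one sample."""
    return (w * x - y) * x

per_sample = [sample_gradient(w, x, y) for x, y in zip(xs, ys)]
full_batch = sum(per_sample) / len(per_sample)

print("batch size 1 gradients:", per_sample)   # one noisy estimate per sample
print("batch size 4 gradient: ", full_batch)   # their average, a steadier step
```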