What is forward propagation in a neural network?

Forward propagation is the process of passing an input signal through the network from the input layer to the output layer. Each neuron multiplies the previous layer's outputs by weights, adds a bias, and applies an activation function. Click the 'Forward Pass' button to see this animated layer by layer.

Why is the XOR problem used to demonstrate neural network training?

XOR (exclusive OR) is not linearly separable: no single straight line can divide the four data points into the two classes, so a single-layer perceptron cannot solve it. Adding at least one hidden layer with a non-linear activation, trained by backpropagation, allows the network to learn the non-linear boundary. This makes XOR the canonical example of a problem that a multi-layer network can solve but a linear classifier cannot.
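A short derivation makes the non-separability concrete. Assume, purely for illustration, a single threshold unit that outputs 1 when $w_1 x_1 + w_2 x_2 + b > 0$ and 0 otherwise (the symbols $w_1$, $w_2$ and $b$ are introduced here and are not labels from the tool). The four XOR points would then require:

$$\begin{aligned}
(0,0)\to 0 &: \quad b \le 0 \\
(0,1)\to 1 &: \quad w_2 + b > 0 \\
(1,0)\to 1 &: \quad w_1 + b > 0 \\
(1,1)\to 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last line. No choice of weights and bias satisfies all four conditions at once.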

How does backpropagation update the weights?

Backpropagation computes the gradient of the loss with respect to every weight using the chain rule, propagating error deltas from the output layer backward through each hidden layer. Each weight is then shifted in the direction that reduces the loss: w ← w − η·(∂L/∂w), where η is the learning rate. Repeating this over many epochs usually drives the loss down toward a minimum, which is close to zero for XOR when training succeeds.

What are the differences between sigmoid, ReLU, and tanh?

Sigmoid outputs values in (0,1) and is intuitive for probabilities, but suffers from vanishing gradients in deep nets. ReLU passes positive inputs unchanged and outputs zero otherwise; it is the most common choice in modern deep learning because its gradient is 1 for positive inputs, which mitigates vanishing gradients. tanh outputs values in (-1,1), is zero-centered, has a larger maximum gradient than sigmoid (1 versus 0.25), and is often used in recurrent networks. Switch between them in this tool to compare convergence speed on XOR.
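The three functions and their derivatives can be written in a few lines. This is a minimal sketch in plain Python, not the tool's own code; the function names are chosen here for illustration. It prints each derivative at a few inputs so the saturation of sigmoid and tanh is visible.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    """Derivative of sigmoid; its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return math.tanh(z)

def d_tanh(z):
    """Derivative of tanh; its maximum is 1.0 at z = 0."""
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    """ReLU: passes positive inputs unchanged, outputs 0 otherwise."""
    return max(0.0, z)

def d_relu(z):
    """Derivative of ReLU (taken as 0 at z = 0 by convention in this sketch)."""
    return 1.0 if z > 0 else 0.0

# Compare the derivatives at a negative, zero, and positive input.
for z in (-4.0, 0.0, 4.0):
    print(f"z={z:+.1f}  sigmoid'={d_sigmoid(z):.4f}  "
          f"tanh'={d_tanh(z):.4f}  relu'={d_relu(z):.1f}")
```

At z = ±4 the sigmoid and tanh derivatives are already small, while ReLU keeps a derivative of 1 for any positive input.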

Neural Network Visualizer — Forward & Backpropagation in Real Time

Network Architecture

Input nodes

Hidden layers

Nodes per hidden layer

Output nodes

Activation Function

Applied to the hidden layers. The output layer is always Sigmoid (binary XOR classification).

Learning rate (η)

Training Controls

Results

Epochs

Loss

Accuracy

Positive weight

Negative weight

High activation

Low activation

Loss Curve

Decision Boundary (XOR)

Theory & Key Formulas

Backpropagation computes the gradient of the loss L with respect to a weight w via the chain rule, where z is the neuron's weighted input and a = σ(z) is its activation:

∂L/∂w = ∂L/∂a · ∂a/∂z · ∂z/∂w

XOR is linearly non-separable and can only be solved by a network with at least one hidden layer.
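One way to build trust in the chain-rule formula above is to compare it with a finite-difference estimate. The sketch below assumes a single sigmoid neuron with one input and a squared-error loss L = ½(a − y)²; the function names and the sample values of w, b, x and y are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared-error loss of one sigmoid neuron: L = 0.5 * (a - y)^2."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def grad_w_chain_rule(w, b, x, y):
    """dL/dw = dL/da * da/dz * dz/dw, each factor computed explicitly."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = a - y            # derivative of 0.5 * (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dz_dw = x                # derivative of w * x + b with respect to w
    return dL_da * da_dz * dz_dw

def grad_w_numeric(w, b, x, y, eps=1e-6):
    """Central finite-difference estimate of dL/dw."""
    return (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2.0 * eps)

# Arbitrary sample values, chosen only for this check.
w, b, x, y = 0.8, -0.3, 1.5, 1.0
analytic = grad_w_chain_rule(w, b, x, y)
numeric = grad_w_numeric(w, b, x, y)
print(f"chain rule: {analytic:.8f}   finite difference: {numeric:.8f}")
```

The two numbers should agree to many decimal places. The gradient is negative here because the activation is below the target, so gradient descent increases w.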

What is a Neural Network?

🙋

What exactly is a neural network trying to do? I see it's solving the XOR problem here, but what's the big picture?

🎓

Basically, it's a function approximator. It learns to map inputs (like the two numbers for XOR) to the correct outputs (0 or 1). The "learning" happens by adjusting its internal knobs, the weights and biases. There are only a handful in a small XOR network like this one, but large models have millions. Try changing the "Hidden layers" slider above. Adding more layers lets it learn more complex patterns, but also makes training trickier.

🙋

Wait, really? So the network just starts with random guesses? How does it know which way to adjust those "knobs"?

🎓

Exactly! It starts with random weights, which is why the initial output is often wrong. It knows how to adjust them via backpropagation. The network makes a prediction (forward pass), compares it to the truth using a loss function (like Mean Squared Error), and then calculates how much each weight contributed to the error, working backwards. The "Learning rate (η)" parameter then controls how big a step it takes when adjusting the weights based on that error.

🙋

So backpropagation is just the network learning from its mistakes? Why is the XOR problem such a classic example for this?

🎓

Essentially, yes! XOR is famous because a single neuron (perceptron) cannot solve it, since it is not linearly separable. You need at least one hidden layer to combine several decision boundaries. Watch the simulator: with zero hidden layers, it will never learn. Add one hidden layer with a few nodes, and it can find a solution. This demonstrates why hidden layers with non-linear activations are essential in neural networks.
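To see how a hidden layer combines two boundaries, here is a hand-built 2-2-1 network that computes XOR. This is only a sketch: the weights are picked by hand and a hard step activation is used for readability, whereas the simulator learns its weights and uses smooth activations. The names `step` and `xor_net` are invented for this example.

```python
def step(z):
    """Hard threshold activation: 1 if z is positive, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two hidden neurons draw two lines; the output neuron combines them."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # "OR but not AND" is XOR

# Print the full truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"{x1},{x2} -> {xor_net(x1, x2)}")
```

Each hidden neuron is a linear separator on its own. The output neuron fires only in the band between the two lines, which is exactly the region containing (0,1) and (1,0).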

Physical Model & Key Equations

The core of a neuron's operation is the weighted sum of its inputs, passed through a non-linear activation function (like sigmoid or ReLU). This is the forward propagation step for a single neuron:

$$a^{(l)}_j = \sigma\left( z^{(l)}_j \right) = \sigma\left( \sum_{k}w^{(l)}_{jk}a^{(l-1)}_k + b^{(l)}_j \right)$$

Here, $a^{(l)}_j$ is the activation of neuron $j$ in layer $l$, $\sigma$ is the activation function, $w^{(l)}_{jk}$ is the weight connecting neuron $k$ in layer $(l-1)$ to neuron $j$ in layer $l$, and $b^{(l)}_j$ is the bias. This calculation propagates from input to output.
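The formula translates almost directly into code. The following is a minimal pure-Python sketch, not the simulator's implementation. It assumes sigmoid in every layer, and the 2-2-1 network with its weight values is made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, a_prev, activation=sigmoid):
    """Compute a_j = activation(sum_k w[j][k] * a_prev[k] + b[j]) for each neuron j."""
    return [
        activation(sum(w_jk * a_k for w_jk, a_k in zip(w_j, a_prev)) + b_j)
        for w_j, b_j in zip(weights, biases)
    ]

def forward(network, x):
    """Pass x through each (weights, biases) layer in order.

    Returns the activations of every layer, starting with the input itself.
    """
    activations = [x]
    for weights, biases in network:
        activations.append(layer_forward(weights, biases, activations[-1]))
    return activations

# Hypothetical 2-2-1 network; weights[j][k] connects neuron k of the
# previous layer to neuron j of the current layer, as in the formula.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer: 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer: 1 neuron
]

acts = forward(network, [1.0, 0.0])
print("hidden activations:", acts[1])
print("output activation:", acts[2])
```

Keeping every layer's activations, rather than only the final output, is a deliberate choice: backpropagation needs them to compute the weight gradients.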

Learning is governed by backpropagation, which uses the chain rule to compute gradients of a loss function $L$ (e.g., MSE) with respect to every weight and bias. The key gradient for a weight is:

$$\frac{\partial L}{\partial w^{(l)}_{jk}}= \delta^{(l)}_j \cdot a^{(l-1)}_k$$

Where $\delta^{(l)}_j = \frac{\partial L}{\partial z^{(l)}_j}$ is the "error" term for neuron $j$ in layer $l$. This error is propagated backwards from the output layer. The weight is then updated as $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate you control in the simulator.
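Putting the forward pass, the error terms and the update rule together gives a complete training loop. The sketch below is one possible implementation under stated assumptions: sigmoid in both layers, a squared-error loss averaged over the four XOR samples, full-batch updates, three hidden neurons, and a fixed random seed. All names (`init`, `train_epoch`, and so on) are hypothetical and do not come from the simulator.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The four XOR samples: (inputs, target).
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def init(n_hidden, rng):
    """Random weights in [-1, 1], zero biases, for a 2-n_hidden-1 network."""
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W1, b1, W2, 0.0

def forward(params, x):
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out

def mean_loss(params):
    return sum(0.5 * (forward(params, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)

def train_epoch(params, eta):
    """One full-batch gradient descent step using backpropagation."""
    W1, b1, W2, b2 = params
    n_hidden, n = len(W2), len(DATA)
    gW1 = [[0.0, 0.0] for _ in range(n_hidden)]
    gb1, gW2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
    for x, y in DATA:
        h, out = forward(params, x)
        delta_out = (out - y) * out * (1.0 - out)              # dL/dz at the output
        gb2 += delta_out
        for j in range(n_hidden):
            delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])  # dL/dz at hidden neuron j
            gW2[j] += delta_out * h[j]                         # delta * previous activation
            gb1[j] += delta_h
            for k in range(2):
                gW1[j][k] += delta_h * x[k]
    # w <- w - eta * dL/dw, with gradients averaged over the batch.
    W1 = [[W1[j][k] - eta * gW1[j][k] / n for k in range(2)] for j in range(n_hidden)]
    b1 = [b1[j] - eta * gb1[j] / n for j in range(n_hidden)]
    W2 = [W2[j] - eta * gW2[j] / n for j in range(n_hidden)]
    return W1, b1, W2, b2 - eta * gb2 / n

rng = random.Random(0)
params = init(n_hidden=3, rng=rng)
initial_loss = mean_loss(params)
for epoch in range(5000):
    params = train_epoch(params, eta=0.5)
final_loss = mean_loss(params)

print(f"loss before: {initial_loss:.4f}   loss after: {final_loss:.4f}")
for x, y in DATA:
    print(x, "target", y, "prediction", round(forward(params, x)[1], 3))
```

How close the predictions get to the targets depends on the random starting point, the learning rate and the number of epochs, so a different seed may need more epochs or may stall. This mirrors what the 'Reset' button does in the simulator.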

Frequently Asked Questions

To start training, click the 'Start Training' button on the screen; learning begins automatically. As training progresses, the weights and biases of the neural network are updated, and you can observe in real time how the output layer values approach the correct XOR results (0,0→0, 0,1→1, 1,0→1, 1,1→0).

To switch between the forward propagation and backpropagation views, click the 'Forward Propagation' and 'Backpropagation' tabs at the top of the screen. The forward propagation tab displays the flow of signals from input to output, while the backpropagation tab shows the propagation of errors from output to input with color coding.

Yes, the tool can be used without understanding the formulas, which are for reference and are not essential for operation. The color intensity of each neuron represents the strength of its activation, and the thickness of each line represents the magnitude of the weight, so the behavior can be understood visually. Detailed explanations of the formulas are available by clicking the 'Explanation' button at the bottom of the tool.

If training does not converge, click the 'Reset' button on the screen to reinitialize the weights and biases, then start training again. Adjusting the learning rate slider (typically recommended between 0.1 and 0.5) may also improve convergence. If it still does not converge, try changing the number of layers or nodes in the network.

Real-World Applications

Surrogate Models in CAE: Running high-fidelity Finite Element Analysis (FEA) or Computational Fluid Dynamics (CFD) simulations can take days. A neural network can be trained on a dataset of these simulation results to create a "surrogate" model that predicts outcomes in milliseconds, enabling rapid design exploration and optimization.

Physics-Informed Neural Networks (PINNs): This is a cutting-edge CAE application. Instead of just learning from data, PINNs embed the governing physical equations (like Navier-Stokes or heat equations) directly into the loss function. This guides the network to learn solutions that are physically plausible, even with sparse data, and can solve inverse problems.

Autonomous System Control: Neural networks process sensor data (camera, LiDAR) from vehicles or robots to make real-time decisions like steering, braking, or path planning. They learn complex mappings from high-dimensional inputs to control outputs that are difficult to program with traditional logic.

Material Property Prediction: In materials science and engineering, networks predict properties like strength, thermal conductivity, or fatigue life based on microstructural images or composition data. This accelerates the discovery of new alloys, composites, and polymers for specific engineering applications.

Common Misconceptions and Points to Note

While experimenting with this tool, you might encounter a few easily misunderstood points. First, you might think "a larger learning rate η leads to faster learning." This is only half true. While increasing the value does enlarge the weight update steps, setting η to something like 0.5 or 1.0 can cause the loss curve to oscillate wildly, potentially preventing convergence to an optimal solution. This is akin to overshooting the valley floor, landing on the opposite slope, and overshooting again in a repeating cycle. For larger real-world networks, a common rule of thumb is to start with small values like 0.01 or 0.001 and adjust while monitoring the loss curve.
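The overshooting picture can be reproduced on a one-dimensional toy loss. The sketch below assumes L(w) = w², whose gradient is 2w, so each update multiplies w by (1 − 2η). The loss and the helper name `descend` are chosen for illustration only, and the η values at which this toy problem oscillates are not the same as the thresholds in the simulator.

```python
def descend(eta, w0=1.0, steps=20):
    """Run gradient descent on L(w) = w**2 and return every visited weight."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w        # dL/dw for L(w) = w^2
        w = w - eta * grad    # w <- w - eta * dL/dw
        path.append(w)
    return path

for eta in (0.1, 0.9, 1.1):
    path = descend(eta)
    first_steps = ", ".join(f"{w:+.3f}" for w in path[:4])
    print(f"eta={eta}: first weights [{first_steps}]  final |w| = {abs(path[-1]):.4g}")
```

With η = 0.1 the weight slides steadily toward the minimum at 0. With η = 0.9 it jumps across the valley on every step, flipping sign each time, yet still shrinks. With η = 1.1 each jump lands farther away than the last and the run diverges.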

Next is the misconception that "more hidden layers and nodes always improve performance." For this XOR problem, one hidden layer with two nodes is sufficient. But try deliberately creating a huge network in the tool, say 5 layers with 10 nodes each. You'll see the loss approach zero, but the decision boundary becomes overly complex: the network is merely memorizing the four training points. This is overfitting. For real-world problems, aiming for a simple model that is robust to unseen data is crucial, and the depth and width of the layers must be chosen carefully as hyperparameters.

Finally, there is the idea that "the sigmoid function is a universal, all-purpose activation function." While historically significant, it has a major weakness in deep networks. Because the sigmoid's derivative never exceeds 0.25 and shrinks toward zero when its output saturates near 0 or 1, it is prone to the vanishing gradient problem, where gradients become progressively smaller as they propagate backward through the layers. If you notice learning slow down when you add layers in this tool, you are seeing vanishing gradients first-hand. This is one reason why functions like ReLU ($f(x)=\max(0, x)$) have become mainstream in modern deep learning practice.
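A tiny experiment shows how quickly the gradient shrinks. The sketch assumes a deliberately simplified chain of one-neuron sigmoid layers, each with weight 1 and no bias. This is a toy setup, not the simulator's architecture, and the function name `input_gradient` is invented here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, x=0.5, w=1.0):
    """d(output)/d(input) for a chain of `depth` one-neuron sigmoid layers."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(w * a)            # forward through one layer
        grad *= w * a * (1.0 - a)     # chain rule: multiply by w * sigma'(z)
    return grad

for depth in (1, 2, 5, 10):
    print(f"depth={depth:2d}  gradient={input_gradient(depth):.3e}")
```

Every layer multiplies the gradient by a factor of at most 0.25, so ten layers shrink it by at least a factor of 0.25¹⁰ (roughly one millionth). For a ReLU unit with a positive input the corresponding factor is 1, which is why the gradient does not decay in the same way.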
