Experience the "convolution" that filters an image with a 3x3 kernel. Switch the kernel type or the image pattern and the input image, the filtered output and the scan-line profile all update in real time, giving you an intuitive feel for the operation at the heart of every convolutional neural network (CNN).
Parameters
Kernel type
Pick a 3x3 filter; its effect appears in the preview
Standard deviation of Gaussian noise added to the input
Results
Kernel sum
Output mean intensity
Output std. deviation
Edge energy
Operation count
Kernel effect
Convolution scan animation (input → output)
The input image is on the left, the output on the right. The yellow box is the 3x3 kernel window; it slides over the image one pixel at a time, computing each output pixel in turn.
Effective kernel coefficients K_eff (the 9 of the 3x3)
The 2-D discrete convolution. The output pixel O(i,j) is the sum of each kernel coefficient K multiplied by the 3x3 neighbourhood of the input image I directly beneath it.
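Written out, the operation this caption describes matches the formula quoted in the FAQ. The index range −1…1 is an assumption here: it places the kernel centre at (0,0) so that the 3x3 window sits symmetrically over pixel (i,j).

$$O(i,j)=\sum_{m=-1}^{1}\sum_{n=-1}^{1}K(m,n)\,I(i+m,\,j+n)$$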
The effective kernel K_eff is a linear blend of the identity kernel and the chosen kernel K by the strength s. s=0 gives the identity (no change), s=1 gives the full kernel effect.
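One plausible reading of "linear blend" is K_eff = (1 − s)·K_id + s·K, where K_id is the identity kernel. The sketch below assumes that form; the function name `effective_kernel` and the sharpen coefficients (taken from the dialogue further down) are illustrative, not the tool's actual code.

```python
# Identity kernel: only the centre coefficient is 1, so the image is unchanged.
IDENTITY = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

# Sharpen kernel as described in the dialogue: centre 5, four neighbours -1.
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

def effective_kernel(kernel, s):
    """Assumed blend: K_eff = (1 - s) * K_id + s * K, coefficient by coefficient."""
    return [[(1 - s) * IDENTITY[r][c] + s * kernel[r][c] for c in range(3)]
            for r in range(3)]

half = effective_kernel(SHARPEN, 0.5)
print(half[1])  # centre row of the half-strength sharpen kernel
```

Under this assumption the coefficient sum of K_eff stays at 1 whenever the chosen kernel sums to 1, because both blended kernels sum to 1.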
$$\sum K=1\;\Rightarrow\;\text{brightness preserved}, \qquad \sum K=0\;\Rightarrow\;\text{responds only to edges}$$
If the kernel coefficients sum to 1 the brightness of flat regions is preserved (smoothing / enhancement); if they sum to 0 the output responds only to changes in intensity, i.e. edges (differentiation / feature extraction).
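The rule can be checked numerically on a perfectly flat region, where every pixel under the window has the same value. This is a minimal sketch: the 4-neighbour Laplacian used here is a common textbook form and only an assumption about which edge kernel the tool uses.

```python
BOX = [[1 / 9] * 3 for _ in range(3)]            # all nine coefficients equal
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # centre 5, four neighbours -1
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # assumed 4-neighbour form

def kernel_sum(kernel):
    """Sum of all nine coefficients."""
    return sum(sum(row) for row in kernel)

def flat_response(kernel, level):
    """Output when every pixel under the window has the same brightness `level`."""
    return sum(coeff * level for row in kernel for coeff in row)

for name, k in [("box", BOX), ("sharpen", SHARPEN), ("laplacian", LAPLACIAN)]:
    print(name, round(kernel_sum(k), 6), round(flat_response(k, 0.5), 6))
```

On a flat region the output is simply the kernel sum times the brightness, which is why a sum of 1 returns the input level and a sum of 0 returns black.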
What is Image Convolution?
🙋
When my phone's photo app has a "blur" or "sharpen" button, what is it actually doing inside?
🎓
Almost all of it is one operation called "convolution". The idea is simple: take a small table of numbers, maybe 3x3 (we call it a kernel or filter), lay it exactly over the image, and "multiply each number by the pixel brightness directly beneath it and add up all 9". That sum becomes the value of that pixel in the output image. You just repeat that across the whole image, sliding the kernel one cell at a time.
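The "multiply and add up all 9, then slide" recipe can be sketched in a few lines of plain Python. The function name `convolve3x3` is hypothetical, images are assumed to be lists of rows of floats, and border pixels are replicated because the FAQ says that is what this tool does.

```python
def convolve3x3(image, kernel):
    """Slide a 3x3 kernel over `image` (a list of rows) without flipping the kernel.

    Pixels outside the image are replaced by the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in (-1, 0, 1):          # row offset inside the window
                for n in (-1, 0, 1):      # column offset inside the window
                    y = min(max(i + m, 0), h - 1)   # clamp to the image
                    x = min(max(j + n, 0), w - 1)
                    acc += kernel[m + 1][n + 1] * image[y][x]
            out[i][j] = acc
    return out

# A single bright pixel (value 9) spread out by a box blur.
impulse = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
box = [[1 / 9] * 3 for _ in range(3)]
blurred = convolve3x3(impulse, box)
print([[round(v, 3) for v in row] for row in blurred])
```

Every window position in this tiny image covers the bright pixel exactly once, so the blur spreads its value evenly across all nine outputs.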
🙋
Wait — just nine multiplications and additions can do both "blur" and "sharpen"?
🎓
Yes, you only change what is inside the kernel. Set all nine to 1/9 and every pixel becomes "the average of itself and its 8 neighbours" — that is box blur. Make the centre 5 and the top/bottom/left/right −1, and a pixel brighter than its surroundings gets pushed even brighter — that is sharpening. Switch the "Kernel type" on the left and watch how the bars (kernel coefficients) above and the output image on the right change.
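To see the two kernels pull in opposite directions, the sketch below applies both to one 3x3 patch whose centre (0.6) is slightly brighter than its neighbours (0.5). The patch values and the helper name `weighted_sum` are made up for illustration.

```python
BOX = [[1 / 9] * 3 for _ in range(3)]            # every coefficient is 1/9
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # centre 5, four neighbours -1

def weighted_sum(kernel, patch):
    """Multiply each coefficient by the pixel beneath it and add up all nine."""
    return sum(kernel[r][c] * patch[r][c] for r in range(3) for c in range(3))

# Centre pixel slightly brighter than its surroundings.
patch = [[0.5, 0.5, 0.5],
         [0.5, 0.6, 0.5],
         [0.5, 0.5, 0.5]]

blur_value = weighted_sum(BOX, patch)
sharp_value = weighted_sum(SHARPEN, patch)
print(f"box blur: {blur_value:.3f}, sharpen: {sharp_value:.3f}")
```

The box kernel pulls the centre from 0.6 down to about 0.51, toward its neighbours, while the sharpen kernel pushes it up to about 1.0, away from them.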
🙋
When I pick edge detection, the output is mostly black with just the outline glowing. Why is that?
🎓
Nice observation. The edge-detection (Laplacian) kernel has coefficients that sum to zero. So in a flat region of constant brightness, "same value × sum of zero" cancels out exactly and the output is 0 — black. Only where the brightness changes sharply, at an outline, does something fail to cancel and a value remains. This is a "derivative filter", the exact opposite role of the blur kernels that sum to 1. Look at the "Central scan line" chart and you will see the output spike at edges even when the input is smooth.
🙋
If I raise the input noise and then apply a Gaussian blur, the output standard deviation drops a lot. What does that mean?
🎓
That is exactly noise removal. Noise is a component that varies randomly per pixel; averaging the neighbourhood lets positive and negative parts cancel and shrink. The standard deviation measures the size of that scatter, so the more you blur the lower it gets. But at the same time the real edges get smeared too. "Remove the noise but keep the edges" is the eternal dilemma of image processing, and the spread of the Gaussian (the strength) is the knob that balances it.
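The drop in standard deviation can be reproduced with a small experiment. This sketch uses a box blur instead of the Gaussian for simplicity, a flat grey image with independent Gaussian noise, and skips the border pixels; the image size, noise level and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
N = 32
# Flat grey image (0.5) with independent Gaussian noise of std 0.1 on every pixel.
noisy = [[0.5 + random.gauss(0.0, 0.1) for _ in range(N)] for _ in range(N)]

def box_blur_interior(img):
    """Average each interior pixel with its 8 neighbours (border pixels are skipped)."""
    n = len(img)
    return [[sum(img[i + m][j + k] for m in (-1, 0, 1) for k in (-1, 0, 1)) / 9
             for j in range(1, n - 1)]
            for i in range(1, n - 1)]

blurred = box_blur_interior(noisy)
std_in = statistics.pstdev([v for row in noisy for v in row])
std_out = statistics.pstdev([v for row in blurred for v in row])
print(f"input std {std_in:.3f}, blurred std {std_out:.3f}")
```

Averaging nine independent samples divides the noise standard deviation by about three, so the blurred value should come out near one third of the input value. A real edge in the image would be smeared by the same averaging.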
🙋
Is this the same thing as the "convolutional neural networks (CNNs)" everyone talks about now?
🎓
The operation itself is identical. The difference is "who decides the kernel numbers". Here a human hand-writes the edge or blur kernel, but a CNN uses a large amount of image data to optimize the kernel's numbers automatically through learning (nine for a single-channel 3x3 kernel, and far more once a kernel spans many channels). Moreover, one layer holds dozens of kernels, and stacking layers lets the network capture progressively more complex features: edges, then textures, then eyes and wheels, then faces and cars. Convolution is the very foundation of modern image-recognition AI.
Frequently Asked Questions
Image convolution slides a small grid of numbers (the kernel, here 3x3) over an image one pixel at a time and, at every position, multiplies each kernel value by the pixel directly underneath it and sums all of them. The output pixel is $$O(i,j)=\sum_{m=-1}^{1}\sum_{n=-1}^{1}K(m,n)\,I(i+m,\,j+n)$$ Just changing the numbers inside the kernel produces completely different effects such as blur, sharpening or edge detection.
The sum of the kernel coefficients sets the brightness of the output. A kernel summing to 1 (identity, box blur, Gaussian, sharpen) preserves the brightness of flat regions. A kernel summing to 0 (the Laplacian edge kernel or Sobel) produces zero output wherever the brightness is constant and responds only where the brightness changes — that is, at edges. The first group is used for smoothing or enhancement, the second for differentiation and feature extraction.
A convolutional layer in a CNN performs exactly the same sliding-kernel convolution as this tool. The difference is that the kernel values are not hand-designed (like an edge or blur kernel) — the CNN learns and optimizes them automatically from training data via backpropagation. A single layer holds dozens of kernels, and stacking layers lets the network capture features step by step: edges, then textures, then parts, then whole objects.
When a 3x3 kernel is centred on a pixel at the edge of the image, part of the kernel falls outside the image. This tool replicates (clamps) the border pixels to fill the outside. Other options are zero padding, mirror reflection and circular wrap. CNNs commonly use zero padding, and whether padding is applied changes the size of the output image.
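The four border options differ only in which pixel stands in for an out-of-range index. The sketch below shows them on one row; the function names and mode strings are hypothetical, and the mirror variant shown reflects about the edge pixel (one of two common conventions). It only handles the one-pixel overhang a 3x3 kernel produces.

```python
def border_index(i, n, mode):
    """Map index i (at most one step outside 0..n-1) to a valid index, or None for zero padding."""
    if 0 <= i < n:
        return i
    if mode == "replicate":              # repeat the nearest border pixel
        return min(max(i, 0), n - 1)
    if mode == "reflect":                # mirror about the edge pixel
        return -i if i < 0 else 2 * (n - 1) - i
    if mode == "wrap":                   # continue from the opposite side
        return i % n
    if mode == "zero":                   # outside pixels count as 0
        return None
    raise ValueError(f"unknown mode: {mode}")

def sample(row, i, mode):
    """Read row[i], applying the chosen border rule when i is out of range."""
    k = border_index(i, len(row), mode)
    return 0.0 if k is None else row[k]

def unpadded_output_size(n, kernel_size=3):
    """Output length when no padding is used, so the window must fit entirely inside."""
    return n - kernel_size + 1

row = [10, 20, 30, 40]
for mode in ("replicate", "reflect", "wrap", "zero"):
    print(mode, sample(row, -1, mode), sample(row, 4, mode))
print(unpadded_output_size(128))
```

Without padding a 3x3 kernel shrinks each dimension by 2, which is why padded and unpadded layers give different output sizes.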
Real-World Applications
Image-recognition AI (CNNs): Convolution is the basic operation behind much of modern vision AI: object detection, face recognition, medical-image diagnosis, the "eyes" of self-driving cars. The first layer of a CNN often learns kernels that resemble this tool's edge and Sobel kernels, and later layers combine them to represent textures, parts and objects. Because the kernels are small and share weights, a CNN has dramatically fewer parameters than a fully connected network on the same input and trains far more efficiently.
Photo and video editing: The "sharpen", "blur", "denoise" and "outline enhance" features of smartphone camera apps and image editors are all applications of convolution kernels. Gaussian blur is used for background defocus and as a pre-step for mosaics; the unsharp mask (a form of sharpening) corrects slightly out-of-focus photos. For video, the same kernel can be applied to every frame for per-frame sharpening, denoising or texture tuning.
Industrial inspection and computer vision: In factory visual inspection, Sobel or Laplacian kernels extract edges to detect scratches, chips and misalignment on products. Convolution is also a central tool in rule-based image processing: pre-processing for OCR (text recognition), feature-point extraction for robot vision, boundary extraction from satellite imagery. Even in CAE, it is used to extract contour lines from visualized result images.
Signal processing in science and engineering: Convolution applies to one-dimensional time-series signals with the same idea, not just images. Noise smoothing by moving average, change-point detection by a derivative filter, and deconvolution to restore a blurred image are all widely used to pre-process measurement data. The output of a linear time-invariant system is precisely the convolution of its input with its impulse response, so the operation connects directly to the foundations of signal processing and control engineering.
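The one-dimensional case uses the same multiply-and-add window, just along a single axis. This sketch applies a 3-tap moving average and a central-difference derivative filter to a step signal; the helper name `filter1d`, the tap values and the test signal are illustrative choices, and the end samples are skipped instead of padded.

```python
def filter1d(signal, taps):
    """Apply an odd-length filter to the interior samples only, without flipping the taps."""
    half = len(taps) // 2
    return [sum(t * signal[i + k - half] for k, t in enumerate(taps))
            for i in range(half, len(signal) - half)]

signal = [0, 0, 0, 1, 1, 1]                      # a step: the "change point" is in the middle

smooth = filter1d(signal, [1 / 3, 1 / 3, 1 / 3])  # moving average (taps sum to 1)
slope = filter1d(signal, [-0.5, 0, 0.5])          # central difference (taps sum to 0)

print([round(v, 3) for v in smooth])
print(slope)
```

The moving average turns the abrupt step into a ramp, while the derivative filter is zero on the flat parts and non-zero only around the step.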
Common Misconceptions and Pitfalls
A common one is confusing convolution with correlation. Mathematically strict convolution flips the kernel (rotates it 180°) before overlaying it, but image-processing and CNN implementations usually use the un-flipped version, cross-correlation. Many of this tool's kernels are symmetric, so the difference does not show, but for asymmetric kernels such as Sobel and emboss the sign and orientation of the response flip. When reading papers or textbooks, check whether their "convolution" includes the flip.
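The effect of the flip is easy to check on a single patch. The sketch below uses the standard horizontal Sobel kernel and a patch that gets brighter from left to right; the helper names `flip180` and `response` are hypothetical.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def flip180(kernel):
    """Rotate a kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in kernel[::-1]]

def response(kernel, patch):
    """Multiply coefficients by the pixels beneath them and add up all nine."""
    return sum(kernel[r][c] * patch[r][c] for r in range(3) for c in range(3))

# Brightness rises from left (0) to right (1).
patch = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]

correlation = response(SOBEL_X, patch)           # kernel used as written
convolution = response(flip180(SOBEL_X), patch)  # kernel flipped first
print(correlation, convolution)
print(flip180(LAPLACIAN) == LAPLACIAN)           # symmetric kernel: the flip changes nothing
```

The two responses have the same magnitude and opposite sign, so the same edge is reported as rising under one convention and falling under the other.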
Next, not paying attention to the sum of the kernel. Using a smoothing or enhancement kernel whose sum deviates from 1 makes the whole image brighter or darker. If you keep box blur as "all nine equal to 1" and forget to divide, the output becomes 9× and saturates to pure white. Conversely, the edge-detection kernel summing to 0 is meaningful — it reliably blacks out flat regions. When designing your own kernel, always first check that the coefficient sum is what you intend (1 or 0).
Finally, underestimating border handling and clipping. At the image edge the kernel hangs over, so the choice of zero padding, border replication or mirror reflection changes the result of the few edge pixels. Also, sharpening and edge detection can produce values below 0 or above 1, which must be clipped (clamped) to [0,1] before display. Forget the clip and you get crushed blacks, blown highlights or colour inversion; in a CNN, extreme values destabilize training. More than the convolution formula itself, this handling of "edges" and "ranges" tends to be the breeding ground for bugs.
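Both range problems mentioned above can be shown with plain arithmetic. This is a minimal sketch assuming brightness values in [0,1]; the helper name `clip01` and the sample pixel values are made up for illustration.

```python
def clip01(value):
    """Clamp a filtered value into the displayable range [0, 1]."""
    return min(max(value, 0.0), 1.0)

# Box blur with the division by 9 forgotten, on a flat mid-grey region (0.5):
unnormalised = sum(1 * 0.5 for _ in range(9))   # nine times the input level

# Sharpen (centre 5, four neighbours -1) on a dark pixel (0.2) among bright ones (0.8):
overshoot = 5 * 0.2 - 4 * 0.8                   # well below 0

print(unnormalised, clip01(unnormalised))
print(round(overshoot, 3), clip01(overshoot))
print(clip01(0.3))                              # in-range values pass through unchanged
```

Clipping makes the values displayable, but it does not repair a badly scaled kernel: the unnormalised box blur still turns the whole flat region white.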
How to Use
Load or generate a test image pattern (checkerboard, gradient, or noise) using gridNRange (3–256 pixels).
Select kernel type (Sobel, Gaussian, Laplacian, or custom) and adjust strengthRange (0.1–2.0) to scale filter coefficients.
Apply convolution by clicking Simulate; observe Kernel sum, Output mean intensity, Output std. deviation, Edge energy, and Operation count metrics.
Vary noiseLvRange (0–50%) to test kernel robustness against corrupted image data.
Worked Example
A 128×128 pixel grayscale checkerboard (alternating 0/255 intensity) convolved with a Sobel kernel (strength=1.0) produces: Kernel sum=0, Output mean intensity≈128, Output std. deviation≈67, Edge energy≈12400 (high response at transitions). Adding 20% Gaussian noise reduces the edge energy to 11200. The operation count is 128×128×9 = 147,456 multiply-accumulate operations. Gaussian blur (strength=0.5) yields Output std. deviation≈22 and Edge energy≈800 (smoothing effect).
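The operation count in this example is one multiply-accumulate per kernel coefficient per output pixel. The sketch below reproduces the arithmetic, assuming the output has the same size as the input (as it does with border padding); the function name `mac_count` is hypothetical.

```python
def mac_count(width, height, kernel_size=3):
    """Multiply-accumulate operations for one pass over a same-size output."""
    return width * height * kernel_size ** 2

print(mac_count(128, 128))  # 128 * 128 * 9 = 147456
```

Doubling the grid side quadruples the count, which is the gridN² scaling noted in the practical notes.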
Practical Notes
Sobel kernels (strength 0.8–1.2) excel at boundary detection in manufacturing inspection; use high-contrast input images for crisp gradients.
Laplacian kernels amplify noise; reduce noiseLvRange below 10% or pre-filter with Gaussian (strength≈0.7) for real camera feeds.
Kernel sum=0 indicates high-pass behavior (edge-only output); non-zero sum produces low-pass or mixed responses—check Output mean intensity shift.
Operation count scales with gridN²; optimize stride or downsampling for embedded vision hardware targeting sub-50ms latency.