How does this differ from a kernel SVM?

The linear SVM in this tool separates two classes with a straight line (or hyperplane) directly in the input space. A kernel SVM implicitly maps the input to a higher-dimensional space using an RBF or polynomial kernel and can classify data that is not linearly separable with a curved boundary. It is heavier to implement and has more hyperparameters, but is effective for images and nonlinear patterns.
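As a rough illustration of the kernel idea, the sketch below evaluates an RBF kernel and a kernel decision function f(x) = Σ α_i·y_i·K(x_i, x) + b on XOR-style data, which no straight line can separate. The coefficients `alphas`, the bias `b`, and `gamma` are hand-picked assumptions for illustration; a real solver would learn them.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * sq_dist)

def kernel_decision(x, points, labels, alphas, b=0.0, gamma=0.5):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * rbf_kernel(p, x, gamma)
               for a, y, p in zip(alphas, labels, points)) + b

# XOR-style data: not linearly separable in the input space.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]  # hypothetical values, not solver output

predictions = [1 if kernel_decision(p, points, labels, alphas) > 0 else -1
               for p in points]
print(predictions)
```

Because the kernel decays with distance, each point is dominated by its own term, which is what lets the boundary curve around the two classes.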

What is the difference between hard-margin and soft-margin SVM?

Hard-margin SVM requires every training point to lie on or outside the margin, so it only applies to linearly separable data. Soft-margin SVM uses a hinge loss to allow some margin violations, so it stays robust under noise or slight overlap. The regularization parameter C controls how strictly violations are penalized: a smaller C tolerates more violations and gives a wider margin, while a larger C penalizes violations more heavily and narrows the margin.
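For reference, the two formulations can be written side by side. The slack variables $\xi_i$ are standard textbook notation introduced here for illustration; they do not appear in the tool itself.

$$\text{Hard margin:}\quad \min_{\mathbf{w},b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,N$$

$$\text{Soft margin:}\quad \min_{\mathbf{w},b,\boldsymbol{\xi}}\ \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

At the optimum each slack equals $\xi_i = \max\bigl(0,\,1 - y_i(\mathbf{w}\cdot\mathbf{x}_i + b)\bigr)$, which is exactly the hinge loss.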

What is the dual problem and why does it matter?

The primal problem optimizes w and b directly, but a Lagrangian dual reformulation turns it into an optimization of coefficients α_i, one per data point. The dual has two key advantages. First, the optimal w is a sum of α_i·y_i·x_i, so only points with α_i > 0 (the support vectors) contribute. Second, only inner products x_i·x_j appear, so replacing them with a kernel function K(x_i, x_j) extends SVM to nonlinear classification.
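Written out, the standard soft-margin dual takes the following form (a textbook statement, using the same symbols as above):

$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\,y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{N}\alpha_i y_i = 0$$

$$\mathbf{w} = \sum_{i=1}^{N}\alpha_i\,y_i\,\mathbf{x}_i, \qquad f(\mathbf{x}) = \sum_{i=1}^{N}\alpha_i\,y_i\,K(\mathbf{x}_i,\mathbf{x}) + b$$

With the linear kernel $K(\mathbf{x}_i,\mathbf{x}) = \mathbf{x}_i\cdot\mathbf{x}$ the second expression reduces to $\mathbf{w}\cdot\mathbf{x} + b$.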

How is LIBSVM or scikit-learn used in practice?

LIBSVM has been the de facto SVM library since the early 2000s and solves the dual problem efficiently with an SMO-type (Sequential Minimal Optimization) algorithm. In Python, sklearn.svm.SVC wraps LIBSVM. In practice you standardize features, grid-search over C and γ (the RBF kernel coefficient; a larger γ means a narrower kernel), and select the model by cross-validation. The subgradient method in this tool is for teaching; production large-scale data uses dedicated solvers.

Linear SVM Simulator — Free Online Calculator

Parameters

Regularization C

Learning rate η

Iterations (steps)

Data noise level

Dataset is generated by a fixed seed-42 LCG (deterministic). Class +1 centred at (2,2), -1 at (-2,-2), 20 points each, σ=1.0.

While paused, move the sliders to update the result instantly.

Results

Training accuracy

Margin width 2/‖w‖

Support vectors (in/on margin)

‖w‖_2

2D Classification and Maximum-Margin Hyperplane

Blue dots = class +1 / Red crosses = class -1 / Green solid = w·x+b=0 / Green dashed = margin boundaries w·x+b=±1 / Black circles = support vectors

Theory & Key Formulas

A soft-margin linear SVM learns the maximum-margin separating hyperplane by minimizing the sum of a hinge loss and an L2 regularizer.

Decision function. w is the weight vector and b the bias:

$$f(\mathbf{x}) = \mathbf{w}\cdot\mathbf{x} + b$$

Objective. The first term maximizes the margin, the second is the hinge loss with regularization parameter C:

$$J(\mathbf{w},b) = \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\max\bigl(0,\,1 - y_i\,f(\mathbf{x}_i)\bigr)$$

Subgradient (only the violating points $\mathcal{V} = \{\,i : 1 - y_i f(\mathbf{x}_i) > 0\,\}$ contribute):

$$\frac{\partial J}{\partial \mathbf{w}} = \mathbf{w} - C\sum_{i\in\mathcal{V}} y_i\,\mathbf{x}_i,\quad \frac{\partial J}{\partial b} = -C\sum_{i\in\mathcal{V}} y_i$$

The margin width is $2/\|\mathbf{w}\|$. Support vectors are the points with $y_i f(\mathbf{x}_i) \le 1$, up to a small numerical tolerance (on or inside the margin).
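The update rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's actual source: the function names, the eight-point toy dataset, and the zero initialization are all hypothetical.

```python
import math

def train_linear_svm(X, y, C=1.0, eta=0.01, iterations=500):
    """Batch subgradient descent on 0.5*||w||^2 + C * sum(hinge loss)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(iterations):
        grad_w = list(w)   # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            f = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if 1.0 - yi * f > 0.0:   # margin violation: point is in V
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def decision(w, b, x):
    """f(x) = w . x + b"""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical, well-separated toy data (not the tool's seed-42 dataset).
X = [(2, 2), (3, 2), (2, 3), (3, 3), (-2, -2), (-3, -2), (-2, -3), (-3, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
norm_w = math.sqrt(sum(wj * wj for wj in w))
margin_width = 2.0 / norm_w
accuracy = sum((decision(w, b, xi) > 0) == (yi > 0)
               for xi, yi in zip(X, y)) / len(X)
n_support = sum(yi * decision(w, b, xi) <= 1.0 for xi, yi in zip(X, y))
print(w, b, margin_width, accuracy, n_support)
```

With a fixed step size the iterate hovers around the optimum rather than settling on it exactly, so the support-vector count can flicker between runs of different lengths.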

What is the Linear SVM Simulator

🙋

I have heard of SVM, but what is it actually doing?

🎓

Roughly, it is an algorithm that draws a line (or a hyperplane in higher dimensions) that splits two classes "with as much breathing room as possible". In the simulator above it is learning the green line that separates the blue dots from the red crosses. The key point is not just to split them, but to choose the line where the distance to the closest point on each side — the margin — is largest. That is why it is called a "maximum-margin classifier".

🙋

There are two green dashed lines too. What are those?

🎓

Those are the margin boundaries, $w\cdot x + b = \pm 1$, and the width of that band, $2/\|w\|$, is the margin width. SVM tries to widen this band as much as possible while still classifying the training points correctly. Push the regularization C from 1 up to 100 — as C grows, the penalty on misclassifications becomes heavier and the band shrinks to fit individual points more aggressively.

🙋

The "support vectors" card shows a number. Are those the points circled in black?

🎓

Exactly. Points sitting on or inside the margin boundaries are the support vectors. The fascinating part of SVM is that the final decision boundary is determined entirely by these support vectors. "Easy" points far outside the margin can be added in any number without changing the boundary at all. That is why they are named "support vectors" — they support the decision.

🙋

When I move the noise slider the blue and red start to overlap, and the margin gets wider. That feels backwards.

🎓

Good observation. As the overlap grows, many points end up inside the margin. The soft-margin SVM "tolerates" this through the hinge loss, with C controlling how heavy the penalty is. Small C means "keep the margin wide and accept some violations"; large C means "punish violations hard and shrink the margin to classify correctly". In practice C is tuned by cross-validation.

Frequently Asked Questions

A larger learning rate η makes each subgradient step take a bigger jump. Too small and convergence is slow; too large and w and b oscillate or diverge. Around 0.01 is a stable choice for this tool. Pushing η to 0.5, especially with a large C, makes the objective oscillate noticeably and can transiently drop the training accuracy. It is a useful exercise in the numerical stability of optimization.
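The effect of the step size can be reproduced on a deliberately tiny problem. The sketch below is an assumption-laden toy, not the tool's dataset: a single point x = 1 with label +1 and no bias, so the objective is J(w) = ½w² + C·max(0, 1 − w), whose minimum for C = 10 sits at w = 1.

```python
def run(eta, C=10.0, steps=200):
    """Subgradient descent on J(w) = 0.5*w**2 + C*max(0, 1 - w)."""
    w = 0.0
    history = []
    for _ in range(steps):
        # The hinge term contributes -C only while the margin is violated.
        grad = w - C if 1.0 - w > 0.0 else w
        w -= eta * grad
        history.append(w)
    return history

def spread(history, tail=50):
    """Largest distance from the optimum w = 1 over the last `tail` steps."""
    return max(abs(w - 1.0) for w in history[-tail:])

small = run(eta=0.01)
large = run(eta=0.5)
print(spread(small), spread(large))
```

With η = 0.01 the iterate stays in a narrow band around the optimum, while with η = 0.5 it keeps bouncing far past it.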

Training accuracy can fall short of 100% for three main reasons. First, large noise makes the data linearly inseparable. Second, C is too small, so the misclassification penalty is light and the model deliberately tolerates a few errors. Third, the iteration count is too low, so optimization has not converged. With noise=0, C=1, η=0.01, and 500 iterations the default dataset reaches 100%.

When the data is not linearly separable there are three options. First, add features so the data becomes linearly separable in a richer space (e.g., add x1², x1·x2 polynomial features). Second, use a kernel SVM (RBF, polynomial, sigmoid). Third, accept some errors with a soft margin, prioritizing robustness. With real data the second and third options are usually combined.
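The first option can be illustrated with a one-dimensional sketch. The data, the weights `w`, and the bias `b` below are hand-picked assumptions: the +1 class sits on both sides of the −1 class, so no single threshold on x works, but adding x² as a second feature makes a linear rule sufficient.

```python
# 1-D data: class +1 lies on both sides of class -1.
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

def add_square(x):
    """Lift x to the feature vector (x, x^2)."""
    return (x, x * x)

# Hand-picked linear rule in the lifted space: f = 0*x + 1*x^2 - 2.125
w = (0.0, 1.0)
b = -2.125

def predict(x):
    phi = add_square(x)
    score = sum(wj * pj for wj, pj in zip(w, phi)) + b
    return 1 if score > 0 else -1

predictions = [predict(x) for x in xs]
print(predictions)
```

The rule is linear in (x, x²) but corresponds to the nonlinear test x² > 2.125 in the original space.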

Linear SVM and logistic regression are both linear classifiers but use different losses. SVM uses hinge loss, which is exactly zero for correctly classified points outside the margin. Logistic regression uses cross-entropy and always has a non-zero gradient on every point. As a result, SVM gives a sparse solution determined only by the support vectors, while logistic regression gives a smooth solution with probability outputs. Choose logistic regression when you need probabilities, SVM when you want margin-based discrimination.
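The difference in gradients is easy to see numerically. The sketch below compares the two losses as functions of the margin m = y·f(x); the function names are illustrative.

```python
import math

def hinge_grad(m):
    """Subgradient of max(0, 1 - m) with respect to the margin m."""
    return -1.0 if m < 1.0 else 0.0

def logistic_grad(m):
    """Derivative of log(1 + exp(-m)) with respect to the margin m."""
    return -1.0 / (1.0 + math.exp(m))

for m in (-1.0, 0.5, 1.0, 3.0, 10.0):
    print(f"m={m:5.1f}  hinge={hinge_grad(m):6.3f}  "
          f"logistic={logistic_grad(m):9.6f}")
```

Once a point is outside the margin (m ≥ 1) the hinge subgradient is exactly zero, so the point stops influencing the solution; the logistic gradient only shrinks toward zero and never reaches it.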

Real-World Applications

Text classification and spam filtering: SVM was the standard text classifier in the 2000s. Documents are encoded as word-frequency vectors (TF-IDF), and a linear SVM performs spam detection or topic classification. SVM's strength on high-dimensional sparse vectors fits text data, where the vocabulary easily reaches tens of thousands of dimensions. It still serves as a strong baseline for simple text classification.

Image classification and bioinformatics: HOG features + linear SVM for human detection (Dalal & Triggs, 2005) was a standard pedestrian-detection approach before deep learning. In bioinformatics, kernel SVMs are widely used for gene-expression and protein-structure classification. SVM works well in the "small n, large p" regime, with few samples but many features.

Anomaly detection (One-Class SVM): One-Class SVM learns the boundary of "normal" from normal data alone and flags outliers. It is used in defect detection on production lines, sensor anomaly detection, and network-intrusion detection. It is robust to extreme class imbalance and does not need labelled abnormal samples — major practical advantages.

Education and theory: SVM packages many central machine-learning concepts in one place: margin maximization, dual formulation, kernel methods, convex optimization, and structural risk minimization. Even in the deep-learning era, it remains a required topic in university courses and industry training because it embodies foundational patterns of modern machine learning.

Common Misconceptions and Cautions

The most common misconception is to think that "the larger C is, the better the performance". C is the weight on the hinge loss; raising it tightens the fit to the training data but also raises the risk of overfitting. Lowering C widens the margin, which often improves generalization at the cost of training accuracy. Sweep the simulator's C from 0.01 to 100 and you will see the margin width change continuously. In practice you grid-search C with cross-validation. The right C is dataset-specific; there is no universal value.

The next most common error is to assume "subgradient descent is the canonical SVM solver". This tool implements it for educational simplicity, but standard kernel SVM solvers attack the dual problem with SMO (Sequential Minimal Optimization), as in LIBSVM and scikit-learn's SVC, which wraps it. The subgradient method does appear in large-scale online learning (Pegasos), but on small and medium problems it typically converges more slowly and is more sensitive to the step size than dual methods. This tool is a device for getting a feel for the relationship between hinge loss and margin maximization.

Finally, beware of the misconception that SVM is "scale-insensitive". Linear SVM is in fact very sensitive to feature scaling. Feed it data with x1 in [0,1] and x2 in [0,10000] and x2 dominates w·x even with a tiny w2, while x1 would need a large w1 that the regularizer penalizes heavily, so the contribution of x1 is essentially ignored. This tool uses 2-D data on a common scale so the issue does not appear, but with real data you should always standardize (e.g., with scikit-learn's StandardScaler) or normalize before training. Forget this and tuning C becomes meaningless.
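Standardization itself is a one-line transform per feature. The sketch below uses only the standard library and made-up values for x1 and x2; scikit-learn's StandardScaler applies the same per-feature shift to zero mean and unit variance, with statistics fitted on the training data.

```python
import statistics

def standardize(column):
    """Rescale one feature column to zero mean and unit (population) std."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

# Hypothetical features on very different scales.
x1 = [0.1, 0.4, 0.5, 0.9]
x2 = [1200.0, 4500.0, 5100.0, 9800.0]

z1 = standardize(x1)
z2 = standardize(x2)
print(z1)
print(z2)
```

After the transform both columns vary over comparable ranges, so the regularizer treats w1 and w2 on equal terms.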
