
Isotonic Regression Simulator

An interactive tool for monotone nonparametric regression. Adjust sample size, noise level and constraint direction to see PAV (Pool Adjacent Violators) in action, with live read-outs of violation count, pooled block count and the MSE ratio against OLS.

Parameters
Sample size N: total number of observations (x_i, y_i)
True slope β: slope of the true relation y = β·x
Noise std σ: additive Gaussian noise ε ~ N(0, σ²)
Monotonicity: ordering constraint for f (increasing or decreasing)
Loss function: how residuals are scored
Smoothing bandwidth: 0 = pure step function; > 0 averages neighbours
Results
Live read-outs: true slope β, violation count, pooled blocks, average block size, MSE ratio vs OLS, monotonicity
Figure: Observations vs isotonic regression (scatter + isotonic step function, PAV pools highlighted). Blue dots: observations (x_i, y_i); red dashed line: OLS fit; green step: isotonic regression output. Pooled intervals are tinted in cyan.

Figure: MSE ratio (Iso / OLS) vs sample size N
Theory & Key Formulas

$$\hat f = \arg\min_{f\,\uparrow} \sum_{i=1}^{N} (y_i - f(x_i))^2$$

f↑ denotes that the minimum is taken over monotone increasing (or, if selected, decreasing) functions. Once the data are sorted by x, PAV (Pool Adjacent Violators) delivers the exact optimum in O(N) time.

$$\hat f_{\text{block}}(x) = \frac{1}{|B|}\sum_{i\in B} y_i, \quad B\text{: maximal pool preserving monotonicity}$$

Within each pooled block B the prediction is the block mean. Adjacent blocks that break the order are merged and re-averaged repeatedly.
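The pooling rule above can be sketched in pure Python as a stack of blocks. This is a minimal sketch, not the simulator's code: it assumes the observations are already sorted by x, and the function name `pav` and its `weights`/`increasing` parameters are illustrative choices. A decreasing fit is obtained by flipping the sign of y, fitting an increasing function and flipping back.

```python
def pav(y, weights=None, increasing=True):
    """Pool Adjacent Violators for L2 loss.

    y must already be sorted by x. Returns one fitted value per observation.
    """
    if weights is None:
        weights = [1.0] * len(y)
    # A decreasing fit of y is an increasing fit of -y, negated back.
    sign = 1.0 if increasing else -1.0
    # Stack of pooled blocks: [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, weights):
        blocks.append([sign * yi, float(wi), 1])
        # Merge while the two most recent blocks violate the increasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    # Expand every block back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([sign * mean] * count)
    return fit


print(pav([1, 3, 2, 4, 5]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

Because every merge removes one block from the stack, the loop performs at most N - 1 merges in total, which is where the O(N) bound (after sorting) comes from.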

$$\mathrm{MSE}_{\text{iso}} \sim \frac{\sigma^2}{N^{2/3}}, \qquad \mathrm{MSE}_{\text{OLS}} \sim \frac{\sigma^2}{N}$$

Isotonic regression converges at the slower rate N^(-2/3), compared with N^(-1) for a correctly specified linear model. In exchange, its error still goes to zero for any monotone f, whereas OLS keeps a non-vanishing bias whenever the true relation is not linear.
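To see this trade-off numerically, the sketch below averages the MSE against the true curve over repeated noisy samples, once for a linear truth and once for a steep S-curve. The settings are assumptions, not the simulator's: an equally spaced design on [0, 1], N = 500, σ = 0.3, 20 replications, and the hypothetical helper names `pav`, `ols_fit` and `mse_ratio`.

```python
import math
import random


def pav(y):
    """Unweighted L2 PAV for an increasing fit; y must already be sorted by x."""
    blocks = []  # each block: [mean, count]
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [m for m, c in blocks for _ in range(c)]


def ols_fit(x, y):
    """Fitted values of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [a + b * xi for xi in x]


def linear(x):
    return x  # OLS is correctly specified here


def s_curve(x):
    return 1.0 / (1.0 + math.exp(-20.0 * (x - 0.5)))  # monotone but not linear


def mse_ratio(f, n=500, sigma=0.3, reps=20, seed=0):
    """Average MSE of isotonic over OLS, measured against the true f(x_i)."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # assumed equally spaced design on [0, 1]
    truth = [f(xi) for xi in x]
    iso_err = ols_err = 0.0
    for _ in range(reps):
        y = [t + rng.gauss(0.0, sigma) for t in truth]
        iso_err += sum((p - t) ** 2 for p, t in zip(pav(y), truth)) / n
        ols_err += sum((p - t) ** 2 for p, t in zip(ols_fit(x, y), truth)) / n
    return iso_err / ols_err


print(f"linear truth:  Iso/OLS = {mse_ratio(linear):.2f}")
print(f"S-curve truth: Iso/OLS = {mse_ratio(s_curve):.2f}")
```

With the linear truth the ratio should come out above 1, in line with the two rates; with the S-curve, OLS is stuck with its misspecification bias and the ratio falls well below 1. Exact values depend on the seed.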

Isotonic Regression — Monotone-constrained Nonparametric Regression

🙋
I have never heard of "isotonic regression". How is it different from ordinary linear regression?
🎓
In plain words, it commits to no functional form: no straight line, no polynomial. Linear regression freezes the form to y = ax + b, while isotonic regression only assumes "y goes up as x goes up" (or down, for a decreasing constraint). So whether the true curve is a saturation, an S-curve or anything else, as long as it is monotone, isotonic regression can fit it. The catch is that the output is a step function.
🙋
A step function? Is that where the PAV algorithm comes in?
🎓
Exactly. PAV stands for Pool Adjacent Violators. Sort the data by x and walk left to right; whenever two neighbours violate monotonicity, "pool" them and replace both with their mean. If the new pooled block then falls below its left neighbour, merge again and re-average over every point in the merged block, and so on. You end up with several blocks, each with a constant predicted value, hence the staircase. With L2 loss it gives the exact optimum in O(N) time after sorting, which is pretty incredible.
🙋
When I drag the noise σ up, the violation count and block count change a lot. What am I actually watching?
🎓
Higher noise means more order reversals between neighbours — that is the violation count. PAV resolves each violation by merging blocks, so more violations means fewer pooled blocks and a larger average block size. In the limit, noise overwhelms the signal β and the whole dataset collapses into one block, i.e. a constant function. That is why we flag a warning when the MSE ratio exceeds ten.
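These read-outs take only a few lines to compute. The simulator's exact definitions are not documented, so this sketch assumes that the violation count is the number of adjacent reversals in the raw y (sorted by x), and that the block count includes single-point blocks, so the average block size is N divided by the number of blocks. The names `pav_blocks` and `diagnostics` are hypothetical.

```python
def pav_blocks(y):
    """Unweighted increasing PAV; returns (block_mean, block_size) pairs.

    Blocks with equal means are merged too (>=), so each block is one
    distinct step of the staircase. y must already be sorted by x.
    """
    blocks = []  # each block: [mean, size]
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [(m, c) for m, c in blocks]


def diagnostics(y):
    """Violation count, block count and average block size for non-empty y."""
    violations = sum(1 for a, b in zip(y, y[1:]) if b < a)  # adjacent reversals
    blocks = pav_blocks(y)
    return {
        "violations": violations,
        "blocks": len(blocks),
        "avg_block_size": len(y) / len(blocks),
    }


print(diagnostics([1, 3, 2, 4, 3, 5]))
# {'violations': 2, 'blocks': 4, 'avg_block_size': 1.5}
```

Fully reversed data such as [3, 2, 1] is the extreme case: every neighbour pair is a violation and everything collapses into one block.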
🙋
Doesn't a larger MSE than OLS mean isotonic regression is weaker?
🎓
Only when the true relation really is linear. Then OLS converges at rate N^(-1) and isotonic at the slower N^(-2/3). But if the true curve is an S-shape or a saturation, OLS keeps a bias that never goes away, however much data you collect. Isotonic is slower but has no such shape bias. With small samples OLS can look deceptively good, but as N grows the OLS error levels off at its bias while the isotonic error keeps falling.
🙋
Where is it used in practice?
🎓
The classic use is probability calibration in machine learning. When a classifier's score of 0.8 is not really an 80% probability, you correct it with Platt scaling or isotonic regression. In sklearn it is one line: IsotonicRegression. Other uses include drug dose-response curves, structural degradation indices, monotone hazard-rate estimation in survival analysis, and psychometric functions. Anywhere the relationship is monotone but the shape is unknown, isotonic regression is the first candidate.

Frequently asked questions

Isotonic regression is a nonparametric regression method that only assumes monotonicity (increasing or decreasing). Unlike linear regression, it does not commit to a functional form (such as a line or polynomial). Instead it finds the step function closest to the observations y_i in the least-squares sense, subject to the fitted values f(x_i) being monotone. The standard algorithm is PAV (Pool Adjacent Violators), which runs in O(N) time once the data are sorted by x. It is powerful when you trust that the relationship is monotone but do not want to assume a specific shape.
After sorting the data by x, scan left to right. Whenever an adjacent pair violates monotonicity, pool the two and replace them by their mean. If the pooled block then falls below its left neighbour, merge them as well and re-average. Repeat until every adjacent pair is monotone. The block means are the isotonic regression fit. For L2 loss this gives the exact optimum; for L1 it generalises to medians, and for Huber loss to a robust variant.
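The same stack-of-blocks loop extends to other losses by swapping how a block's value is computed. In this illustrative sketch (the name `pav_general` is hypothetical), each block keeps its raw observations and recomputes either their mean (L2) or their median (L1) after every merge. Recomputing from scratch is simple but costs more than O(N); it is meant to show the idea, not to be fast.

```python
import statistics


def pav_general(y, block_value):
    """Increasing PAV where each block is fitted by block_value(its raw points).

    block_value=statistics.fmean gives the L2 fit (block means);
    block_value=statistics.median gives an L1 fit (block medians; for an
    even-sized block, median() averages the two middle values, and any value
    between them minimises that block's absolute error equally well).
    y must already be sorted by x.
    """
    blocks = []  # each block: [fitted value, list of raw observations]
    for v in y:
        blocks.append([v, [v]])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            _, pts2 = blocks.pop()
            _, pts1 = blocks.pop()
            merged = pts1 + pts2
            blocks.append([block_value(merged), merged])
    return [value for value, pts in blocks for _ in pts]


data = [1, 2, 100, 3, 4, 5]  # toy data with one outlier
print("L1 (medians):", pav_general(data, statistics.median))  # [1, 2, 4, 4, 4, 5]
print("L2 (means):  ", pav_general(data, statistics.fmean))   # [1, 2, 28.0, 28.0, 28.0, 28.0]
```

The L1 fit shrugs off the outlier at 100 and settles on a level of 4, while the L2 fit is dragged up to 28 for the last four points.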
OLS commits to a straight line, so it carries a large bias whenever the true relationship is monotone but curved (saturation curves, S-shapes). Isotonic regression makes no shape assumption, so there is no shape-mismatch bias. The flip side is that the output is piecewise-constant; if you need smoothness, apply a smoother (e.g. a spline) on top of the isotonic fit as a post-processing step. When the true curve is not linear, the MSE ratio of isotonic regression to OLS shrinks as N grows, because the OLS error levels off at its bias while the isotonic error keeps falling.
The most common application is probability calibration in machine learning. When converting a classifier score into a true probability, isotonic regression is widely used as an alternative to Platt scaling (logistic fitting), e.g. sklearn's IsotonicRegression. Other uses include drug dose-response curves, psychometric functions, structural degradation vs life, and monotone hazard-rate estimation in survival analysis: anywhere the relationship is monotone but the shape is unknown.

Real-world applications

Probability calibration in machine learning: classifier scores produced by SVMs, random forests or boosted trees are not always faithful probabilities. Among cases scored 0.8, the observed fraction of positives may be 60% or 90%. Isotonic regression learns a monotone "score → probability" map from a held-out validation set. Scikit-learn's IsotonicRegression and CalibratedClassifierCV(method='isotonic') are widely used. The method is more flexible in shape than Platt scaling (logistic fitting), but it needs a larger calibration set to avoid overfitting.
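A minimal calibration sketch with scikit-learn's IsotonicRegression. The scores and labels are made-up toy values standing in for a held-out validation set. y_min/y_max keep the calibrated probabilities in [0, 1], and out_of_bounds="clip" clamps scores outside the validation range to the end values instead of returning NaN (the default).

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy held-out validation data: raw classifier scores and true 0/1 labels.
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# Fit a non-decreasing "score -> probability" map.
calibrator = IsotonicRegression(
    y_min=0.0, y_max=1.0, increasing=True, out_of_bounds="clip"
)
calibrator.fit(scores, labels)

# Calibrated probabilities on the validation scores themselves.
print(calibrator.predict(scores))

# New scores: values between fitted points are linearly interpolated,
# values outside [0.1, 0.8] are clipped to the end values.
new_scores = np.array([0.05, 0.35, 0.75, 0.95])
print(calibrator.predict(new_scores))
```

For a full classifier, CalibratedClassifierCV(method='isotonic') applies the same idea with cross-validation.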

Pharmacology dose-response curves: in pharmaceutical statistics, the response rate is expected to be monotone in dose. Sampling noise nevertheless produces small reversals. Isotonic regression injects the biologically meaningful "monotone increasing" prior directly into the model, yielding a monotone (step-shaped) response curve without committing to a Hill-type parametric form. FDA guidance on dose-response analysis mentions the method as an alternative.

Structural degradation and remaining life: bridges, pipelines and rotating machinery are inspected via strain gauges or vibration features whose cumulative metrics map to remaining service life. Degradation is fundamentally monotone, so isotonic regression is a natural way to extract the monotone "index → life" mapping from noisy time series, even when complex fatigue behaviour invalidates classical S-N curves.

Survival analysis and reliability engineering: Nelson-Aalen cumulative-hazard and Kaplan-Meier survival estimates are monotone by construction, but raw estimates of the hazard rate itself fluctuate strongly in small samples. When a monotone hazard is expected, as with wear-out (increasing failure rate) or burn-in (decreasing failure rate), isotonic regression imposes that shape while removing noise. It also complements parametric Weibull estimation in reliability work and survival-curve presentation in medical statistics.

Common misconceptions and pitfalls

The biggest pitfall is the assumption that "isotonic regression is always better than OLS." When the true relationship really is linear, OLS converges at N^(-1) whereas isotonic only achieves N^(-2/3). In this simulator, with large β and small σ, the MSE ratio climbs to 5–10. Use OLS when a line is appropriate; isotonic regression is for cases where monotonicity is trustworthy but the shape is unknown. The practical rule is: inspect the scatter plot for linearity first, and reach for isotonic regression only when the form is unclear.

Next, "using the step function as the final predictor" can be awkward in some applications. The isotonic output is piecewise-constant, so predictions for new x land on the staircase and jump at the block boundaries (scikit-learn's IsotonicRegression instead interpolates linearly between fitted points). If you need continuous predictions, post-process the isotonic fit with a spline or kernel smoother. The "smoothing bandwidth" slider in this tool emulates exactly that: at 0 the output is the pure staircase; above 0 it averages neighbouring blocks for smoothness.
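The simulator's smoother is not documented, so the following is only one plausible way to post-process the staircase: a Gaussian-kernel (Nadaraya-Watson) average of the fitted values, where the hypothetical `bandwidth` argument of `smooth_fit` plays the role of the slider and 0 returns the raw step function. A Gaussian kernel has the useful property that the smoothed curve stays monotone: moving the evaluation point x0 re-weights the observations by a factor proportional to exp(x0·x_i/h²), which can only shift weight towards larger x_i.

```python
import math


def smooth_fit(x, fit, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) average of fitted values at each x.

    bandwidth == 0 returns the raw step function. For a monotone `fit`,
    the result stays monotone because moving the evaluation point only
    tilts the normalised weights towards larger (or smaller) x_i.
    """
    if bandwidth == 0:
        return list(fit)
    smoothed = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x]
        total = sum(weights)  # >= 1, since the point x0 itself has weight 1
        smoothed.append(sum(w * f for w, f in zip(weights, fit)) / total)
    return smoothed


x = [i / 10 for i in range(10)]
staircase = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]  # toy isotonic output
print([round(v, 3) for v in smooth_fit(x, staircase, 0.15)])
```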

Finally, "never use isotonic regression for extrapolation." The fit is only defined on the range of observed x. Here, for values below the minimum or above the maximum x, the prediction is clamped to the end-block value; there is no linear extrapolation. (scikit-learn's IsotonicRegression returns NaN there by default and clamps only with out_of_bounds='clip'.) This is the price you pay for not assuming a shape: outside the data, there is no information to lean on. When extrapolation matters, combine isotonic regression with a parametric form (linear, exponential, Hill) or restrict the working range from the outset.
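A small sketch of a clamped step-function predictor. Its conventions are assumptions: between observations it returns the fitted value of the nearest observation on the left, and outside [min x, max x] it returns the first or last fitted value rather than extrapolating. The name `predict_step` is hypothetical.

```python
from bisect import bisect_right


def predict_step(x_train, fit, x_new):
    """Evaluate an isotonic step function at new points.

    x_train must be sorted and fit[i] is the fitted value at x_train[i].
    Outside the training range the prediction is clamped to the end values.
    """
    preds = []
    for x0 in x_new:
        if x0 <= x_train[0]:
            preds.append(fit[0])        # below the data: first block value
        elif x0 >= x_train[-1]:
            preds.append(fit[-1])       # above the data: last block value
        else:
            i = bisect_right(x_train, x0) - 1  # nearest observation on the left
            preds.append(fit[i])
    return preds


print(predict_step([1, 2, 3, 4, 5], [1, 2.5, 2.5, 4, 5], [-10, 1, 2.7, 3.5, 5, 99]))
# [1, 1, 2.5, 2.5, 5, 5]
```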

How to Use

  1. Set the sample size (numSamplesIR) between 20 and 500 observations to control how much data PAV has to pool.
  2. Adjust the true slope (trueSlopeIR) between -2 and +2 to define the monotone ground truth; the noise standard deviation (noiseStdIR) sets the spread of the additive, constant-variance (homoscedastic) Gaussian noise.
  3. Configure the smoothing bandwidth to control local averaging, and watch the pooled blocks, violation count and MSE ratio vs OLS as the isotonic fit reshapes the response curve.
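For experimenting outside the page, the data model behind these controls (y = β·x + ε with ε ~ N(0, σ²)) can be reproduced roughly as below. The x range is not stated anywhere, so x ~ Uniform(0, 1) is an assumption, and `simulate_data` with its snake_case parameters is a hypothetical stand-in for numSamplesIR, trueSlopeIR and noiseStdIR.

```python
import random


def simulate_data(num_samples=150, true_slope=0.8, noise_std=0.35, seed=None):
    """Draw (x, y) from y = true_slope * x + N(0, noise_std^2), sorted by x.

    Assumption: x ~ Uniform(0, 1). Ranges mirror the controls described above.
    """
    if not 20 <= num_samples <= 500:
        raise ValueError("num_samples must be between 20 and 500")
    if not -2.0 <= true_slope <= 2.0:
        raise ValueError("true_slope must be between -2 and +2")
    rng = random.Random(seed)
    x = sorted(rng.uniform(0.0, 1.0) for _ in range(num_samples))
    # Same noise variance everywhere, i.e. homoscedastic.
    y = [true_slope * xi + rng.gauss(0.0, noise_std) for xi in x]
    return x, y


x, y = simulate_data(seed=1)
print(len(x), round(min(x), 3), round(max(x), 3))
```

Sorting x up front means the output can be passed straight to a PAV routine, which expects data ordered by x.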

Worked Example

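With N = 150 samples, true slope β = 0.8, noise σ = 0.35 and an increasing constraint, the PAV algorithm produces about 12 pooled blocks, i.e. an average block size of 150 / 12 = 12.5 observations. The MSE ratio (isotonic / OLS) comes out at 0.92, meaning isotonic regression's MSE is 8% lower than OLS's in this run. Because the true relation here is linear, theory predicts the ratio to exceed 1 on average as N grows, so a value this close to 1 is best read as run-to-run variation rather than a genuine efficiency gain. The violation count of 0 for the fitted output confirms that monotonicity is preserved, as PAV guarantees by construction. A bandwidth of 0.15 smooths localized fluctuations without destroying the monotone structure.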

Practical Notes

  1. Higher noise (σ > 0.5) raises the violation count and forces PAV to merge more observations, so the number of pooled blocks falls and the average block size grows; increasing the sample size is the most direct way to stabilise the monotone fit in dose-response or quality-control applications.
  2. A true slope near zero (-0.1 to +0.1) leaves little signal for the monotonicity constraint to exploit; when noise dominates, the MSE ratio may exceed 1.0, favouring plain OLS.
  3. For a smooth, strictly monotone true curve, the number of blocks typically grows like N^(1/3), with each block spanning roughly N^(2/3) observations; inspect the average block size to diagnose overfitting (many tiny blocks) or underfitting (a few huge blocks).