What does R-squared (coefficient of determination) mean?

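R-squared measures what fraction of the variance in y is explained by the regression model. For a least-squares fit that includes an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1. R-squared = 1 means a perfect fit. R-squared = 0 means the model explains no more than simply predicting the mean of y. On new data, or for models without an intercept, it can even be negative. In engineering and CAE validation, a threshold such as R-squared >= 0.99 is often used as an acceptance criterion, though the required level depends on the application.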

How do I choose the polynomial degree?

Raising the degree never decreases R-squared on the training data, but it risks overfitting: the curve memorizes noise and predicts poorly on new data. Use information criteria (AIC/BIC) or cross-validation to pick the simplest degree that adequately captures the trend. For many physical datasets, a degree of 2 or 3 is sufficient.
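To make the cross-validation advice concrete, here is a minimal sketch of leave-one-out cross-validation (LOOCV) for choosing a polynomial degree. It assumes NumPy is available and uses synthetic quadratic data. The helper names `loocv_mse` and `pick_degree` and the cap `max_degree=6` are illustrative choices, not part of the simulator.

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Mean squared leave-one-out prediction error for a polynomial fit of `degree`."""
    n = len(x)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i                  # hold out point i
        coeffs = np.polyfit(x[keep], y[keep], degree)
        prediction = np.polyval(coeffs, x[i])     # predict the held-out point
        errors.append((y[i] - prediction) ** 2)
    return float(np.mean(errors))

def pick_degree(x, y, max_degree=6):
    """Return the degree with the lowest LOOCV error, plus all scores."""
    scores = {d: loocv_mse(x, y, d) for d in range(1, max_degree + 1)}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic quadratic data with Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0.0, 0.5, x.size)

best, scores = pick_degree(x, y)
for d, s in scores.items():
    print(f"degree {d}: LOOCV MSE = {s:.3f}")
print("selected degree:", best)
```

Unlike training R-squared, the held-out error rises again once extra terms start fitting noise. To favor the simplest adequate model, you could pick the lowest degree whose score is within a small tolerance of the minimum, rather than the strict minimum.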

What do the red residual lines show?

Each red vertical line connects a data point to the fitted curve. Its length is the residual: the prediction error at that x value. Small residuals scattered randomly around zero indicate a good fit. Systematic patterns point to a problem. A U-shape or other curved trend means the model form is misspecified, while a funnel shape (spread growing with x) indicates heteroscedasticity, i.e., non-constant error variance.
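A U-shaped residual pattern is easy to reproduce numerically. The sketch below assumes NumPy and uses synthetic, noise-free data y = x². It fits a straight line and counts how often the residuals change sign. Genuinely random residuals tend to flip sign often, but a straight line through curved data leaves long runs of one sign.

```python
import numpy as np

# Synthetic, noise-free data with upward curvature (y = x^2).
x = np.linspace(-3.0, 3.0, 13)
y = x**2

# Fit a straight line (a misspecified model) and compute residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Count sign changes along x: a U-shape gives long runs of the same sign.
signs = np.sign(residuals)
sign_changes = int(np.sum(signs[1:] != signs[:-1]))
print(np.round(residuals, 2))
print("sign changes:", sign_changes)
```

Here the residuals are positive at both ends and negative in the middle, which is the U-shape described above.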

How do I add and remove data points?

Click anywhere on the chart area to add a point at that location. Click close to an existing point to remove it. Use the preset buttons to load representative datasets: perfect linear, noisy linear, quadratic, exponential growth, or S-curve. Click Clear All to start fresh.
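The add/remove behavior amounts to a nearest-point hit test. The sketch below is a hypothetical illustration, not the simulator's actual code. The function `toggle_point` and the `radius=0.2` tolerance are assumptions, and distances are measured in data coordinates, whereas a real chart would more likely use screen pixels.

```python
import math

def toggle_point(points, click, radius=0.2):
    """Remove the point nearest to `click` if it lies within `radius`; otherwise add `click`.

    `points` is a list of (x, y) tuples in data coordinates; a new list is returned.
    """
    if points:
        # Index of the existing point closest to the click.
        i = min(range(len(points)), key=lambda k: math.dist(points[k], click))
        if math.dist(points[i], click) <= radius:
            return points[:i] + points[i + 1:]
    return points + [click]

pts = []
pts = toggle_point(pts, (1.0, 2.0))    # empty chart: point added
pts = toggle_point(pts, (3.0, 1.0))    # far from (1, 2): point added
pts = toggle_point(pts, (1.05, 2.1))   # within 0.2 of (1, 2): that point is removed
print(pts)                             # [(3.0, 1.0)]
```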

Regression Analysis & Curve Fitting


Theory & Key Formulas

Minimize the sum of squared residuals:
$$S = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

What is Regression Analysis?

🙋

What exactly is regression analysis? I see the term "curve fitting" a lot.

🎓

Basically, it's a way to find the best mathematical relationship between variables. You have scattered data points, and you want to draw a line or curve that best summarizes their trend. In this simulator, you can click to add your own data points and instantly see the fitted curve.

🙋

Wait, really? How do we decide what "best" means? There are so many possible lines.

🎓

Great question! The most common method is "Least Squares." We define "best" as the line that minimizes the sum of the squared vertical distances from each point to the line. Those distances are called residuals. Try adding a few points and watch the red residual lines change as the model updates.

🙋

Okay, I see the red lines. But the simulator lets me choose different models like "Quadratic" or "Exponential." How do I know which one to use?

🎓

In practice, you use the data's shape and metrics like R². For instance, if your data points curve upwards, a straight line (Linear) will have large, patterned residuals. Switch to "Quadratic" and watch the R² value increase and the residuals become smaller and more random. That's a sign of a better fit.
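The Linear-versus-Quadratic comparison in the dialogue can be reproduced outside the simulator. This is a minimal sketch assuming NumPy and synthetic upward-curving data. The helper `r2` is an illustrative name that implements the R² formula given in the equations section.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))

# Synthetic data that curves upward (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 0.3 * x**2 + rng.normal(0.0, 1.0, x.size)

results = {}
for name, degree in [("Linear", 1), ("Quadratic", 2)]:
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    residuals = y - y_hat
    results[name] = r2(y, y_hat)
    print(f"{name:<9} R^2 = {results[name]:.4f}  max |residual| = {np.max(np.abs(residuals)):.2f}")
```

Because a straight line is a special case of a quadratic, the quadratic's training R² can never be lower. The useful signals are how much R² rises and whether the residuals lose their systematic pattern.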

Physical Model & Key Equations

The core of regression is minimizing the sum of squared residuals. The residual for a data point $(x_i, y_i)$ is the difference between the observed value $y_i$ and the value predicted by the model $\hat{y}_i$.

$$S = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Here, $S$ is the sum of squared residuals we aim to minimize. $y_i$ is the actual data point value, and $\hat{y}_i$ is the value predicted by our fitted equation (e.g., $\hat{y}= mx + b$ for a linear model).
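For the linear model, setting the partial derivatives of $S$ with respect to $m$ and $b$ to zero gives a closed-form solution: $m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b = \bar{y} - m\bar{x}$. The sketch below assumes NumPy and uses a small made-up dataset. It computes that solution and checks that nudging either parameter increases $S$. The function names are illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form ordinary least squares for y_hat = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - m * x_bar
    return float(m), float(b)

def sum_sq_residuals(x, y, m, b):
    """S = sum of (y_i - y_hat_i)^2 for the line y_hat = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (m * x + b)) ** 2))

x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
m, b = fit_line(x, y)
S_min = sum_sq_residuals(x, y, m, b)
print(f"m = {m:.3f}, b = {b:.3f}, S = {S_min:.4f}")

# Moving either parameter away from the least-squares solution increases S.
print(sum_sq_residuals(x, y, m + 0.1, b) > S_min, sum_sq_residuals(x, y, m, b - 0.1) > S_min)
```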

To quantify how well the model explains the variation in the data, we use the coefficient of determination, R-squared ($R^2$).

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}= 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

$SS_{res}$ is the sum of squared residuals (the same as $S$ above). $SS_{tot}$ is the total sum of squares, which measures the total variation of the $y$ data around its mean $\bar{y}$. An $R^2$ close to 1 indicates the model explains most of that variation.
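The $R^2$ formula translates directly into a few lines of code. This is a minimal sketch assuming NumPy, with `r_squared` as an illustrative name. The three calls show a perfect prediction, a prediction equal to the mean, and a prediction worse than the mean. The last case yields a negative value, which cannot occur for a least-squares fit with an intercept evaluated on its own training data.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation about the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are equal")
    return float(1.0 - ss_res / ss_tot)

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.0, 2.0, 3.0, 4.0]))   # 1.0  (perfect predictions)
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))   # 0.0  (always predicting the mean)
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))   # -3.0 (worse than the mean)
```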

Real-World Applications

Predictive Maintenance in Engineering: Sensor data (like vibration, temperature) is collected from machinery over time. Regression models fit trends to this data, predicting when a measurement will cross a failure threshold, allowing maintenance before a breakdown occurs.
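A basic version of the threshold prediction fits a linear trend and solves for the time at which it reaches the limit. The sketch below assumes NumPy. The readings, the 4.5 alarm level, and the function name `predict_crossing` are all hypothetical, and real condition-monitoring models are usually more involved.

```python
import numpy as np

def predict_crossing(t, y, threshold):
    """Fit y = slope*t + intercept and return the time at which the line reaches `threshold`.

    Returns None when the fitted trend is not increasing.
    """
    slope, intercept = np.polyfit(np.asarray(t, float), np.asarray(y, float), 1)
    if slope <= 0:
        return None
    return float((threshold - intercept) / slope)

# Hypothetical daily vibration readings (synthetic values) and a hypothetical alarm level.
days = [0, 1, 2, 3, 4, 5, 6, 7]
vibration = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4]
day_cross = predict_crossing(days, vibration, threshold=4.5)
print(f"Threshold predicted to be reached around day {day_cross:.1f}")
```

The predicted crossing lies beyond the last measurement, so it is an extrapolation, and the cautions about extrapolation discussed under Common Misconceptions apply.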

Financial Forecasting: Analysts use curve fitting on historical stock prices or economic indicators. While not perfectly predictive, identifying trends (linear growth, exponential decay) helps inform investment strategies and risk assessments.

Drug Dosage Response: In pharmacology, researchers test a drug at different doses and measure a biological response. A sigmoidal dose-response model, commonly a logistic function of log-dose, is often fitted to this data to estimate the median effective dose (ED50): the dose at which 50% of the population responds.
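One quick way to estimate ED50 is to linearize a log-logistic model, $p = 1 / (1 + (\mathrm{ED50}/d)^h)$. Taking the logit gives $\ln\frac{p}{1-p} = h \ln d - h \ln \mathrm{ED50}$, a straight line in $\ln d$. The sketch below assumes NumPy and uses noise-free synthetic data generated with ED50 = 10 and slope h = 1.5. This linearized shortcut is a swapped-in technique. With real, noisy data a direct nonlinear least-squares fit (for example with SciPy's `scipy.optimize.curve_fit`) is the more common choice, because the logit transform distorts the error weighting.

```python
import numpy as np

def ed50_logit(doses, fraction_responding):
    """Estimate ED50 and Hill slope h from a straight-line fit of logit(p) against ln(dose)."""
    d = np.asarray(doses, dtype=float)
    p = np.asarray(fraction_responding, dtype=float)
    if np.any((p <= 0) | (p >= 1)):
        raise ValueError("fractions must lie strictly between 0 and 1 for the logit transform")
    logit = np.log(p / (1.0 - p))
    slope, intercept = np.polyfit(np.log(d), logit, 1)   # slope = h, intercept = -h*ln(ED50)
    return float(np.exp(-intercept / slope)), float(slope)

# Synthetic, noise-free responses generated with ED50 = 10 and h = 1.5.
doses = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
fraction = 1.0 / (1.0 + (10.0 / doses) ** 1.5)

ed50, h = ed50_logit(doses, fraction)
print(f"ED50 = {ed50:.3f}, h = {h:.3f}")
```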

CAE & Material Science: When simulating material behavior, stress-strain data from physical tests is fitted with a constitutive model (like a polynomial or power-law). This fitted equation is then programmed into the simulation software to predict how a new part will deform under load.
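A power-law constitutive fit can be sketched as a straight-line fit in log-log space. If $\sigma = K \varepsilon^n$ (the Hollomon form, chosen here as an example), then $\ln \sigma = \ln K + n \ln \varepsilon$. The sketch below assumes NumPy and noise-free synthetic data with K = 500 MPa and n = 0.2. These values are illustrative, not measured material properties. Fitting in log space minimizes squared errors in $\ln \sigma$ rather than in $\sigma$, which can shift the parameters slightly when the data are noisy.

```python
import numpy as np

def fit_power_law(strain, stress):
    """Fit stress = K * strain**n via least squares on ln(stress) vs ln(strain)."""
    e = np.asarray(strain, dtype=float)
    s = np.asarray(stress, dtype=float)
    n, ln_K = np.polyfit(np.log(e), np.log(s), 1)   # slope = n, intercept = ln(K)
    return float(np.exp(ln_K)), float(n)

# Synthetic true-strain / true-stress pairs (MPa) for K = 500 MPa, n = 0.2.
strain = np.array([0.01, 0.02, 0.05, 0.1, 0.2])
stress = 500.0 * strain**0.2

K, n = fit_power_law(strain, stress)
print(f"K = {K:.1f} MPa, n = {n:.3f}")
```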

Common Misconceptions and Points of Caution

First, the assumption that a high R² always means a good model is dangerous. For example, fitting a 5th-order polynomial to material creep data might yield an R² above 0.99, but that curve may have no physical meaning and fail entirely to predict future behavior. In practice, the balance between predictive performance and interpretability is crucial.

Next, the influence of outliers is often overlooked. If your experimental data has just one clearly distant point, the least squares method will be strongly pulled toward it, producing a regression formula that distorts the overall trend. Try it in NovaSolver: add a single point far away from a cluster of points lying in a straight line. You'll see the line shift significantly.

Finally, try to avoid making predictions outside the data range (extrapolation). Using a formula derived from experimental data between 20°C and 80°C to predict behavior at 150°C is very risky, even with a high R². Unforeseen phenomena, like material phase transitions, could occur.
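The outlier experiment suggested above can also be run in a few lines. This is a minimal sketch assuming NumPy. Ten synthetic points lie exactly on y = 2x + 1, and one extra point is placed far above the line at x = 9. Because residuals are squared, that single point pulls the fitted slope noticeably upward.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                          # ten points exactly on y = 2x + 1
slope_clean, _ = np.polyfit(x, y, 1)

x_out = np.append(x, 9.0)
y_out = np.append(y, 40.0)                 # one distant point; the line predicts 19 here
slope_out, _ = np.polyfit(x_out, y_out, 1)

print(f"slope without outlier: {slope_clean:.3f}, with outlier: {slope_out:.3f}")
```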
