Visualize the secant iteration x_{n+1}=x_n-f(x_n)*(x_n-x_{n-1})/(f(x_n)-f(x_{n-1})) on f(x)=x^3-2x-5. Watch secant lines, the log10 error curve, and tune the two initial points without ever needing a derivative.
Convergence criterion. epsilon (tolerance) = 1e^-n where n is the tolerance slider.
What is the Secant Method
It solves f(x)=0 by drawing the line (the "secant") through the previous two iterates and using its x-intercept as the next estimate. It is Newton-Raphson with the analytic derivative replaced by a finite difference of past evaluations — the simplest derivative-free root finder.
🙋
Newton-Raphson and the secant method look almost identical in formula. When should I use which?
🎓
They share the same DNA — the secant method just swaps Newton's f'(x_n) for the slope (f(x_n)-f(x_{n-1}))/(x_n-x_{n-1}) built from the last two evaluations. So you can use it when f'(x) is unavailable (interpolated lab data, complicated numerical integrals) or just expensive. Convergence is superlinear with order phi approximately 1.618 (the golden ratio) — slower than Newton's quadratic 2, but the secant method costs only one f-evaluation per iteration (Newton needs both f and f'), so per function call the two are nearly tied.
🙋
When I set x_0=1, x_1=3 it converges in seven iterations. But with x_0=-2, x_1=-1 it shoots off in a weird direction. Why?
🎓
Good catch. The polynomial f(x)=x^3-2x-5 has a single real root near x=2.0946. Starting from two negative points means f(x_0) and f(x_1) are both negative, so the secant slants steeply upward and intersects the x-axis far to the right — overshooting wildly. That is the price of locality. A safe rule of thumb is to bracket the root with f(x_0)*f(x_1)<0; even better, use Brent's method (scipy.optimize.brentq), which mixes bisection's safety with the secant method's speed.
🙋
Will tightening the tolerance to 1e-12 explode the iteration count? You said the secant method is slower than Newton.
🎓
Surprisingly cheap. Superlinear order phi approximately 1.618 is still very strong. Going from 1e-6 to 1e-12 typically adds only two or three more iterations (Newton would add one or two). In CAE, the multidimensional cousin of the secant method — quasi-Newton methods like Broyden, BFGS, L-BFGS — is exactly how Abaqus and ANSYS avoid reassembling the tangent stiffness matrix every step in nonlinear FEA.
Physical Model and Key Equations
The secant method takes Newton-Raphson, x_{n+1}=x_n-f(x_n)/f'(x_n), and approximates the derivative by a finite-difference slope from the last two iterates.
The golden ratio shows up naturally — that is the signature of superlinear convergence (faster than linear but slower than quadratic).
Real-World Applications
Quasi-Newton optimization (CAE): Multidimensional F(x)=0 or grad f(x)=0 problems make assembling the Jacobian or Hessian expensive. Broyden's method (root finding) and BFGS / L-BFGS (optimization) approximate J^-1 or H^-1 from past updates — the natural multidimensional generalization of the 1D secant method.
Brent's method (general-purpose root finder): scipy.optimize.brentq, MATLAB's fzero, and Boost's brent_find_minima all default to Brent's method, which combines bisection's guaranteed convergence with the speed of the secant method and inverse quadratic interpolation.
Modified Newton in nonlinear FEA: Abaqus and ANSYS often reuse the tangent stiffness K=dR/du across several load steps to amortize the cost of LU factorization. Internally, the equilibrium iteration then behaves like a secant-style update; BFGS-style corrections are also common in implicit nonlinear solvers.
Implied volatility in finance: Inverting the Black-Scholes formula for the volatility sigma that matches the market price avoids computing dC/dsigma (Vega) analytically by using the secant method. Trading systems run this loop thousands of times per second.
Common Misconceptions and Pitfalls
The most frequent myth is "the secant method is just a worse Newton-Raphson". Although its order is 1.618 versus Newton's 2, the effective rate per function evaluation is phi approximately 1.618 for the secant versus sqrt(2) approximately 1.414 for Newton (when f and f' are evaluated separately). If f' costs as much as f, the secant method is actually faster in wall-clock terms.
Second, "the two initial points must bracket the root" is wrong. The secant method does not require f(x_0)*f(x_1)<0. But without that bracket, it can easily overshoot and diverge, as you can see by setting x_0=-2, x_1=-1 in this simulator: both f-values are negative, the secant slants up, and the next iterate leaps far to the right. For safety, bracket the root with bisection first, then switch to the secant — the philosophy behind Brent's method.
Third, watch out for division by f(x_n)-f(x_{n-1}). When two consecutive function values nearly coincide, the denominator approaches zero and the update explodes. A robust implementation aborts the iteration (or restarts from fresh points) once |f(x_n)-f(x_{n-1})| falls below a small threshold, e.g. 1e-14. Quasi-Newton solvers in CAE include similar restart logic when their Jacobian approximations stale out.
Frequently Asked Questions
The secant method replaces Newton's analytic derivative f'(x_n) with a finite-difference quotient (f(x_n)-f(x_{n-1}))/(x_n-x_{n-1}) built from the last two iterates. No symbolic or numerical differentiation is required — only function evaluations. Convergence is superlinear with order phi approximately 1.618, slightly slower than Newton's quadratic 2, but per function call the methods are competitive because the secant method needs only one f evaluation per step.
Bisection requires a bracket f(a)*f(b)<0 and converges linearly (the error halves every step) — slow but unconditionally safe. The secant method needs two starting points but no bracket, and converges superlinearly (order phi approximately 1.618) — fast but without convergence guarantees. Brent's method combines both to get safety and speed; that is what scipy.optimize.brentq uses internally.
Yes. The multidimensional generalization is Broyden's method, which approximates the Jacobian J of F(x)=0 using rank-one updates from past iterates. For optimization (where grad f(x)=0), the analogous BFGS and L-BFGS schemes approximate the Hessian H. Both are workhorses in scipy.optimize.minimize and in deep-learning second-order optimizers.
When the absolute value of the denominator drops below a small threshold close to machine epsilon (about 2.2e-16), the standard practice is to halt the iteration and report failure. Alternatives include restarting from two fresh points or falling back to bisection. This simulator stops the iteration once |f(x_n)-f(x_{n-1})| is below 1e-14.