Conjugate Gradient Method Simulator — CG vs Steepest Descent
Solve Ax=b for a symmetric positive definite matrix with the conjugate gradient method. Compare the log residual history with steepest descent and see why CG converges in n iterations.
Parameters
Max iterations k_max
iter
A[0][0] (diagonal)
—
A[1][1] (diagonal)
—
b[0] (right-hand side)
—
A is a 4x4 symmetric tridiagonal matrix. Fixed entries: A[1][2]=A[2][1]=1, A[2][3]=A[3][2]=1, A[0][1]=A[1][0]=1, A[2][2]=2, A[3][3]=1. Remaining b entries: b[1]=2, b[2]=3, b[3]=4. Initial guess x_0=0.
A =
b =
Results
—
Solution x_1
—
Residual ‖r‖_2
—
Iterations to converge k*
—
Condition number cond(A)
Residual norm history log10(‖r_k‖_2)
Blue solid line = CG / Red solid line = steepest descent / x-axis = iteration k / y-axis = log10(‖r_k‖_2)
Theory & Key Formulas
CG keeps the residual $r_k = b - A x_k$ and the search direction $p_k$ mutually A-orthogonal (conjugate) while updating.
Initialization: $r_0 = b - A x_0$, $p_0 = r_0$. Iteration step:
In exact arithmetic, an n-dimensional symmetric positive definite system is solved exactly in at most n iterations. The error decays at rate $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k$ where $\kappa = \lambda_\max/\lambda_\min$ is the condition number.
What is the Conjugate Gradient Method Simulator
🙋
I keep seeing "solve Kd=f with CG" in my FEM textbook. What exactly is the CG method?
🎓
Roughly speaking, it is an iterative way to solve Ax=b when A is symmetric and positive definite. At each step it builds a residual $r_k = b - A x_k$ and a search direction $p_k$ that is A-orthogonal (conjugate) to all previous directions, then moves the optimal distance $\alpha_k$ along it. Slide the iteration count from 1 to 10 in the simulator and watch the blue curve drop sharply on the log scale.
🙋
There is a red curve next to it. What is that?
🎓
That is steepest descent — the naive method that always moves along the current residual. Push A[0][0] up to 10 and A[1][1] down to 1 to widen the condition number, and you will see steepest descent zig-zag along the valley and barely improve. CG never wastes effort on a direction it has already explored, so for this 4D problem it lands exactly on the true solution in just 4 iterations.
🙋
You're right! At iteration 4 the blue marker drops down to residual 1e-15 or so. Is that a coincidence?
🎓
Not at all — it is a theorem. For an $n$-dimensional symmetric positive definite system, CG reaches the exact solution in at most $n$ iterations in exact arithmetic, because the Krylov subspace $\mathrm{span}\{r_0, A r_0, A^2 r_0, \ldots\}$ saturates at dimension n and the true solution lives inside. In floating point you lose a little, but CG is still in another league compared to steepest descent.
🙋
The condition number card shows something around 11. What does that number mean?
🎓
The condition number is $\kappa = \lambda_\max / \lambda_\min$, the ratio of largest to smallest eigenvalue. A large $\kappa$ means the level sets form a long narrow valley that traps steepest descent. CG's error decays as $\left(\frac{\sqrt\kappa - 1}{\sqrt\kappa + 1}\right)^k$ — the square root matters: even for $\kappa$ of 100, CG drops about ten times faster than steepest descent. In practice we use "preconditioned CG (PCG)" to reduce the effective condition number even further.
Frequently Asked Questions
A must be symmetric (A^T = A) and positive definite (x^T A x > 0 for every nonzero x). In practice FEM stiffness matrices, graph Laplacians and covariance matrices satisfy this naturally. For nonsymmetric systems use Bi-CGSTAB or GMRES; for symmetric indefinite systems use MINRES — all relatives of CG built on the same Krylov ideas.
CG only needs the matrix-vector product matVec(A, v), so it preserves sparsity and never explicitly forms factors. Direct solvers suffer from fill-in and can exhaust memory for FEM problems with millions of unknowns, where PCG (preconditioned CG) is the de facto standard. On the other hand, when many right-hand sides share one A, reusing an LU factorization can outperform iterative methods.
M should approximate A while keeping M^{-1}v cheap. Jacobi (diagonal scaling) is the cheapest but limited; incomplete Cholesky (IC(0), ICT) is the workhorse for structural analysis. For fluids and complex physics, algebraic multigrid (AMG) is a powerful choice that achieves O(n) convergence independent of the condition number. Libraries such as Trilinos and PETSc ship many preconditioners ready to use.
The most common test is the relative residual ‖r_k‖_2 / ‖b‖_2 < tol with tol between 1e-6 and 1e-10. For ill-conditioned systems the true error ‖x_k - x_*‖ can be much larger than the residual, so estimates of the A-norm error or the number of significant digits are often used in addition. Always pair this with a maximum iteration cap (for example 2n or 500-2000) and revisit the preconditioner if convergence stalls.
Real-World Applications
Finite Element Analysis (FEA) for structures: Stress analysis in automotive, aerospace and civil engineering produces enormous stiffness equations Kd=f with K symmetric positive definite and very sparse. Problems with millions of degrees of freedom are routine. Preconditioned CG (IC-PCG, AMG-PCG) is the standard solver in commercial codes (Abaqus, Ansys, Nastran) and open source frameworks (FreeFEM, FEniCS) alike.
Discretization of partial differential equations: Discretizing heat conduction, Poisson and other elliptic PDEs by finite differences or finite volumes yields symmetric positive definite systems. Combining multigrid with CG delivers condition-number-independent O(n) scalability in geophysical and atmospheric simulation.
Machine learning and optimization: The quadratic minimization min (1/2)x^T A x - b^T x is equivalent to Ax=b, so CG is naturally an optimization algorithm. In "Hessian-free optimization" used in deep learning and reinforcement learning, CG solves the Newton step without materializing the Hessian.
Imaging and tomography: CT reconstruction and image restoration boil down to normal equations (A^T A) x = A^T b. Because A^T A is symmetric positive (semi-)definite, CG variants (CGLS, LSQR) are the standard fast, memory-light solvers in this domain.
Common Misconceptions and Caveats
The most common misconception is believing without reservation that CG always reaches the exact solution in n iterations. That theorem assumes exact arithmetic; in floating point, rounding errors gradually break conjugacy $p_i^\top A p_j = 0$ and the residual still has a tiny nonzero component after n steps. In practice we use a relative residual tolerance for stopping and apply restart or re-orthogonalization when needed. The simulator runs in 4D so the rounding noise is invisible, but for problems with millions of unknowns the behavior depends heavily on the condition number.
The next pitfall is giving up on CG when the condition number is large. Convergence does slow down for large $\kappa$, but the cure is to choose a good preconditioner M. In production almost nobody runs unpreconditioned CG — incomplete Cholesky (IC), symmetric Gauss-Seidel (SSOR) and algebraic multigrid (AMG) cut the effective condition number to $\sqrt{\kappa}$ or even $O(1)$. The problem is rarely "CG is slow", it is "the preconditioner is wrong". In the simulator, widening A[0][0]=10 and A[1][1]=1 already separates CG and steepest descent; a good preconditioner would close the gap in 1-2 iterations.
Finally, do not apply CG to nonsymmetric or indefinite systems. CG assumes A is symmetric positive definite. Apply it naively to a nonsymmetric matrix (for example a system with a convection term) and $p_k^\top A p_k$ can be negative or zero, causing divergence. For nonsymmetric systems use Bi-CGSTAB, GMRES or IDR(s); for symmetric indefinite ones use MINRES or SYMMLQ. Treating CG strictly as the SPD-optimal solver is the first step toward a safe implementation.