Data is generated by a fixed-seed linear congruential generator (LCG) followed by a Box-Muller transform, so the same parameters always produce the same point cloud.
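A minimal sketch of how such a deterministic generator could look. The LCG constants, the default seed, and the parameter names (`sx`, `sy`, `angle_deg`) are assumptions chosen for illustration; the source does not specify them.

```python
import math

class LCG:
    """Linear congruential generator: state = (a * state + c) mod m."""

    def __init__(self, seed=42):
        # Constants are the widely used "Numerical Recipes" values.
        # They are an assumption; the source does not name its constants.
        self.m = 2 ** 32
        self.a = 1664525
        self.c = 1013904223
        self.state = seed % self.m

    def uniform(self):
        """Return a float strictly inside (0, 1), so log() below is safe."""
        self.state = (self.a * self.state + self.c) % self.m
        return (self.state + 1) / (self.m + 1)


def box_muller(rng):
    """Turn two uniform samples into two independent standard normal samples."""
    u1, u2 = rng.uniform(), rng.uniform()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def make_cloud(n=200, sx=2.0, sy=0.5, angle_deg=30.0, seed=42):
    """Hypothetical helper: scale a Gaussian cloud by (sx, sy), then rotate it."""
    rng = LCG(seed)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    points = []
    for _ in range(n):
        z1, z2 = box_muller(rng)
        x, y = sx * z1, sy * z2
        points.append((c * x - s * y, s * x + c * y))
    return points
```

Because the seed is fixed, two calls with identical arguments return identical lists, which is what makes the plot reproducible.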
Grey dots = data points / White × = mean / Red arrow = PC1 axis (length √λ₁) / Blue arrow = PC2 axis / Green ellipse = 1σ covariance ellipse
Given 2D data X = {(x_i, y_i)}, first center it by subtracting the mean and form the sample covariance matrix.
2×2 covariance matrix; σ_xx, σ_yy are variances, σ_xy is the covariance:
$$C = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{bmatrix}, \quad \sigma_{xy} = \frac{1}{n-1}\sum_i (x_i-\bar{x})(y_i-\bar{y})$$
Closed-form 2×2 eigenvalues; T = trace, D = determinant:
$$\lambda_{1,2} = \frac{T \pm \sqrt{T^2 - 4D}}{2}, \quad T = \sigma_{xx}+\sigma_{yy}, \quad D = \sigma_{xx}\sigma_{yy} - \sigma_{xy}^2$$
Explained variance ratio of the i-th principal component:
$$r_i = \frac{\lambda_i}{\lambda_1 + \lambda_2}$$
Each eigenvector $v_i$ satisfies $(C - \lambda_i I) v_i = 0$. With $\lambda_1 \ge \lambda_2$, $v_1$ (PC1) points along the direction of maximum variance, and $v_2$ (PC2) is orthogonal to it.
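The formulas above can be sketched in plain Python as follows. The function name `pca_2d` and the returned dictionary keys are hypothetical, and the sketch assumes at least two points with nonzero total variance.

```python
import math

def pca_2d(points):
    """Closed-form PCA for 2D points given as a list of (x, y) tuples.

    Assumes len(points) >= 2 and that the points are not all identical.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Sample covariance entries with the 1/(n-1) normalisation.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    # Eigenvalues from trace T and determinant D.
    T = sxx + syy
    D = sxx * syy - sxy ** 2
    disc = math.sqrt(max(T * T - 4.0 * D, 0.0))  # clamp tiny negative round-off
    lam1 = (T + disc) / 2.0
    lam2 = (T - disc) / 2.0

    # Solve (C - lam1 I) v = 0: the first row gives v proportional to (sxy, lam1 - sxx).
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    else:
        # C is already diagonal, so PC1 is the axis with the larger variance.
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    v1 = (vx / norm, vy / norm)
    v2 = (-v1[1], v1[0])  # PC2 is PC1 rotated by 90 degrees

    total = lam1 + lam2
    return {
        "mean": (mx, my),
        "eigenvalues": (lam1, lam2),
        "pc1": v1,
        "pc2": v2,
        "ratios": (lam1 / total, lam2 / total),
    }
```

For example, the collinear points `[(0, 0), (1, 1), (2, 2)]` give eigenvalues 2 and 0, a PC1 direction along the diagonal, and an explained variance ratio of 1 for PC1. Under the plot legend above, the PC1 arrow would be `pc1` scaled by `math.sqrt(lam1)`.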