How is the stationary distribution π computed?

For the transition matrix P = [[1-p, p], [q, 1-q]], the stationary distribution π = (π_1, π_2) satisfies π P = π and π_1 + π_2 = 1. Solving these gives the closed form π_1 = q/(p+q) and π_2 = p/(p+q). Intuitively, this is the balance point where outflow from state 1 (p·π_1) equals inflow into state 1 from state 2 (q·π_2).

What does the mixing time t_mix mean?

The mixing time is the number of steps required for the influence of the initial distribution to decay to a small fraction of its starting value. This tool reports t_mix = ln(0.01)/ln|1−p−q|, i.e. the step count at which |P_1(t)−π_1|/|P_1(0)−π_1| reaches 0.01. The factor |1−p−q| is the second eigenvalue λ_2 of P, so as p+q approaches 0 or 2 the chain mixes very slowly (λ_2 → ±1).

How is the exponential convergence of P_1(t) derived?

The transition matrix P has eigenvalues λ_1 = 1 (the stationary mode) and λ_2 = 1 − p − q. Any initial distribution decomposes onto the corresponding eigenvectors, so after t steps P_1(t) = π_1 + (P_1(0) − π_1)·(1−p−q)^t. The second term decays as λ_2^t, giving exponential convergence to π_1 — a textbook example of the Perron–Frobenius theorem in action.

What are some real applications?

Reliability engineering uses 'up ⇄ down' Markov models to compute availability, queueing theory uses 'idle ⇄ busy' chains, NLP uses part-of-speech transitions, and reinforcement learning is built on Markov Decision Processes. Google PageRank is itself a giant Markov-chain stationary-distribution problem, and this two-state tool is the smallest non-trivial instance of that whole framework.

Markov Chain Simulator — Two-State Stationary Distribution

Parameters

P 1 to 2

P 2 to 1

Initial P state 1

Observation step t

p = P(1→2), q = P(2→1). Hit Play to sweep t from 0 to 50 and watch the chain converge from any initial distribution to π.

Results

—

Stationary π_1

—

Stationary π_2

—

P_1(t)

—

t_mix (99%)

State transition diagram

Two circles are states 1 and 2. The radius of each circle is proportional to its current probability at step t. Arrows are transitions p (1→2) and q (2→1); self-loops are stay-probabilities (1−p) and (1−q).

Exponential convergence of P_1(t)

x-axis is the step t (0–50), y-axis is the probability P_1(t). The dashed horizontal line is the stationary value π_1 and the yellow marker is the current t. The decay rate is the absolute eigenvalue |1−p−q|.

Theory & Key Formulas

For a two-state Markov chain with transition matrix $P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$, the eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 1-p-q$, and the stationary distribution is

$$\pi_1 = \frac{q}{p+q}, \qquad \pi_2 = \frac{p}{p+q}$$

The probability of being in state 1 after $t$ steps follows from the eigendecomposition:

$$P_1(t) = \pi_1 + (P_1(0) - \pi_1)\,(1-p-q)^t$$

The 99% mixing time comes from $|P_1(t)-\pi_1| / |P_1(0)-\pi_1| = 0.01$, giving $t_{\text{mix}} = \ln(0.01) / \ln|1-p-q|$. As $p+q \to 0$ or $\to 2$, $|\lambda_2| \to 1$ and the mixing time diverges — the chain becomes nearly periodic or nearly absorbing.

What is the Markov Chain Simulator

🙋

A Markov chain is just 'the future depends only on the present', right? What is interesting about doing it with only two states?

🎓

Good question. Two states is the smallest non-trivial chain, and crucially it is the only size where every important quantity — stationary distribution, eigenvalues, mixing time — has a clean closed form. With p=0.30, q=0.40 you can verify by hand that $\pi_1 = q/(p+q) = 0.5714$. The point of this tool is to turn those formulas into a moving picture so you can feel them.

🙋

When I push p+q close to 1, the convergence curve flattens almost instantly. Why?

🎓

The eigenvalue $\lambda_2 = 1-p-q$ controls the decay. When $\lambda_2 \approx 0$, $\lambda_2^t$ vanishes after one step. With p=q=0.01 instead, $\lambda_2 \approx 0.98$ and $0.98^{50} \approx 0.36$, so almost all of the initial distribution remains. The formula $t_{\text{mix}} = \ln(0.01)/\ln|\lambda_2|$ explains why: when $|\lambda_2| \to 1$ the denominator goes to zero and the mixing time blows up.

🙋

If I push p and q both to 0.99, $\lambda_2$ becomes negative. What happens then?

🎓

A negative eigenvalue gives oscillatory convergence. $P_1(t)$ flips above and below $\pi_1$ at every step while slowly damping. Pressing Play with these settings shows a zig-zag curve. This is a chain close to being periodic-2 (bipartite), which would never converge if it were exactly periodic. In practice you see this in alternating-state systems like delayed renewal or signalling that flips at every clock tick.

🙋

PageRank is also a Markov chain, right? How is that different from this?

🎓

Same idea, very different scale. PageRank is a Markov chain over billions of web pages; each page is a state, each link is a transition probability, and the stationary distribution $\pi$ ranks importance. It is exactly the 2×2 calculation here scaled to $10^9 \times 10^9$. The implementation is just power iteration — $\pi^{(t+1)} = \pi^{(t)} P$ — which is the high-dimensional version of $P_1(t) = \pi_1 + (P_1(0)-\pi_1)\lambda^t$.

FAQ

A two-state Markov chain is irreducible iff p>0 and q>0; this tool enforces that with the slider range 0.01–0.99. As long as one of the self-loops 1−p, 1−q is positive, the chain is also aperiodic, and the Perron–Frobenius theorem then guarantees a simple eigenvalue 1 whose left eigenvector is the unique stationary distribution π. In the limit p+q=2 the chain becomes period-2 and does not converge — the slider stops short of that pathological case.

The relaxation time is $\tau_{\text{rel}} = 1/(1-|\lambda_2|)$, the characteristic time after which the discrepancy from π decays to $1/e \approx 0.368$. The mixing time is the number of steps to decay below a chosen threshold (1% in this tool); the two are related by $t_{\text{mix}} \approx \tau_{\text{rel}} \ln(1/\epsilon)$. Our formula $t_{\text{mix}} = \ln(0.01)/\ln|\lambda_2|$ is the exact discrete count for the 1% threshold; different references using slightly different thresholds or the L^1 / total-variation distance will report nearby but not identical values.

Markov reliability analysis builds an 'up ⇄ down' chain from the failure rate λ and repair rate μ of a redundant system, and reads off availability A = μ/(λ+μ) — exactly the formula π_1 = q/(p+q) of this tool with p ↔ λ and q ↔ μ. Where FMEA and FTA combine probabilities through static tree structures, Markov models add the time-evolution from a transient state to steady state. The same backbone shows up in discrete-event simulation and in reliability-block-diagram solvers used in CAE.

For an N-state chain the transition matrix is N×N with N eigenvalues (one of them always λ_1=1). The stationary distribution solves the linear system $\pi P = \pi$, but for N≥3 there is no general closed form. Power iteration is the standard implementation — and Google PageRank is exactly that. The mixing time is governed by the second-largest eigenvalue magnitude $|\lambda_2|$ (the 'spectral gap'), with $t_{\text{mix}} \approx \ln(1/\epsilon)/(1-|\lambda_2|)$. The 2×2 case here is the smallest one in which every concept can be checked by hand.

Real-world applications

Reliability engineering and maintenance: A 'production line up ⇄ down' Markov model with mean-time-to-failure MTTF = 1/p and mean-time-to-repair MTTR = 1/q gives a steady-state availability A = q/(p+q). With p=0.05, q=0.50 this tool produces A=0.909, i.e. 90.9% availability — a number plant maintenance teams use as a daily KPI. Extending to three states (operating, degraded, failed) supports preventive-maintenance optimisation.

Web search and PageRank: Google PageRank is the stationary distribution of a giant Markov chain over hundreds of billions of pages, computed by damped power iteration with a damping factor of 0.85. Link structure is the transition matrix and the stationary probability is the importance rank. Recommendation systems (YouTube/Amazon related items, Spotify radio) use the same machinery, often as a random walk on the user–item bipartite graph.

Reinforcement learning and MDP: A Markov Decision Process adds an action a, with the Bellman equation V(s) = max_a [r(s,a) + γ Σ P(s'|s,a) V(s')] giving the optimal policy. Q-learning, SARSA, and Deep Q-Networks all assume the Markov property, and applications such as AlphaGo, autonomous driving, and the RLHF stage of ChatGPT rest on the same MDP foundation. This tool covers the action-free, reward-free base case so the eigenvalue intuition can be built first.

NLP and Hidden Markov Models (HMM): Speech recognition, part-of-speech tagging, and bio-sequence analysis use HMMs — a hidden Markov chain plus state-dependent observation probabilities. They are trained with Baum-Welch and decoded with Viterbi, but the underlying transition matrix is analyzed exactly as here. T9 mobile-keyboard prediction and DNA methylation analysis are everyday examples of the same framework.

Common misconceptions and caveats

The most common error is to assume that once the chain has 'reached' π the state stops changing. In fact every transition still fires randomly with probabilities p and q; the right interpretation is that the long-run fraction of time spent in state 1 equals π_1. With π_1=0.5714 you should expect roughly 57 steps in state 1 out of every 100. The probability curve P_1(t) drawn here is the ensemble expectation over infinitely many sample paths, not a single trajectory.

A frequent confusion is to read 'memorylessness' as 'no information about the past'. The Markov property P(X_{t+1}|X_t,X_{t-1},...) = P(X_{t+1}|X_t) only says that all useful past information must be encoded in the current state X_t. A naive stock-price random walk is Markov, but if you redefine the state as (price, yesterday's return) the new chain is Markov with strictly more memory. State-space design is the lever that makes models predictive.

Finally, do not treat the mixing time as a hard number. Our $t_{\text{mix}} = \ln(0.01)/\ln|\lambda_2|$ depends on the choice of ε=0.01. Switching to ε=0.05 shrinks it by ~35%; choosing ε=0.001 stretches it by ~50%. In MCMC practice the analogous quantity is the burn-in length, and rigorous convergence guarantees use the total-variation distance instead of an L^∞ proxy. Always report the threshold alongside the number.

Markov Chain Simulator — Two-State Stationary Distribution and Mixing Time

What is the Markov Chain Simulator

FAQ

Real-world applications

Common misconceptions and caveats

How to Use

Worked Example

Practical Notes

Markov Chain Simulator — Two-State Stationary Distribution and Mixing Time

What is the Markov Chain Simulator

FAQ

Real-world applications

Common misconceptions and caveats

Related Tools

How to Use

Worked Example

Practical Notes