Kaplan-Meier Survival Analysis Simulator

Parameters

Sample size (subj.)

Total number of subjects (patients, parts) followed

True median survival

True median survival of the population the data come from

Censoring rate

Fraction of subjects whose follow-up ends before the event

Evaluation time

Time at which the survival S(t) is read off

Random seed

The same seed always generates the same dataset (reproducibility)

Results

Event count

Censored count

Survival at eval. S(t) (%)

KM median survival (mo)

At-risk set at eval. (subj.)

Censoring rate, actual (%)

Kaplan-Meier survival curve (animated)

The vertical axis is the survival probability (1 down to 0) and the horizontal axis is time. The curve steps down only at observed event times; small vertical ticks mark the censored observation times. The evaluation-time marker shows S(t).

KM survival curve S(t)

At-risk set trajectory n(t)

Theory & Key Formulas

$$\hat S(t)=\prod_{t_i\le t}\left(1-\frac{d_i}{n_i}\right)$$

The Kaplan-Meier estimator. At each time $t_i$ at which at least one event occurs, let $d_i$ be the number of events at that time and $n_i$ the number still at risk just before it. The factor $1-d_i/n_i$ is the conditional probability of surviving past $t_i$, and $\hat S(t)$ is the product of these factors over all event times $t_i\le t$. The curve steps down only at event times; a censored observation merely reduces the at-risk set $n_i$ and creates no step.
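A minimal Python sketch of the product-limit formula, assuming each subject is given as a follow-up time plus an event flag (1 = event observed, 0 = censored). The function name `kaplan_meier` and this data layout are illustrative choices, not the simulator's own code. A subject censored at an event time is counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate (hypothetical helper, for illustration only).

    times  -- follow-up time of each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (t_i, S(t_i)) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i at each time
    leaving = Counter(times)  # subjects leaving the risk set (events + censorings)
    n_at_risk = len(times)    # everyone is at risk at the start
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # conditional probability of surviving past t: 1 - d_i / n_i
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        # censored subjects only shrink the risk set; they create no step
        n_at_risk -= leaving[t]
    return curve


curve = kaplan_meier([2, 3, 4, 6, 8], [1, 0, 1, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # prints [(2, 0.8), (4, 0.533), (6, 0.267)]
```

The subjects censored at 3 and 8 never appear in the output; they only change the denominators of later steps.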

$$\lambda=\frac{\ln 2}{m},\qquad T=-\frac{\ln(1-u)}{\lambda}$$

The exponential model used to generate the data. The true median survival $m$ gives the event rate $\lambda$, and a uniform random number $u\in(0,1)$ is converted to an event time $T$ by inverse-transform sampling.
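The inverse-transform step can be sketched in a few lines of Python. This covers only the event times: the section does not say how the simulator draws its censoring times, so none are generated here. The helper name `sample_event_times` is hypothetical, and Python's `random.Random.random()` returns $u\in[0,1)$ rather than $(0,1)$, which is harmless because $1-u$ then stays in $(0,1]$.

```python
import math
import random


def sample_event_times(n, median, seed):
    """Draw n exponential event times with the given true median.

    Hypothetical helper: lambda = ln 2 / m, then T = -ln(1 - u) / lambda.
    A fixed seed reproduces the same list every time.
    """
    lam = math.log(2) / median
    rng = random.Random(seed)  # private generator, so the seed alone fixes the output
    # rng.random() returns u in [0, 1), so 1 - u lies in (0, 1] and the log is defined
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]


times = sample_event_times(20000, median=12.0, seed=1)
empirical_median = sorted(times)[len(times) // 2]
print(round(empirical_median, 2))  # should come out close to the true median, 12
```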

What is the Kaplan-Meier Method?

🙋

Survival analysis — is that the stair-step graph that slowly declines, the one you always see in medical papers? How is it different from a plain line chart?

🎓

Yes, that staircase is exactly the Kaplan-Meier curve. Survival analysis is about the time until some event occurs: a patient's relapse, a machine's failure, a subscriber's cancellation — anything. What sets it apart from a simple average is an awkward problem called "censoring", and handling that cleanly is the whole point of the Kaplan-Meier method.

🙋

"Censoring"? I have not heard that word. What is it?

🎓

Roughly, it means "a subject we could not follow to the end". Say we run a three-year study and one patient moves away in year two, so we lose contact. We do not know whether they relapsed, but we do know they "did not relapse for at least two years". Throwing that away would be wasteful, and we cannot pretend the event happened either. Raise the "censoring rate" slider on the left and you get more of those subjects.

🙋

I see — so how does the Kaplan-Meier method actually use that half-finished information?

🎓

It is clever. Instead of estimating the curve in one go, it looks only at the instants when an event actually happens. At each such time it computes the chance of surviving that instant — the number who got past it divided by the number at risk just before — and multiplies those together. That is what $\hat S(t)=\prod(1-d_i/n_i)$ means. A censored subject contributes to the at-risk denominator only while it was observed, and then quietly leaves. It creates no step down.
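A hand calculation on five hypothetical subjects shows the mechanics: follow-up ends at times 2, 3, 4, 6 and 8, the subjects at 3 and 8 are censored, and the other three have the event.

$$\hat S(2)=1-\frac{1}{5}=0.8,\qquad \hat S(4)=\hat S(2)\left(1-\frac{1}{3}\right)\approx 0.533,\qquad \hat S(6)=\hat S(4)\left(1-\frac{1}{2}\right)\approx 0.267$$

The subject censored at 3 creates no step, but it shrinks the at-risk set from 4 to 3 before the event at 4; the subject censored at 8 leaves no mark on the curve at all.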

🙋

So the curve drops only where an event happens. What are the little vertical ticks at the censoring points?

🎓

Those are the censoring marks. The curve does not drop, but a tick shows that one subject left the at-risk set there. Later on, when the at-risk set has shrunk, a single event makes the curve plunge a long way. Look at the "at-risk set trajectory" chart below: the tail of the curve has very few subjects, and you can see how unstable the estimate becomes.

🙋

Sometimes the median shows "not reached". Is that an error?

🎓

Not an error; it is the honest answer. The median survival is the time at which the survival first falls to 50% or below. But with short follow-up or heavy censoring, the curve may never fall all the way to 0.5 within the observation window. "Not reached" is simply the statement that the data cannot pin down the median. In real clinical trials it shows up often for treatment arms with a good prognosis.

Frequently Asked Questions

The Kaplan-Meier method estimates the time until some event occurs - a patient's relapse, a machine's failure, a customer's cancellation - from data that includes censored observations. Instead of estimating the whole survival curve at once, it computes, at each time an event actually occurs, the conditional probability of getting past that instant, and multiplies these probabilities together. The result is the characteristic descending staircase curve that steps down only at observed event times. It was published by Edward Kaplan and Paul Meier in 1958.

Censoring describes a subject who was followed for a while without the event happening and then dropped out of view: the study ended, the patient moved away, the warranty period elapsed. A censored subject carries real information - 'survived at least this long' - so you cannot simply ignore it, but you also cannot pretend the event happened. The Kaplan-Meier estimator handles this correctly by keeping each subject in the at-risk set for exactly as long as it was genuinely observed, and not a moment longer.

The median survival is defined as the smallest event time at which the KM survival S(t) first drops to 0.5 or below. When the follow-up is short, censoring is heavy, or the true survival is long, S(t) may never fall to 0.5 during the observation window. In that case the median cannot be computed and is reported as 'not reached'. This is not a bug - it is an honest statement that the data do not pin down the median, and a sign that longer follow-up or more subjects are needed.
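Reading the median off a KM curve is a short scan over the steps. A minimal sketch, assuming the curve is a list of `(t_i, S(t_i))` pairs in time order; the helper name `km_median` and the small tolerance are illustrative choices. Statistical packages differ in details (for example when the curve sits exactly at 0.5 over an interval), so treat this as one simple convention.

```python
def km_median(curve, tol=1e-12):
    """Median survival from a KM curve (hypothetical helper).

    curve -- list of (t_i, S(t_i)) at event times, in increasing time order
    Returns the smallest t_i with S(t_i) <= 0.5, or None for 'not reached'.
    tol absorbs floating-point error when a product lands exactly on 0.5.
    """
    for t, s in curve:
        if s <= 0.5 + tol:
            return t
    return None  # the curve never fell to 0.5 within follow-up


print(km_median([(2, 0.8), (4, 0.533), (6, 0.267)]))  # 6
print(km_median([(2, 0.8), (4, 0.6)]))                # None, i.e. "not reached"
```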

The at-risk set is the group of subjects who, at a given time, have neither had the event nor been censored, and so could still experience the event. In the Kaplan-Meier estimate, the number n still at risk just before each event time is the denominator of the conditional survival probability 1-d/n. As time advances, events and censoring remove subjects, so the at-risk set shrinks step by step. Late in the curve, when the at-risk set becomes small, a single event moves the estimate sharply and uncertainty grows.
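The at-risk count n(t) depends only on each subject's follow-up time, not on whether that follow-up ended in an event or in censoring. A minimal sketch with an illustrative helper `at_risk` and five hypothetical follow-up times:

```python
def at_risk(times, t):
    """Number at risk just before time t (hypothetical helper).

    A subject is at risk at t if its follow-up time, whether it ended in an
    event or in censoring, is t or later.
    """
    return sum(1 for x in times if x >= t)


times = [2, 3, 4, 6, 8]          # hypothetical follow-up times; event flags are not needed
grid = [0, 2, 2.5, 4, 6, 8, 9]
print([at_risk(times, t) for t in grid])  # [5, 5, 4, 3, 2, 1, 0]
```

Evaluating `at_risk` on a time grid gives exactly the kind of declining staircase plotted in the at-risk set trajectory chart.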

Real-World Applications

Medicine and clinical trials: This is where Kaplan-Meier analysis is used most. In a trial of a new cancer drug, the survival curves of the treatment and control arms are drawn side by side to compare time to relapse and overall survival. The familiar figures in papers, such as "median XX months" or "XX% survival at X years", are read straight off these curves. An analysis that does not handle censoring correctly would not stand up to regulatory review.

Reliability engineering and life testing: In life tests of mechanical parts and electronics, some samples never fail within the allotted test time. Those are censored too. The Kaplan-Meier curve draws the fraction "still surviving without failure" as a function of time, and is used to set warranty periods and predict replacement intervals. It is a standard method for the life evaluation of bearings, batteries and semiconductors.

Customer churn analysis: For a subscription business, "when does a subscriber cancel?" is the central question. Customers still active at the observation date are censored data: they "kept their subscription at least this long". Survival analysis estimates the time to cancellation, and the effect of retention initiatives is judged by comparing curves. With "cancellation" as the event, the same formulas as in medicine apply directly.
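In practice the first step is turning raw dates into (duration, event) pairs. A minimal sketch, assuming each customer has a signup date and an optional cancellation date, plus one observation cutoff for the whole extract; the helper `to_survival_record` is hypothetical. A cancellation recorded after the cutoff is treated as "still active", which keeps the record consistent with what was known at the cutoff.

```python
from datetime import date


def to_survival_record(signup, cancelled, cutoff):
    """Turn one customer's dates into (duration_days, event). Hypothetical helper.

    cancelled -- cancellation date, or None if the customer is still active
    cutoff    -- the observation date (end of the data extract)
    A customer still active at the cutoff is censored: event = 0.
    """
    if cancelled is not None and cancelled <= cutoff:
        return (cancelled - signup).days, 1
    # still subscribed when we looked: "kept the subscription at least this long"
    return (cutoff - signup).days, 0


cutoff = date(2024, 6, 30)
print(to_survival_record(date(2024, 1, 1), date(2024, 3, 1), cutoff))  # (60, 1)
print(to_survival_record(date(2024, 5, 1), None, cutoff))              # (60, 0)
```

Both customers have 60 days of follow-up, but only the first contributes an event; the second only stays in the at-risk set for those 60 days.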

Other time-to-event data: Spells of unemployment (time to re-employment), loan default, and time to drop-out as a machine-learning prediction target. The method applies to any data where "time until something happens" is mixed with "censored, not yet happened". The common thread is to make full use of every individual whose observation was cut short, but only over the period in which it was genuinely observed.

Common Misconceptions and Pitfalls

The biggest misconception is throwing censored subjects out of the data. Excluding a subject just because its event was not observed leaves only the subjects whose event was seen, and produces a serious bias: survival is typically underestimated, because the subjects removed are precisely those known to have survived at least until their censoring time. The whole value of the Kaplan-Meier method is that it makes full use of censored subjects instead of discarding them, keeping each one in the at-risk set for exactly the period it was observed. A mean survival time computed after deleting censored data is meaningless.
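A small simulation makes the bias concrete. This sketch assumes exponential event times with true median 12 (so the true S(12) is 0.5) and independent censoring drawn uniformly between 0 and 24; the censoring scheme, sample size, seed and helper names (`km_at`, `simulate`) are all illustrative assumptions, not the simulator's settings. You should see the KM estimate land near 0.5 while the naive fraction, computed after deleting censored subjects, falls well below it.

```python
import math
import random


def km_at(times, events, t_eval):
    """KM estimate of S(t_eval): product of (1 - d/n) over event times <= t_eval."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, i = 1.0, 0
    while i < len(data) and data[i][0] <= t_eval:
        t, d, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            d += data[i][1]
            leaving += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
        n_at_risk -= leaving
    return surv


def simulate(n, median, max_follow, seed):
    """Exponential event times with independent Uniform(0, max_follow) censoring (assumed scheme)."""
    rng = random.Random(seed)
    lam = math.log(2) / median
    times, events = [], []
    for _ in range(n):
        event_time = -math.log(1.0 - rng.random()) / lam
        censor_time = rng.uniform(0.0, max_follow)
        times.append(min(event_time, censor_time))
        events.append(1 if event_time <= censor_time else 0)
    return times, events


times, events = simulate(n=4000, median=12.0, max_follow=24.0, seed=7)
km = km_at(times, events, 12.0)
# Naive approach: throw away censored subjects, then read S(12) as a plain fraction
observed = [t for t, e in zip(times, events) if e == 1]
naive = sum(t > 12.0 for t in observed) / len(observed)
print(f"true S(12) = 0.5, KM = {km:.3f}, naive = {naive:.3f}")
```

The naive fraction is low because long survivors are exactly the ones most likely to be censored before their event, so deleting censored rows removes them preferentially.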

Next, trusting the tail of the curve too much. The late part of a survival curve has only a handful of subjects still at risk. A single event there can make the curve plunge 10% or 20%, and that step is statistically almost meaningless. The convention in papers is to print the number at risk under the curve at each time point, and talking about "high or low 5-year survival" without looking at it is dangerous. The "at-risk set trajectory" chart in this tool makes exactly this caution visible.
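The size of a single step follows directly from the product formula: each event multiplies the curve by $1-d_i/n_i$, so it removes the fraction $d_i/n_i$ of the curve's current height.

$$\hat S(t_{i-1})-\hat S(t_i)=\hat S(t_{i-1})\,\frac{d_i}{n_i}$$

Here $t_{i-1}$ is the previous event time, with $\hat S(t_0)=1$. With $n_i=5$ subjects at risk and one event, the curve loses one fifth of its height in a single step (from 0.40 to 0.32, say); with $n_i=100$, the same single event would remove only 1% of it.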

Finally, forgetting the assumption that censoring is unrelated to the event (non-informative censoring). The Kaplan-Meier method assumes that the reason for censoring is independent of prognosis. If "patients who got worse are more likely to drop out", that assumption breaks and survival is biased optimistically. The data generation in this simulator satisfies non-informative censoring, but with real data you must always scrutinise why each subject was censored.
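The optimistic bias can be reproduced by switching on an informative dropout mechanism in a toy simulation. Everything here is an illustrative assumption: exponential event times with true median 12, baseline censoring uniform on 0 to 24, and an invented rule under which half of the subjects whose event comes before the median drop out just before it (at 90% of their event time). The helper names `km_at` and `simulate` are hypothetical.

```python
import math
import random


def km_at(times, events, t_eval):
    """KM estimate of S(t_eval): product of (1 - d/n) over event times <= t_eval."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, i = 1.0, 0
    while i < len(data) and data[i][0] <= t_eval:
        t, d, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            d += data[i][1]
            leaving += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
        n_at_risk -= leaving
    return surv


def simulate(n, median, max_follow, seed, informative):
    """Exponential event times; censoring is independent unless informative=True."""
    rng = random.Random(seed)
    lam = math.log(2) / median
    times, events = [], []
    for _ in range(n):
        event_time = -math.log(1.0 - rng.random()) / lam
        censor_time = rng.uniform(0.0, max_follow)
        # Assumed informative mechanism: half of the "getting worse" subjects
        # (event before the median) drop out shortly before the event.
        if informative and event_time < median and rng.random() < 0.5:
            censor_time = min(censor_time, 0.9 * event_time)
        times.append(min(event_time, censor_time))
        events.append(1 if event_time <= censor_time else 0)
    return times, events


s_indep = km_at(*simulate(4000, 12.0, 24.0, seed=3, informative=False), 12.0)
s_inform = km_at(*simulate(4000, 12.0, 24.0, seed=3, informative=True), 12.0)
print(f"true S(12) = 0.5, independent = {s_indep:.3f}, informative = {s_inform:.3f}")
```

The KM arithmetic is identical in both runs; only the reason for censoring changes, and with it the estimate drifts well above the true 0.5.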
