Trouble With the Curve: Beyond the Basics With Kaplan-Meier

Oncology Fellows, Volume 14, Issue 1

Rahul Banerjee, MD, discusses how to navigate the obstacles in interpreting Kaplan-Meier curves.

Compared with other eponymous curves, like Frank-Starling, Kaplan-Meier (K-M) curves are rarely mentioned in medical school. Nevertheless, you’ve no doubt come across them in articles, presentations, drug inserts, and more. As initially described by Edward L. Kaplan, PhD, and Paul Meier, PhD, 63 years ago, these curves summarize time-to-event probabilities in datasets of patients who receive different amounts of follow-up.1
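To make "different amounts of follow-up" concrete, here is a minimal sketch of the product-limit calculation in plain Python. The six patients, their follow-up times, and the function name `kaplan_meier` are hypothetical and chosen only for illustration; real analyses would normally rely on an established statistics package rather than hand-rolled code.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up duration for each patient
    events : 1 if the event of interest was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive t
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Patients with an event or censored at t leave the risk set afterward
        at_risk -= exits[t]
    return curve

# Hypothetical follow-up (months) for six patients; 0 marks a censored patient
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

Note that censored patients still count in the denominator up to the moment they leave, which is how the method uses partial follow-up instead of discarding it.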

You probably understand the gist of K-M curves (if not, see references 2 and 3). What follows is a primer that trainees like you and me can use to understand the anatomy of K-M curves. This article does not delve into the assumptions or mathematical intricacies that underlie the curves but focuses instead on essential concepts.

Examining the Vertical Axis

The y-axis shows the probability (0% to 100%, or 0.0 to 1.0) of event-free survival with regard to the event of interest. However, the event of interest relative to survival varies (see Table). Overall survival (OS), or length of time to patient death from any cause, is always relevant. When considering progression-free survival (PFS), disease-free survival (DFS), and relapse-free survival (RFS), ask yourself whether the event is meaningful to the patient. It probably is in an aggressive malignancy but may not be in an indolent chronic condition.

Certain modifications to these measurements warrant discussion. Time to next treatment (TTNT) generally exceeds PFS because not every patient with progressive disease requires or desires a new therapy immediately.4 Conversely, when patients receive additional therapy after suboptimal responses to chemotherapy but don’t strictly meet the criteria for progressive disease (eg, as with Hodgkin lymphoma), modified PFS, which combines elements of TTNT and true PFS, has been used.5 In certain cases, unique complications may be added to RFS as events of interest, such as graft-vs-host disease-free/relapse-free survival (GRFS) in studies of allogeneic stem cell transplantation.6

Finally, with PFS2 (time to second objective disease progression), the emphasis is on PFS after the next line of therapy (beyond the line being studied) is initiated.

For instance, the KEYNOTE-024 study of pembrolizumab versus chemotherapy was more accurately a study of “pembrolizumab now” (ie, checkpoint inhibition in the first line) vs “pembrolizumab later” (ie, saving checkpoint inhibition for relapse). In this case, PFS2 shows whether initial treatment with pembrolizumab offers an advantage over waiting until time of relapse.7

Examining the Horizontal Axis

There are 3 questions you should ask while reviewing the x-axis:

  1. Bottom left: Was time zero defined as time of diagnosis, of study enrollment, or of something else? With landmark analysis, time zero is artificially set to a time after treatment begins to prevent immortal time bias.8,9
  2. Bottom right: Is the rightmost (longest) amount of follow-up clinically relevant? Studies of indolent cancers or precancers, given their natural history, should at a minimum include several years’ worth of follow-up. Always look elsewhere in the manuscript for the median amount of follow-up listed for each arm because this information cannot be readily calculated from the curve alone.
  3. Bottom center: The number of patients at risk at each time point should be listed under the x-axis. Use these numbers to get a sense of how reliable the results are at each time point. If 100 patients entered a study, but the entire right half of the K-M curve is based on only 5 patients who underwent sufficiently long follow-up, the results should be viewed with some skepticism.
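Two of the ideas above, the number-at-risk row and the landmark reset of time zero, can be sketched in a few lines of Python. The cohort below is hypothetical, the helper names `number_at_risk` and `landmark` are made up for this illustration, and the 3-month landmark is an arbitrary assumption; a real landmark analysis would also group patients by their treatment status at the landmark.

```python
def number_at_risk(times, timepoints):
    """Count patients still under observation just before each timepoint."""
    return {tp: sum(1 for t in times if t >= tp) for tp in timepoints}

def landmark(times, events, landmark_time):
    """Re-zero follow-up at landmark_time.

    Patients whose event or censoring occurred on or before the landmark
    are excluded; everyone else has follow-up measured from the landmark.
    """
    kept = [(t - landmark_time, e)
            for t, e in zip(times, events) if t > landmark_time]
    return [t for t, _ in kept], [e for _, e in kept]

# Hypothetical follow-up (months) for ten patients; 1 = event, 0 = censored
times = [1, 2, 2, 4, 6, 6, 9, 12, 12, 24]
events = [1, 0, 1, 1, 1, 0, 1, 0, 1, 0]

risk_table = number_at_risk(times, [0, 6, 12, 24])
print(risk_table)

lm_times, lm_events = landmark(times, events, landmark_time=3)
print(lm_times, lm_events)
```

In this toy cohort only 1 of 10 patients remains at risk at 24 months, so the far right of the corresponding curve would rest on a single person.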

Examining the Curve (or Curves)

You are probably familiar with median survival, which is either reported with a confidence interval or summarized as "not reached" if the estimated survival probability has not fallen to 50% during the follow-up period. Likewise, you’ve learned to look at the right side of the K-M curve for a plateau, which shows that beyond a certain point events of interest no longer occur and thus suggests a cure. But keep in mind 2 other questions.
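As a rough illustration of how a median can end up "not reached," the sketch below reads the median off a K-M curve as the first time the estimated survival falls to 50% or lower. Both cohorts are hypothetical, and `kaplan_meier` and `median_survival` are illustrative helper names rather than functions from any particular package.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (1 = event, 0 = censored)."""
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= times.count(t)   # events and censored patients leave
    return curve

def median_survival(curve):
    """First time the curve is at or below 50%; None means 'not reached'."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical cohorts with identical follow-up times (months)
times = [2, 3, 3, 5, 8, 8]
median_a = median_survival(kaplan_meier(times, [1, 1, 0, 1, 0, 0]))
median_b = median_survival(kaplan_meier(times, [1, 0, 0, 1, 0, 0]))
print(median_a, median_b)
```

In the second cohort the curve bottoms out above 50%, so the median is reported as not reached even though follow-up extends to 8 months.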

First, how are patients censored? All longitudinal datasets have a time cutoff, either a data lock date or a patient’s last clinic follow-up before the present day. Patients who don’t have an event of interest before the cutoff date are censored. But what if they are censored for other reasons, like study discontinuation or premature loss to follow-up due to toxicity? As others have noted, such censoring can affect the results of studies in oncology.10,11 When you see icons on the K-M curve indicating censoring, make sure you check how the term is defined.
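One way to get a feel for how much censoring rules matter is a simple worst-case sensitivity check: recompute the curve while counting patients who left for reasons such as toxicity as having had the event. The sketch below does this in plain Python. The data, the reason labels ("event", "dropout", "admin"), and the worst-case rule itself are assumptions made for illustration, not a method taken from the cited references.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (1 = event, 0 = censored)."""
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= times.count(t)
    return curve

# Hypothetical cohort: why each patient's follow-up ended
times = [2, 3, 3, 5, 8, 8]
reasons = ["event", "event", "dropout", "event", "admin", "admin"]

# Primary analysis: every non-event exit is censored
primary = kaplan_meier(times, [1 if r == "event" else 0 for r in reasons])

# Worst case (assumption): dropouts are counted as events
worst = kaplan_meier(times, [1 if r in ("event", "dropout") else 0
                             for r in reasons])

primary_final = primary[-1][1]
worst_final = worst[-1][1]
print(f"primary: {primary_final:.3f}, worst case: {worst_final:.3f}")
```

If the two estimates differ substantially, the conclusions depend on how the censored patients would have fared, which the curve alone cannot tell you.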

Second, if there are multiple curves on the same plot, when do they intersect and when do they diverge or converge? You often see curves intersecting when a high-risk, high-reward strategy is being compared against a low-risk, low-reward one, such as aggressive frontline chemotherapy versus milder regimens. If a pair of curves converge or diverge as you move toward the right, you can surmise that the difference in benefit begins to emerge or dissipate with extended follow-up. All of this may be relevant to you and your patients during clinical decision-making. Of note, curves that cross make it more difficult to test a hypothesis using proportional hazards regression.12,13
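If you have the underlying survival estimates rather than just the picture, locating a crossing is straightforward. The sketch below treats each curve as a step function and reports the times at which the sign of the difference between the two arms flips. The two curves are invented numbers meant to mimic an aggressive regimen with early losses and a milder regimen with later ones; `survival_at` and `crossings` are illustrative names.

```python
def survival_at(curve, t):
    """Step-function value of a K-M curve [(time, S)] at time t."""
    s = 1.0                      # survival is 100% before the first event
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

def crossings(curve_a, curve_b):
    """Times at which the sign of S_a(t) - S_b(t) flips."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    flips, prev = [], 0
    for t in grid:
        diff = survival_at(curve_a, t) - survival_at(curve_b, t)
        sign = (diff > 0) - (diff < 0)
        if sign and prev and sign != prev:
            flips.append(t)
        if sign:
            prev = sign          # ignore ties when tracking the leader
    return flips

# Hypothetical curves: (months, estimated survival)
aggressive = [(1, 0.80), (2, 0.70), (12, 0.65), (24, 0.60)]
mild = [(3, 0.90), (6, 0.75), (9, 0.60), (18, 0.45)]

flip_times = crossings(aggressive, mild)
print(flip_times)
```

A flip tells you where the leading arm changes, but not whether the difference on either side is statistically or clinically meaningful; the numbers at risk near the crossing still matter.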

If you’re interested in learning more, consider taking a biostatistics course. Several professional societies also host workshops for oncology fellows that cover these concepts in more detail, including the American Society of Clinical Oncology/American Association for Cancer Research Vail Workshop and the American Society for Transplantation and Cellular Therapy Clinical Research Training Course.

A final word of warning: A K-M curve is only as good as the data that goes into it. Beware of selection bias and confounding that might affect how you interpret the curve. For example, in the curve above, it would appear that ventilators are associated with worse survival among hospitalized patients with COVID-19, but patient acuity is clearly the underlying confounder driving what you see. K-M curves convey a wealth of information beyond the hazard ratio, but only if you know how to interpret them.

References

  1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481. doi:10.2307/2281868
  2. Dudley WN, Wickham R, Coombs N. An introduction to survival statistics: Kaplan-Meier analysis. J Adv Pract Oncol. 2016;7(1):91-100. doi:10.6004/jadpro.2016.7.1.8
  3. Schober P, Vetter TR. Kaplan-Meier curves, log-rank tests, and Cox regression for time-to-event data. Anesth Analg. 2021;132(4):969-970. doi:10.1213/ANE.0000000000005358
  4. Walker B, Boyd M, Aguilar K, et al. Comparisons of real-world time-to-event end points in oncology research. JCO Clin Cancer Inform. 2021;5:45-46. doi:10.1200/CCI.20.00125
  5. Straus DJ, Długosz-Danecka M, Alekseev S, et al. Brentuximab vedotin with chemotherapy for stage III/IV classical Hodgkin lymphoma: 3-year update of the ECHELON-1 study. Blood. 2020;135(10):735-742. doi:10.1182/blood.2019003127
  6. Holtan SG, DeFor TE, Lazaryan A, et al. Composite end point of graft-versus-host disease-free, relapse-free survival after allogeneic hematopoietic cell transplantation. Blood. 2015;125(8):1333-1338. doi:10.1182/blood-2014-10-609032
  7. Brahmer JR, Rodriguez-Abreu D, Robinson AG, et al. Progression after the next line of therapy (PFS2) and updated OS among patients (pts) with advanced NSCLC and PD-L1 tumor proportion score (TPS) ≥50% enrolled in KEYNOTE-024. J Clin Oncol. 2017;35(suppl 15):9000. doi:10.1200/JCO.2017.35.15_suppl.9000
  8. Farr AM, Foley K. Landmark analysis to adjust for immortal time bias in oncology studies using claims data linked to death data. Value Health. 2013;16(3):A50. doi:10.1016/j.jval.2013.03.284
  9. Agarwal P, Moshier E, Ru M, et al. Immortal time bias in observational studies of time-to-event outcomes: assessing effects of postmastectomy radiation therapy using the National Cancer Database. Cancer Control. 2018;25(1):1-7. doi:10.1177/1073274818789355
  10. Fojo T, Simon RM. Inappropriate censoring in Kaplan-Meier analyses. Lancet Oncol. 2021;22(10):1358-1360. doi:10.1016/S1470-2045(21)00473-3
  11. Rosen K, Prasad V, Chen EY. Censored patients in Kaplan-Meier plots of cancer drugs: an empirical analysis of data sharing. Eur J Cancer. 2020;141:152-161. doi:10.1016/j.ejca.2020.09.031
  12. Bouliotis G, Billingham L. Crossing survival curves: alternatives to the log-rank test. Trials. 2011;12(suppl 1):A137. doi:10.1186/1745-6215-12-S1-A137
  13. Li H, Han D, Hou Y, et al. Statistical inference methods for two crossing survival curves: a comparison of methods. PLoS One. 2015;10(1):e0116774. doi:10.1371/journal.pone.0116774