Q-learning with censored data

Yair Goldberg, Michael R. Kosorok

Research output: Contribution to journal › Article › peer-review


We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
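
The abstract does not spell out how the Q-learning step is adjusted for censoring. As an illustration only, the sketch below assumes one common adjustment, inverse-probability-of-censoring weighting, applied to a single-stage least squares fit of a linear Q-function. The function names (`ipc_weights`, `fit_weighted_q`, `greedy_action`), the linear feature map, and the toy data are hypothetical and are not taken from the paper; a multistage version would repeat a fit of this kind backward over stages.

```python
import numpy as np


def ipc_weights(times, delta):
    """Inverse-probability-of-censoring weights (an illustrative assumption).

    times: observed time per subject (failure or censoring, whichever came first).
    delta: 1 if the failure was observed, 0 if the subject was censored.
    Returns delta / G(T-), where G is the Kaplan-Meier estimate of the
    censoring survival function evaluated just before each observed time.
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=int)
    g_left = np.ones(len(times))
    for i, t in enumerate(times):
        # Product over distinct earlier times of (1 - censored / at risk).
        for s in np.unique(times[times < t]):
            at_risk = np.sum(times >= s)
            censored = np.sum((times == s) & (delta == 0))
            g_left[i] *= 1.0 - censored / at_risk
    return delta / g_left


def fit_weighted_q(features, targets, weights):
    """Weighted least squares fit of a linear Q-function.

    Scaling rows by sqrt(weight) turns the weighted problem into an
    ordinary least squares problem that np.linalg.lstsq can solve.
    """
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    design = np.asarray(features, dtype=float) * sqrt_w[:, None]
    response = np.asarray(targets, dtype=float) * sqrt_w
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def greedy_action(coef, actions, featurize):
    """Return the candidate action with the largest fitted Q-value."""
    q_values = [featurize(a) @ coef for a in actions]
    return actions[int(np.argmax(q_values))]


# Toy data (hypothetical): four subjects, the second one is censored.
times = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
actions_taken = np.array([0, 1, 0, 1])
features = np.column_stack([np.ones(4), actions_taken])  # intercept + action

weights = ipc_weights(times, delta)
coef = fit_weighted_q(features, times, weights)
best = greedy_action(coef, [0, 1], lambda a: np.array([1.0, a]))
```

In this sketch a censored subject receives weight zero, so its censoring time never enters the fit as if it were a survival time, while uncensored subjects observed after a censoring event are upweighted to compensate.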

Original language: English
Pages (from-to): 529-560
Number of pages: 32
Journal: Annals of Statistics
Issue number: 1
State: Published - Feb 2012
Externally published: Yes


Keywords

  • Generalization error
  • Q-learning
  • Reinforcement learning
  • Survival analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

