Abstract
We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite-sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
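To make the idea of a censoring-adjusted Q-learning algorithm concrete, here is a minimal Python sketch of two-stage backward-induction Q-learning in which only uncensored patients enter the regressions, each reweighted by the inverse of a Kaplan-Meier estimate of the censoring survival function (inverse probability of censoring weighting, IPCW). This is one standard way to adjust for censoring and is not claimed to be the paper's exact algorithm. Assumptions: a linear working model for each Q-function, binary treatments coded -1/+1, censoring independent of covariates, treatments and survival, and a toy data-generating model. All function names (`censoring_survival`, `censored_q_learning`, etc.) are hypothetical. Patients whose trajectory ends after stage 1 contribute only their stage-1 survival, which is how the sketch accommodates a flexible number of stages.

```python
import numpy as np


def censoring_survival(t_obs, delta):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring
    (delta == 0) as the event of interest. Returns a vectorized function."""
    cens_times = np.unique(t_obs[delta == 0])  # sorted distinct censoring times
    surv = [1.0]
    for u in cens_times:
        at_risk = np.sum(t_obs >= u)
        n_cens = np.sum((t_obs == u) & (delta == 0))
        surv.append(surv[-1] * (1.0 - n_cens / at_risk))
    surv = np.array(surv)

    def g_left(t):
        # Index = number of censoring times strictly less than t.
        return surv[np.searchsorted(cens_times, t, side="left")]

    return g_left


def features(x, a):
    # Hypothetical linear working model: Q(x, a) = b0 + b1*x + a*(b2 + b3*x).
    return np.column_stack([np.ones_like(x), x, a, a * x])


def weighted_ls(X, y, w):
    # Weighted least squares via rescaling rows by sqrt(w).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def best_action(beta, x):
    # argmax over a in {-1, +1} of the linear working model above.
    return np.where(beta[2] + beta[3] * x >= 0, 1.0, -1.0)


def censored_q_learning(t_obs, delta, x1, a1, r1, reach2, x2, a2):
    """Two-stage IPCW Q-learning sketch (assumes censoring independent of
    covariates, treatments and survival). Only uncensored patients enter
    the regressions, each weighted by 1 / G(T-); values of r1, x2, a2 for
    censored patients are never used."""
    g = censoring_survival(t_obs, delta)
    obs = delta == 1
    w = 1.0 / g(t_obs[obs])

    # Stage 2: regress remaining survival R2 = T - R1 among stage-2 entrants.
    s2 = reach2[obs] == 1
    beta2 = weighted_ls(features(x2[obs][s2], a2[obs][s2]),
                        (t_obs - r1)[obs][s2], w[s2])

    # Stage 1: pseudo-outcome = R1 + best predicted stage-2 survival,
    # or just R1 for patients whose trajectory ended after stage 1.
    q2_max = features(x2[obs], best_action(beta2, x2[obs])) @ beta2
    y1 = r1[obs] + np.where(reach2[obs] == 1, q2_max, 0.0)
    beta1 = weighted_ls(features(x1[obs], a1[obs]), y1, w)
    return beta1, beta2


# Toy simulation (hypothetical data-generating model, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(-1, 1, n)
a1 = rng.choice([-1.0, 1.0], n)
r1 = 2 + a1 * x1 + rng.uniform(-0.5, 0.5, n)
reach2 = rng.binomial(1, 0.6, n)  # flexible number of stages
x2 = np.where(reach2 == 1, rng.uniform(-1, 1, n), 0.0)
a2 = np.where(reach2 == 1, rng.choice([-1.0, 1.0], n), 0.0)
r2 = np.where(reach2 == 1, 1.5 + a2 * x2 + rng.uniform(-0.5, 0.5, n), 0.0)
c = rng.uniform(0, 10, n)  # independent censoring times
t_obs = np.minimum(r1 + r2, c)
delta = (r1 + r2 <= c).astype(int)

beta1, beta2 = censored_q_learning(t_obs, delta, x1, a1, r1, reach2, x2, a2)
```

Under this toy model the optimal rule at both stages is a = sign(x), so comparing `best_action(beta1, ...)` and `best_action(beta2, ...)` with `np.sign` gives a quick sanity check. Simply dropping censored patients without reweighting would bias the fits toward short survival times, since long survivors are more likely to be censored; the 1/G(T-) weights are meant to correct for that selection.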
| Original language | English |
|---|---|
| Pages (from-to) | 529-560 |
| Number of pages | 32 |
| Journal | Annals of Statistics |
| Volume | 40 |
| Issue number | 1 |
| State | Published - Feb 2012 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Keywords
- Generalization error
- Q-learning
- Reinforcement learning
- Survival analysis
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
Fingerprint
Dive into the research topics of 'Q-learning with censored data'. Together they form a unique fingerprint.