Branching bandit processes

A set of $n_i$ arms of type $i$, $i = 1, \dots, L$, is available. A pull of an arm of type $i$ occupies a duration $V_i$, at the end of which a reward $C_i$ and $N_{i1}, \dots, N_{iL}$ new arms (of types $1, \dots, L$, respectively) are obtained, while all other arms are frozen. A Gittins priority order of types is obtained and shown to yield the maximal discounted reward from this branching process of arms.
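To make the model concrete, here is a small sketch in Python. The abstract does not describe how the priority order is computed, so this uses a largest-remaining-index construction, which is an assumption here. Types are ranked one at a time. For each unranked type $k$, the sketch runs a "k-period": it pulls one type-$k$ arm and then, in priority order, every descendant whose type is already ranked. The index is the discounted reward of that period divided by its discounted length $\int_0^T e^{-\alpha s}\,ds$, and the unranked type with the largest ratio goes next. Several simplifications are assumptions that do not come from the paper. $V_i$, $C_i$ and $N_{ij}$ are taken to be deterministic, whereas the paper allows random quantities. Discounting is continuous at rate `alpha`. Branching is assumed acyclic, meaning type $i$ only creates arms of types $j > i$, so every run is finite. All function names (`gittins_order`, `period_value`, `priority_value`, `optimal_value`) are hypothetical. The brute-force `optimal_value` exists only to check the priority rule on small instances.

```python
import math
from functools import lru_cache

# Hypothetical deterministic instance format (an assumption; the paper allows random data):
#   V[i]    duration of a pull of a type-i arm (V[i] > 0)
#   C[i]    reward collected at the end of that pull
#   N[i][j] number of new type-j arms created by that pull
# Assumption: branching is acyclic (N[i][j] > 0 only for j > i), so every run is finite.


def _pull(i, t, counts, V, C, N, alpha):
    """Pull one type-i arm starting at time t; return (end time, discounted reward)."""
    t += V[i]
    for j, n in enumerate(N[i]):
        counts[j] += n  # offspring join the pool; all other arms stay frozen
    return t, C[i] * math.exp(-alpha * t)


def period_value(k, ranked, V, C, N, alpha):
    """Ratio index of type k, given the already-ranked (higher-priority) types.

    The k-period starts with one type-k arm and continues, in priority order,
    with its descendants whose types are in `ranked`; it ends when none remain.
    """
    counts = [0] * len(V)
    t, reward = _pull(k, 0.0, counts, V, C, N, alpha)
    while True:
        i = next((j for j in ranked if counts[j] > 0), None)
        if i is None:
            break
        counts[i] -= 1
        t, r = _pull(i, t, counts, V, C, N, alpha)
        reward += r
    # Discounted length of the period: integral of e^{-alpha s} over [0, t].
    discounted_length = (1.0 - math.exp(-alpha * t)) / alpha
    return reward / discounted_length


def gittins_order(V, C, N, alpha):
    """Rank types by repeatedly picking the unranked type with the largest ratio."""
    ranked = []
    while len(ranked) < len(V):
        rest = [k for k in range(len(V)) if k not in ranked]
        ranked.append(max(rest, key=lambda k: period_value(k, ranked, V, C, N, alpha)))
    return ranked


def priority_value(order, initial, V, C, N, alpha):
    """Discounted reward of always pulling the highest-priority available arm."""
    counts, t, total = list(initial), 0.0, 0.0
    while any(counts):
        i = next(j for j in order if counts[j] > 0)
        counts[i] -= 1
        t, r = _pull(i, t, counts, V, C, N, alpha)
        total += r
    return total


def optimal_value(initial, V, C, N, alpha):
    """Brute-force optimum over all pull sequences (dynamic programming on arm counts)."""

    @lru_cache(maxsize=None)
    def best(state):
        values = []
        for i, c in enumerate(state):
            if c:
                nxt = list(state)
                nxt[i] -= 1
                for j, n in enumerate(N[i]):
                    nxt[j] += n
                # Reward at the end of the pull; the continuation is discounted by e^{-alpha V_i}.
                values.append(math.exp(-alpha * V[i]) * (C[i] + best(tuple(nxt))))
        return max(values, default=0.0)

    return best(tuple(initial))


if __name__ == "__main__":
    # Type 0 (C=0) spawns one type-1 arm (C=10); type 2 (C=5) spawns nothing.
    V, C = [1.0, 1.0, 1.0], [0.0, 10.0, 5.0]
    N = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
    alpha, initial = math.log(2), (1, 0, 1)
    order = gittins_order(V, C, N, alpha)
    print(order)  # [1, 2, 0]
    print(priority_value(order, initial, V, C, N, alpha), optimal_value(initial, V, C, N, alpha))
```

In the demo, type 1 is ranked first. Type 2 is ranked ahead of type 0 because type 0's period, a worthless pull followed by the type-1 reward one step later, has a lower ratio than a single type-2 pull. The priority rule and the brute-force optimum then agree, both at approximately 3.75.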

Original language: English
Pages (from-to): 269-278
Number of pages: 10
Journal: Probability in the Engineering and Informational Sciences
Issue number: 3
State: Published, Jul 1988
Externally published: Yes

