Selective sampling for trees and forests

Murad Badarna, Ilan Shimshoni

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we describe selective sampling algorithms for decision trees and random forests and their contribution to classification accuracy. In our selective sampling algorithms, the instance that yields the highest expected utility is chosen to be labeled by the expert. We show that it is possible to obtain the most valuable unlabeled instance to be labeled by the expert and added to the training dataset of the decision tree simply by depicting the influence of this new instance on the class probabilities of the leaves. All the unlabeled instances that fall into the same leaf have the same class probabilities. As a result, we can compute the expected accuracy of the decision tree per leaf instead of for each individual unlabeled instance. An extension for random forests is also presented. Moreover, we show that the selective sampling classifier has to belong to the same family as the classifier whose accuracy we wish to improve but need not be identical to it. For example, a random forest classifier can be used for the selective sampling process, and the results can be used to improve the classification accuracy of a decision tree. Likewise, a random forest classifier consisting of three trees can be used in the selective sampling algorithm to improve the classification accuracy of a random forest consisting of ten trees. Our experiments show that the proposed selective sampling algorithms achieve better accuracy than standard random sampling, uncertainty sampling and the active belief decision tree learning approach (ABC4.5) on several real-world datasets. We also show that our selective sampling algorithms significantly improve the classification performance of several state-of-the-art classifiers, such as the random rotation forest classifier, on large-scale real-world datasets.
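
The leaf-level idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a fixed tree (no re-splitting after a new label), Laplace-smoothed leaf probabilities, and a simple accuracy estimate over the unlabeled pool. All function names and the `leaves` data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: leaf id -> (labeled class counts, number of unlabeled instances)
Leaves = Dict[str, Tuple[List[int], int]]


def leaf_probs(counts: List[int], alpha: float = 1.0) -> List[float]:
    """Laplace-smoothed class probabilities of one leaf (smoothing is an assumption)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def expected_accuracy(leaves: Leaves) -> float:
    """Estimated accuracy on the unlabeled pool.

    Every unlabeled instance in a leaf shares that leaf's class probabilities,
    so each leaf contributes its majority-class probability, weighted by the
    number of unlabeled instances that fall into it.
    """
    pool = sum(n for _, n in leaves.values())
    return sum(n * max(leaf_probs(c)) for c, n in leaves.values()) / pool


def expected_utility(leaves: Leaves, leaf_id: str) -> float:
    """Expected accuracy after labeling one instance from `leaf_id`.

    The unknown label is averaged out using the leaf's current class
    probabilities; the tree structure is assumed to stay unchanged.
    """
    counts, n_unlabeled = leaves[leaf_id]
    utility = 0.0
    for k, p in enumerate(leaf_probs(counts)):
        updated = list(counts)
        updated[k] += 1  # pretend the expert answered class k
        trial = dict(leaves)
        trial[leaf_id] = (updated, n_unlabeled)
        utility += p * expected_accuracy(trial)
    return utility


def select_leaf(leaves: Leaves) -> str:
    """Pick the leaf whose next label has the highest expected utility."""
    return max(leaves, key=lambda leaf_id: expected_utility(leaves, leaf_id))


leaves = {
    "A": ([5, 5], 10),   # balanced leaf: one label can change the prediction
    "B": ([9, 1], 10),   # confident leaf
    "C": ([20, 0], 5),   # pure leaf
}
chosen = select_leaf(leaves)
```

In this toy setting the balanced leaf `A` is selected, because a single new label there can shift the leaf's majority-class probability, while the confident leaves gain nothing in expectation. The computation runs once per leaf rather than once per unlabeled instance, which is the saving the abstract describes.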

Original language: English
Pages (from-to): 93-108
Number of pages: 16
Journal: Neurocomputing
Volume: 358
State: Published - 17 Sep 2019

Bibliographical note

Publisher Copyright:
© 2019

Keywords

  • Active learning
  • Classification
  • Decision trees
  • Random forests
  • Selective sampling

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
