In many real scenarios it is difficult to obtain a large set of labeled examples with which to train classifiers, since each example must be labeled by an expert, which may be a costly process. It is therefore important to reduce the number of training examples. To this end we turned to lookahead selective sampling techniques, which select the points to be classified by the expert. Our algorithm is based on the lookahead selective sampling (LSS) algorithm, which uses the Euclidean distance as the basis of a nearest-neighbor classifier. This distance may not reflect the inter-class distance. We therefore developed an algorithm that combines several clustering results, a technique known as ensemble clustering (EC), to define a clustering-based distance metric that approximates the similarity between objects, under the presumption that points in the same cluster usually share common traits even when their geometric distance is large. Under this metric, points which always belong to the same cluster form equivalence classes. As a result, the number of points which have to be considered is reduced dramatically, yielding an algorithm that is at least two orders of magnitude faster than the Euclidean-based algorithm. Our algorithm usually yields better results, and it outperforms two standard active learning algorithms. The algorithm was tested on several standard datasets and on segmenting color images.

I. INTRODUCTION

Supervised learning algorithms require that a set of labeled examples be given to the algorithm in order to train a classifier. In many cases we want to construct a training dataset, or add examples to it, in order to improve the classifier's quality. In real environments it is usually difficult to obtain a large set of labeled examples, since each example must be labeled by a domain expert. It is therefore important to reduce the number of training examples. To achieve this goal we should give the learning algorithm some control over the inputs on which it trains.
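The clustering-based distance and the equivalence classes described above can be illustrated with a minimal sketch. This is not the paper's implementation: the choice of k-means as the base clusterer, the ensemble of three runs, and the co-association form of the distance are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, seed, iters=20):
    # Minimal k-means, used here only to generate an ensemble member.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None, :], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))

# Ensemble: several clusterings with different k and seeds (assumed setup).
labelings = [kmeans(X, k, seed=s) for k, s in [(2, 0), (3, 1), (4, 2)]]

# Clustering-based distance between two points: the fraction of ensemble
# runs in which they fall into different clusters (1 - co-association).
L = np.stack(labelings)                          # shape (runs, n_points)
co = (L[:, :, None] == L[:, None, :]).mean(axis=0)
dist = 1.0 - co

# Points whose label vector is identical across all runs have distance 0
# to each other and form an equivalence class; only one representative
# per class needs to be considered, which is the source of the speedup.
classes = {}
for i, key in enumerate(map(tuple, L.T)):
    classes.setdefault(key, []).append(i)
```

Note that `dist` is zero exactly for pairs inside the same equivalence class, so the candidate set shrinks from the number of points to the number of classes.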
This paradigm is called active learning. Selective sampling is one of the common active learning approaches. It assumes that a set of unlabeled examples is available; the learner selects an unlabeled example from the given set and asks the teacher to label it. Our algorithm was inspired by lookahead selective sampling (LSS), which presented a selective sampling methodology for nearest-neighbor (NN) classification learning algorithms. At each iteration of that algorithm, all of the unlabeled examples are tested and the point which yields the highest expected utility is chosen. Studying this algorithm we came across two problems.
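The lookahead step above can be sketched as follows. This is only a schematic of the expected-utility loop, not the LSS algorithm itself: the 3-NN vote-share label-probability estimate and the mean-confidence utility are hypothetical stand-ins for the paper's probabilistic accuracy estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
X_pool = rng.normal(size=(20, 2))                     # unlabeled examples
X_lab = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y_lab = np.array([0, 1, 0])                           # current labeled set
classes = np.unique(y_lab)

def confidence(Xtr, ytr, X):
    # Stand-in utility: mean confidence of a 3-NN classifier over the
    # remaining pool, where confidence is the majority-vote fraction.
    d = np.linalg.norm(X[:, None] - Xtr[None, :], axis=2)
    k = min(3, len(Xtr))
    near = ytr[np.argsort(d, axis=1)[:, :k]]
    votes = np.stack([(near == c).mean(axis=1) for c in classes])
    return votes.max(axis=0).mean()

def label_probs(Xtr, ytr, x):
    # Crude P(y | x): vote shares among the 3 nearest labeled points.
    d = np.linalg.norm(Xtr - x, axis=1)
    near = ytr[np.argsort(d)[:min(3, len(Xtr))]]
    return {c: (near == c).mean() for c in classes}

# Lookahead: test every unlabeled example; score each candidate by the
# expected utility of the classifier after adding it with each possible
# label, weighted by the estimated label probability.
scores = []
for i, x in enumerate(X_pool):
    rest = np.delete(X_pool, i, axis=0)
    eu = 0.0
    for c, p in label_probs(X_lab, y_lab, x).items():
        Xtr = np.vstack([X_lab, x])
        ytr = np.append(y_lab, c)
        eu += p * confidence(Xtr, ytr, rest)
    scores.append(eu)

best = int(np.argmax(scores))   # query the teacher for this point's label
```

The loop makes the cost structure visible: every iteration evaluates every unlabeled point under every possible label, which is what the equivalence-class reduction is designed to cut down.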
State: Published - 25 Aug 2012
Conference: The 4th Conference on Data Mining and Optimization (DMO2012)
Period: 1 Aug 2012 → …