TY - GEN
T1 - k-nearest neighbor using ensemble clustering
AU - Abedallah, Loai
AU - Shimshoni, Ilan
PY - 2012
Y1 - 2012
N2 - The performance of the k Nearest Neighbor (kNN) algorithm depends critically on being given a good metric over the input space. One of its main drawbacks is that kNN uses only the geometric distance to measure the similarity and dissimilarity between objects, ignoring statistical regularities in the data that could help convey the inter-class distance. We found that objects belonging to the same cluster usually share some common traits even though their geometric distance might be large. We therefore define a metric based on clustering. As there is no optimal clustering algorithm with optimal parameter values, several clustering runs are performed, yielding an ensemble of clustering (EC) results. The distance between two points is defined as the number of runs in which they were not clustered together. This distance is then used within the framework of the kNN algorithm (kNN-EC). Moreover, objects that were clustered together in every run are defined as members of an equivalence class. As a result, the algorithm runs on equivalence classes instead of single objects. In our experiments the number of equivalence classes is usually one tenth to one fourth of the number of objects. This equivalence class representation is in effect a smart data reduction technique with a wide range of applications. It is complementary to other data reduction methods such as feature selection and dimensionality reduction methods such as PCA. We compared kNN-EC to the original kNN on standard datasets from different fields and on segmenting a real color image into foreground and background. Our experiments show that kNN-EC performs better than or comparably to the original kNN on the standard datasets and is superior for color image segmentation.
KW - Classification
KW - Clustering
KW - Ensemble Clustering
KW - Unsupervised Distance Metric Learning
UR - http://www.scopus.com/inward/record.url?scp=84866661795&partnerID=8YFLogxK
DO - 10.1007/978-3-642-32584-7_22
M3 - Conference contribution
AN - SCOPUS:84866661795
SN - 9783642325830
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 265
EP - 278
BT - Data Warehousing and Knowledge Discovery - 14th International Conference, DaWaK 2012, Proceedings
T2 - 14th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2012
Y2 - 3 September 2012 through 6 September 2012
ER -
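
N1 - Editorial note: the abstract above fully specifies the EC distance (the number of clustering runs in which two objects were not clustered together) and the equivalence-class reduction (objects co-clustered in every run collapse to one representative). The following is a minimal sketch of that idea, not the authors' code: it assumes scikit-learn's KMeans for the ensemble runs, uses the Iris dataset as stand-in data, and the parameter values, the helper names ec_distance and knn_ec_predict, and the leave-one-class-out evaluation are all illustrative choices, not those used in the paper.

    # Sketch of kNN-EC as described in the abstract (illustrative, not the paper's code).
    import numpy as np
    from collections import Counter, defaultdict
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)

    # Ensemble of clustering runs with varying k and seeds (assumed parameters;
    # the paper uses several clustering runs, not necessarily these).
    runs = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
            for k, s in [(3, 0), (4, 1), (5, 2), (6, 3), (8, 4)]]
    labels = np.stack(runs, axis=1)          # shape: (n_objects, n_runs)

    def ec_distance(i, j):
        """EC distance: number of runs in which objects i and j were
        not clustered together."""
        return int(np.sum(labels[i] != labels[j]))

    # Equivalence classes: objects with identical label vectors were clustered
    # together in every run, so they collapse to a single representative.
    classes = defaultdict(list)
    for i, row in enumerate(map(tuple, labels)):
        classes[row].append(i)
    print(f"{len(X)} objects -> {len(classes)} equivalence classes")

    def knn_ec_predict(q, k=5):
        """Classify object q by majority vote over its k nearest
        equivalence-class representatives under the EC distance."""
        reps = [m[0] for m in classes.values() if q not in m]
        reps.sort(key=lambda i: ec_distance(q, i))
        votes = Counter(y[i] for i in reps[:k])
        return votes.most_common(1)[0][0]

    print("predicted:", knn_ec_predict(0), "true:", y[0])

Running kNN on one representative per equivalence class rather than on all objects is what yields the data reduction the abstract reports (roughly one tenth to one fourth of the original object count in the authors' experiments).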