Mean shift clustering algorithm for data with missing values

Loai AbdAllah, Ilan Shimshoni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Missing values are common in real-world data, and several methods exist for dealing with them. In this research we developed a new version of the mean shift clustering algorithm that handles datasets with missing values. We use a weighted distance function for such datasets that was defined in our previous work. To compute the distance between two points that may have attributes with missing values, only the mean and the variance of each attribute's distribution are required. Thus, once these have been computed, the distance can be computed in O(1). Furthermore, we use this distance to derive a formula for computing the mean shift vector for each data point, showing that the runtime complexity of the resulting mean shift is the same as that of the Euclidean mean shift. We experimented on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the mean shift clustering algorithm using our distance and the suggested mean shift vector with that of three other basic methods. Our experiments show that mean shift using our distance function outperforms mean shift using the other methods for dealing with missing values.
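The abstract does not state the distance formula itself, so the following is only a minimal sketch of one distance that needs nothing beyond each attribute's mean and variance. It assumes that a missing value is treated as a random draw from its attribute's distribution and that the per-attribute term is the expected squared difference; this is an illustrative assumption, not necessarily the authors' exact definition. The function names `attribute_stats` and `expected_sq_distance` are hypothetical, and `None` is used here to mark a missing value.

```python
def attribute_stats(data):
    """Per-attribute mean and population variance over observed values only.

    Assumes every attribute has at least one observed (non-None) value.
    """
    n_attrs = len(data[0])
    means, variances = [], []
    for j in range(n_attrs):
        observed = [row[j] for row in data if row[j] is not None]
        mu = sum(observed) / len(observed)
        var = sum((v - mu) ** 2 for v in observed) / len(observed)
        means.append(mu)
        variances.append(var)
    return means, variances


def expected_sq_distance(x, y, means, variances):
    """Squared distance between x and y, where None marks a missing value.

    Assumed per-attribute terms (expected squared difference):
      both known    -> (x_i - y_i)^2
      one missing   -> (known - mu_i)^2 + var_i
      both missing  -> 2 * var_i   (two independent draws)
    """
    total = 0.0
    for xi, yi, mu, var in zip(x, y, means, variances):
        if xi is not None and yi is not None:
            total += (xi - yi) ** 2
        elif xi is None and yi is None:
            total += 2.0 * var
        else:
            known = yi if xi is None else xi
            total += (known - mu) ** 2 + var
    return total


data = [[1.0, None], [3.0, 2.0], [None, 4.0]]
means, variances = attribute_stats(data)
d = expected_sq_distance(data[0], data[1], means, variances)
```

Under this assumption, the statistics are computed once in a pass over the data, after which each attribute contributes a constant-time term to the distance, regardless of how many values are missing.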

Original language: English
Title of host publication: Data Warehousing and Knowledge Discovery - 16th International Conference, DaWaK 2014, Proceedings
Publisher: Springer Verlag
Pages: 426-438
Number of pages: 13
ISBN (Print): 9783319101590
State: Published - 2014
Event: 16th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2014 - Munich, Germany
Duration: 2 Sep 2014 → 4 Sep 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8646 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2014
Country/Territory: Germany
City: Munich
Period: 2/09/14 → 4/09/14

Keywords

  • Clustering
  • Distance metric
  • Mean Shift
  • Missing values
  • Weighted Euclidean distance

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
