Clustering Algorithms for Incomplete Datasets

Loai AbdAllah, Ilan Shimshoni

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


Many real-world datasets suffer from missing values. Several methods have been developed to deal with this problem; many of them fill the missing values with a fixed value based on a statistical computation. In this research, we developed new versions of the k-means and mean shift clustering algorithms that handle datasets with missing values without filling them in. We developed a new distance function that is able to compute distances over incomplete datasets. The distance is computed based only on the mean and variance of the data for each attribute. As a result, the runtime complexity of our computation is O(1). We experimented on six standard numerical datasets from different fields. On these datasets, we simulated missing values and compared the performance of the developed algorithms, using our distance and the suggested mean computations, to three other basic methods. Our experiments show that the developed algorithms using our distance function outperform the existing k-means and mean shift algorithms that use other methods for dealing with missing values.
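As an illustration of how a distance can depend only on each attribute's mean and variance, the sketch below uses the expected squared difference, treating a missing value as a random draw with that attribute's mean and variance. This is one plausible reading of the abstract, not necessarily the chapter's exact formulation. The function names `column_stats` and `expected_sq_distance`, and the use of NaN to mark missing values, are assumptions made for the example.

```python
import numpy as np

def column_stats(data):
    """Per-attribute mean and variance, ignoring missing (NaN) entries."""
    data = np.asarray(data, dtype=float)
    return np.nanmean(data, axis=0), np.nanvar(data, axis=0)

def expected_sq_distance(x, y, mean, var):
    """Expected squared Euclidean distance between two rows with NaNs.

    Per attribute (assumed model, missing values independent):
      both known   -> (x - y)^2
      one missing  -> (known - mean)^2 + var
      both missing -> 2 * var
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_missing = np.isnan(x)
    y_missing = np.isnan(y)
    # Substitute the attribute mean for each missing entry ...
    x_filled = np.where(x_missing, mean, x)
    y_filled = np.where(y_missing, mean, y)
    # ... and add one variance term per missing entry.
    n_missing = x_missing.astype(float) + y_missing.astype(float)
    per_attribute = (x_filled - y_filled) ** 2 + var * n_missing
    return float(per_attribute.sum())

if __name__ == "__main__":
    data = np.array([[1.0, 2.0],
                     [3.0, np.nan],
                     [5.0, 6.0]])
    mean, var = column_stats(data)
    print(expected_sq_distance(data[0], data[1], mean, var))
```

Once the means and variances are precomputed, each attribute contributes a constant amount of work to the distance, which is consistent with the O(1) cost stated in the abstract.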
Original language: English
Title of host publication: Recent Applications in Data Clustering
State: Published - 2018

