Discovering reliable approximate functional dependencies

Panagiotis Mandros, Mario Boley, Jilles Vreeken

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or α-approximate top-k dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.
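
To make the idea of a bias-correcting information-theoretic score concrete, here is a minimal sketch. The abstract does not state the formula, so everything in it is an assumption for illustration: the dependence of a target Y on attributes X is measured by the plug-in fraction of information (H(Y) - H(Y|X)) / H(Y), and the bias is estimated by averaging that same quantity over random shuffles of Y. The Monte Carlo shuffle is a stand-in chosen for brevity and is not claimed to be the paper's method; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Plug-in (empirical) Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def fraction_of_information(xs, y):
    """Plug-in estimate of (H(Y) - H(Y|X)) / H(Y); xs holds one key per row."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0  # a constant target carries no information to explain
    groups = {}
    for x, t in zip(xs, y):
        groups.setdefault(x, []).append(t)
    n = len(y)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return (h_y - h_y_given_x) / h_y


def permutation_corrected(xs, y, n_perm=200, seed=0):
    """Assumed correction: subtract the mean score over shuffled targets."""
    rng = random.Random(seed)
    shuffled = list(y)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between xs and y
        total += fraction_of_information(xs, shuffled)
    return fraction_of_information(xs, y) - total / n_perm


y = [i % 2 for i in range(20)]
x_functional = [i % 4 for i in range(20)]  # y is a function of x (parity)
x_unique_key = list(range(20))             # one distinct value per row

raw_key = fraction_of_information(x_unique_key, y)
corrected_key = permutation_corrected(x_unique_key, y)
corrected_functional = permutation_corrected(x_functional, y)
```

In this toy data the unique-key attribute gets a perfect uncorrected score even though it says nothing generalizable about Y, which is the sparsity problem the abstract alludes to. The shuffle-based correction removes that spurious score, while the genuine dependence on `x_functional` keeps most of its value.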

Original language: English
Title of host publication: KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 355-363
Number of pages: 9
ISBN (Electronic): 9781450348874
State: Published - 13 Aug 2017
Externally published: Yes
Event: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration: 13 Aug 2017 → 17 Aug 2017

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: Part F129685

Conference

Conference: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country/Territory: Canada
City: Halifax
Period: 13/08/17 → 17/08/17

Bibliographical note

Publisher Copyright:
© 2017 Copyright held by the owner/author(s).

Keywords

  • Information theory
  • Pattern discovery

ASJC Scopus subject areas

  • Software
  • Information Systems
