Non-binary evaluation measures for big data integration

Tomer Sagi, Avigdor Gal

Research output: Contribution to journal › Article › peer-review

Abstract

The evolution of data accumulation, management, analytics, and visualization has led to the coining of the term big data, which challenges the task of data integration. This task, common to any matching problem in computer science, involves generating alignments between structured data in an automated fashion. Historically, set-based measures, based upon binary similarity matrices (match/non-match), have dominated evaluation practices of matching tasks. However, in the presence of big data, such measures no longer suffice. In this work, we propose evaluation methods for non-binary matrices as well. Non-binary evaluation is formally defined together with several new, non-binary measures using a vector space representation of matching outcome. We provide empirical analyses of the usefulness of non-binary evaluation and show its superiority over its binary counterparts in several problem domains.
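To make the contrast in the abstract concrete, the following is a minimal sketch, not the measures defined in the paper. It assumes a similarity matrix with entries in [0, 1] and a binary reference matrix; the function names, the 0.5 threshold, and the use of cosine similarity over flattened matrices are illustrative assumptions chosen only to show how a set-based measure can hide differences that a vector space view exposes.

```python
import math

def binary_precision_recall(sim, ref, threshold=0.5):
    """Set-based precision and recall after thresholding the similarity matrix.

    Hypothetical helper: the threshold value is an assumption for illustration.
    """
    tp = fp = fn = 0
    for sim_row, ref_row in zip(sim, ref):
        for s, r in zip(sim_row, ref_row):
            predicted = s >= threshold  # collapse the score to match / non-match
            if predicted and r:
                tp += 1
            elif predicted:
                fp += 1
            elif r:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def cosine_agreement(sim, ref):
    """Cosine between the flattened similarity and reference matrices.

    Illustrative stand-in for a non-binary measure; not taken from the paper.
    """
    u = [float(s) for row in sim for s in row]
    v = [float(r) for row in ref for r in row]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy data (made up): two matchers that agree after thresholding
# but differ in how confident their raw scores are.
reference = [[1, 0], [0, 1]]
confident = [[0.9, 0.1], [0.1, 0.9]]
hesitant = [[0.6, 0.4], [0.4, 0.6]]

for name, sim in (("confident", confident), ("hesitant", hesitant)):
    p, r = binary_precision_recall(sim, reference)
    print(name, p, r, round(cosine_agreement(sim, reference), 3))
```

On this toy input both matchers receive identical thresholded precision and recall, while the cosine score is higher for the matcher whose raw scores sit closer to the reference. That is the kind of information a binary evaluation discards.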

Original language: English
Pages (from-to): 105-126
Number of pages: 22
Journal: VLDB Journal
Volume: 27
Issue number: 1
State: Published - 1 Feb 2018

Bibliographical note

Funding Information:
Acknowledgements: The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under the NisB (http://nisb-project.eu/) project, Grant Agreement No. 256955.

Publisher Copyright:
© 2017, Springer-Verlag GmbH Germany.

Keywords

  • Data integration
  • Evaluation
  • Matching

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture
