Evaluating Quality of Disparate Data Sources: A Discord-Driven Approach

Yeasmin Ara Akter, Alberto Abelló, Petar Jovanovic, Tomer Sagi, Katja Hose

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Among other measures of data quality, determining the reliability of conflicting values from different sources is especially challenging. Traditional data fusion approaches often infer correct values in simple cases, but they struggle to handle variations in data granularity (such as differences in temporal, spatial, or categorical aggregation) and offer limited insight into the nature of disagreements. We therefore propose a new source evaluation approach for numerical attributes that measures discordance (i.e., the extent to which sources differ from each other). Unlike existing methods that focus solely on point estimation, our approach supports both fine-grained and coarse-grained analysis, enabling more sophisticated data quality assessments. We employ a linear programming solver that transparently adapts to any data alignment expressed in a set of operators resembling relational algebra. Extensive experiments on real-world datasets demonstrate that our method generalizes existing truth discovery techniques that measure differences with Mean Absolute Error (MAE) or Root Mean Square Error (RMSE), and that it adapts to diverse and complex scenarios.
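As a rough illustration of aligning sources of different granularity before measuring discordance, the sketch below may help. It is a simplified stand-in, not the authors' method: it uses no linear programming solver, and the alignment is hard-coded rather than expressed through relational-algebra-like operators. All names (SOURCES, to_quarter, roll_up, discordance) and the data are hypothetical. A monthly source is rolled up to quarterly granularity with a SUM aggregation. A per-key consensus is then taken, either the median (which minimizes total absolute deviation) or the mean (which minimizes total squared deviation). Finally, each source is scored by its MAE or RMSE against that consensus.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Hypothetical data: each source maps a (region, period) key to a numeric value.
# "monthly" reports at a finer temporal granularity than the two quarterly sources.
SOURCES = {
    "monthly": {("north", "2024-01"): 10.0, ("north", "2024-02"): 12.0,
                ("north", "2024-03"): 11.0, ("south", "2024-01"): 5.0,
                ("south", "2024-02"): 6.0, ("south", "2024-03"): 4.0},
    "quarterly_a": {("north", "2024-Q1"): 33.0, ("south", "2024-Q1"): 15.0},
    "quarterly_b": {("north", "2024-Q1"): 36.0, ("south", "2024-Q1"): 15.0},
}


def to_quarter(period: str) -> str:
    """Map 'YYYY-MM' to 'YYYY-Qn'; quarter labels are returned unchanged."""
    year, part = period.split("-")
    if part.startswith("Q"):
        return period
    return f"{year}-Q{(int(part) - 1) // 3 + 1}"


def roll_up(values: dict) -> dict:
    """Align a source to quarterly granularity via a SUM group-by on (region, quarter)."""
    out = defaultdict(float)
    for (region, period), v in values.items():
        out[(region, to_quarter(period))] += v
    return dict(out)


def mae(errors) -> float:
    return mean(abs(e) for e in errors)


def rmse(errors) -> float:
    return sqrt(mean(e * e for e in errors))


def discordance(sources: dict, consensus_fn, score_fn) -> dict:
    """Score each source by how far its aligned values deviate from a per-key consensus."""
    aligned = {name: roll_up(vals) for name, vals in sources.items()}
    keys = set().union(*aligned.values())
    # Consensus per key over all sources that report that key.
    consensus = {k: consensus_fn([a[k] for a in aligned.values() if k in a]) for k in keys}
    return {name: score_fn([a[k] - consensus[k] for k in a]) for name, a in aligned.items()}


if __name__ == "__main__":
    # Median consensus scored with MAE (an L1 view of disagreement).
    print(discordance(SOURCES, median, mae))
    # Mean consensus scored with RMSE (an L2 view of disagreement).
    print(discordance(SOURCES, mean, rmse))
```

In this toy data the monthly source sums to the same quarterly totals as quarterly_a, so both sit on the consensus, while quarterly_b is the discordant source for the north region. A real LP formulation would instead treat the fine-grained consensus values as decision variables constrained by every source's aggregation, which this sketch does not attempt.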

Original language: English
Title of host publication: Advances in Databases and Information Systems - 29th European Conference, ADBIS 2025, Proceedings
Editors: Panos K. Chrysanthis, Kostas Stefanidis, Zheying Zhang, Kjetil Nørvåg
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 147-163
Number of pages: 17
ISBN (Print): 9783032052803
State: Published - 2026
Externally published: Yes
Event: 29th European Conference on Advances in Databases and Information Systems, ADBIS 2025 - Tampere, Finland
Duration: 23 Sep 2025 to 26 Sep 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16043 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th European Conference on Advances in Databases and Information Systems, ADBIS 2025
Country/Territory: Finland
City: Tampere
Period: 23/09/25 to 26/09/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

Keywords

  • Data Fusion
  • Discordance
  • Linear Programming
  • Truth Discovery

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
