TY - GEN
T1 - Completeness and ambiguity of schema cover
AU - Gal, Avigdor
AU - Katz, Michael
AU - Sagi, Tomer
AU - Weidlich, Matthias
AU - Aberer, Karl
AU - Quoc Viet Nguyen, Hung
AU - Miklós, Zoltán
AU - Levy, Eliezer
AU - Shafran, Victor
PY - 2013
Y1 - 2013
AB - Given a schema and a set of concepts representative of entities in the domain of discourse, schema cover defines correspondences between concepts and parts of the schema. Schema cover aims at interpreting the schema in terms of concepts, thus vastly simplifying the task of schema integration. In this work we investigate two properties of schema cover, namely completeness and ambiguity. The former measures the part of a schema that can be covered by a set of concepts, and the latter examines the amount of overlap between concepts in a cover. To study the tradeoffs between completeness and ambiguity, we define a cover model of which previous frameworks are special cases. We analyze the theoretical complexity of variations of the cover problem, some aiming at maximizing completeness while others aim at minimizing ambiguity. We show that variants of the schema cover problem are hard problems in general and formulate an exhaustive search solution using integer linear programming. We then provide a thorough empirical analysis, using both real-world and simulated data sets, showing that the integer linear programming solution scales well for large schemata. We also show that some instantiations of the general schema cover problem are more effective than others.
UR - http://www.scopus.com/inward/record.url?scp=84886742844&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-41030-7_15
DO - 10.1007/978-3-642-41030-7_15
M3 - Conference contribution
AN - SCOPUS:84886742844
SN - 9783642410291
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 241
EP - 258
BT - On the Move to Meaningful Internet Systems
T2 - Confederated International Conferences on On the Move to Meaningful Internet Systems, OTM 2013: CoopIS 2013, DOA-Trusted Cloud 2013, and ODBASE 2013
Y2 - 9 September 2013 through 13 September 2013
ER -