TY - GEN
T1 - Non-redundant subgroup discovery using a closure system
AU - Boley, Mario
AU - Grosskreutz, Henrik
PY - 2009
Y1 - 2009
AB - Subgroup discovery is a local pattern discovery task, in which descriptions of subpopulations of a database are evaluated against some quality function. As standard quality functions are functions of the described subpopulation, we propose to search for equivalence classes of descriptions with respect to their extension in the database rather than individual descriptions. These equivalence classes have unique maximal representatives forming a closure system. We show that minimum cardinality representatives of each equivalence class can be found during the enumeration process of that closure system without additional cost, while finding a minimum representative of a single equivalence class is NP-hard. With several real-world datasets we demonstrate that search space and output are significantly reduced by considering equivalence classes instead of individual descriptions and that the minimum representatives constitute a family of subgroup descriptions that is of the same or better expressive power than those generated by traditional methods.
UR - http://www.scopus.com/inward/record.url?scp=70350633055&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04180-8_29
DO - 10.1007/978-3-642-04180-8_29
M3 - Conference contribution
AN - SCOPUS:70350633055
SN - 3642041795
SN - 9783642041792
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 179
EP - 194
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2009, Proceedings
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009
Y2 - 7 September 2009 through 11 September 2009
ER -