Approximating the number of frequent sets in dense data

Mario Boley, Henrik Grosskreutz

Research output: Contribution to journal › Article › peer-review

Abstract

We investigate the problem of counting the number of frequent (item)sets, a problem known to be intractable in terms of exact polynomial-time computation. In this paper, we show that it is, in general, also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed to guarantee a certain approximation bound, we show that the algorithm still achieves the desired accuracy on several real-world datasets when its running time is capped polynomially.
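To make the counting problem concrete, here is a minimal sketch of exact counting by brute force. It is not the randomized algorithm from the paper; the function name `count_frequent_sets`, the absolute threshold `min_support`, and the toy data are assumptions introduced purely for illustration. The sketch enumerates every subset of the item universe, so its running time grows exponentially with the number of items, which is the cost that motivates approximate counting.

```python
from itertools import combinations


def count_frequent_sets(transactions, min_support):
    """Count itemsets contained in at least `min_support` transactions.

    Illustrative brute force (hypothetical helper, not the paper's method):
    it checks all 2^n subsets of the n distinct items, including the
    empty set, which is contained in every transaction.
    """
    items = sorted(set().union(*transactions))
    count = 0
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # Support = number of transactions that contain the candidate.
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                count += 1
    return count


# Toy dataset (assumed for illustration): four transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
print(count_frequent_sets(transactions, 2))  # prints 7
```

With a threshold of 2, the empty set, the three single items, and the three pairs are frequent, while {a, b, c} occurs in only one transaction, giving 7 frequent sets.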

Original language: English
Pages (from-to): 65-89
Number of pages: 25
Journal: Knowledge and Information Systems
Volume: 21
Issue number: 1
State: Published - 2009
Externally published: Yes

Keywords

  • Approximate counting
  • Data mining
  • Frequent itemsets
  • Markov chain Monte Carlo

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence
