Abstract
We investigate the problem of counting the number of frequent (item)sets, a problem known to be intractable in terms of exact polynomial-time computation. In this paper, we show that it is in general also hard to approximate. We then develop a randomized counting algorithm based on the Markov chain Monte Carlo method. While general inputs require an exponential running time to guarantee a given approximation bound, we show that the algorithm still achieves the desired accuracy on several real-world datasets when its running time is capped polynomially.
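To make the counting problem concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It counts frequent itemsets exactly by brute-force enumeration, which takes time exponential in the number of items and is the kind of cost that motivates approximate counting. The function name `count_frequent_itemsets`, the absolute support threshold `min_support`, and the toy transactions are assumptions made for this example; the sketch also counts the empty itemset, a convention chosen here rather than taken from the paper.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, min_support):
    """Count itemsets contained in at least `min_support` transactions.

    Brute force over all subsets of the item universe (exponential time).
    The empty itemset is included in the count (assumed convention).
    """
    items = sorted(set().union(*transactions))
    count = 0
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            candidate_set = set(candidate)
            # Support = number of transactions that contain the candidate.
            support = sum(1 for t in transactions if candidate_set <= t)
            if support >= min_support:
                count += 1
    return count


# Hypothetical toy dataset: each transaction is a set of items.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(count_frequent_itemsets(transactions, 2))  # prints 6
```

With a threshold of 2, the frequent itemsets in this toy dataset are the empty set, {a}, {b}, {c}, {a, b}, and {a, c}.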
Original language | English |
---|---|
Pages (from-to) | 65-89 |
Number of pages | 25 |
Journal | Knowledge and Information Systems |
Volume | 21 |
Issue number | 1 |
State | Published - 2009 |
Externally published | Yes |
Keywords
- Approximate counting
- Data mining
- Frequent itemsets
- Markov chain Monte Carlo
ASJC Scopus subject areas
- Software
- Information Systems
- Human-Computer Interaction
- Hardware and Architecture
- Artificial Intelligence