TY - GEN

T1 - Detecting mean changes in data streams

AU - Badarna, Murad

AU - Wolff, Ran

PY - 2011

Y1 - 2011

AB - High-speed data streams, ever more prevalent in our daily lives, are almost never stationary. In Web marketing applications, for example, click-through data changes with the hour of the day and changes drastically on weekends. The same phenomenon occurs in domains as diverse as traffic control, power grids, and stock trading. However, even the simplest change detection problem, detecting changes in the mean of the distribution, is multifaceted: the number of false positives, the number of samples needed, the accuracy with which the change point is identified, and the computational resources needed each carry a cost, and all of them can be traded against one another. We present a new mean change detection algorithm suitable for high-speed data streams. The algorithm uses probabilistic bounds on the value to which a test statistic would converge in the long term in order to focus only on those points in the stream prefix at which a change might have occurred. We show that this selection limits the expected computational overhead per new sample to a constant, equivalent to that of the fastest known algorithms. At the same time, we show that the detection accuracy, the detection delay, and the false-positive rate of our new algorithm are all far better than those of its predecessors.

KW - Data stream

KW - Detecting mean changes

KW - Two-sample test

UR - http://www.scopus.com/inward/record.url?scp=84857174044&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2011.64

DO - 10.1109/ICDMW.2011.64

M3 - Conference contribution

AN - SCOPUS:84857174044

SN - 9780769544090

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 568

EP - 572

BT - Proceedings - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011

T2 - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011

Y2 - 11 December 2011 through 11 December 2011

ER -