LDA classifier monitoring in distributed streaming systems

Ran Bernstein, Margarita Osadchy, Daniel Keren, Assaf Schuster

Research output: Contribution to journal › Article › peer-review

Abstract

An important problem in real systems for mining data streams is detecting changes in the dynamic model that describes the temporal data. Such changes indicate that the underlying data has undergone a transition that may well require attention. A distributed setting poses one of the main challenges for this type of change detection: model training requires centralizing the data from all nodes (hereafter, synchronization), which is very costly in terms of communication. To minimize communication, a monitoring algorithm should be executed locally at each node while preserving the validity of the global model (that is, the model that would be computed if a synchronization took place). To achieve this goal, we propose the first communication-efficient algorithm for monitoring a classification model over distributed, dynamic data streams. While the approach is general, here we concentrate on Linear Discriminant Analysis (LDA), a popular method for classification and dimensionality reduction in many fields. We mainly apply tools from linear algebra and multivariate analysis to solve the problem at hand, and the resulting implementation is quite straightforward. The emphasis of this work is not on solving the distributed optimization problem of finding a classifier over the distributed data; instead, we continuously monitor the current classifier to check that it still fits the data. In addition to providing a theoretical guarantee for the proposed monitoring algorithm, we demonstrate that it reduces communication volume by up to two orders of magnitude (compared to synchronizing in every round) on synthetic data as well as on three real datasets from different domains. Our approach monitors the classification model itself rather than its misclassifications, which makes it possible to detect a change before misclassifications occur.
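To make the notion of a synchronization concrete, the following is a minimal sketch and not the monitoring algorithm proposed in the article, whose local conditions are not given in this abstract. It assumes a two-class problem and shows that the global LDA direction can be computed from per-node sufficient statistics (counts, sums, and sums of outer products), and then compares a reference direction with the direction after new data arrives. All function names, the toy data, and the cosine-similarity comparison are hypothetical choices made for illustration.

```python
import math

def class_stats(points):
    """Count, coordinate sums, and sum of outer products for one class on one node."""
    n, d = len(points), len(points[0])
    s = [sum(p[i] for p in points) for i in range(d)]
    ss = [[sum(p[i] * p[j] for p in points) for j in range(d)] for i in range(d)]
    return n, s, ss

def merge(a, b):
    """Add two statistics triples; a full synchronization only needs these sums."""
    s = [x + y for x, y in zip(a[1], b[1])]
    ss = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return a[0] + b[0], s, ss

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lda_direction(stats0, stats1):
    """Two-class Fisher LDA direction w = S_w^{-1} (mu1 - mu0)."""
    (n0, s0, ss0), (n1, s1, ss1) = stats0, stats1
    d = len(s0)
    mu0 = [v / n0 for v in s0]
    mu1 = [v / n1 for v in s1]
    # Within-class scatter: per class, sum of x x^T minus n * mu mu^T.
    sw = [[ss0[i][j] - n0 * mu0[i] * mu0[j] + ss1[i][j] - n1 * mu1[i] * mu1[j]
           for j in range(d)] for i in range(d)]
    return solve(sw, [m1 - m0 for m0, m1 in zip(mu0, mu1)])

def cosine(u, v):
    """Cosine similarity between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def global_direction(nodes):
    """Merge per-node statistics for classes 0 and 1, then fit LDA."""
    merged = {}
    for label in (0, 1):
        stats = [class_stats(node[label]) for node in nodes]
        total = stats[0]
        for st in stats[1:]:
            total = merge(total, st)
        merged[label] = total
    return lda_direction(merged[0], merged[1])

# Toy data (hypothetical): two nodes, each holding points of class 0 and class 1.
node_a = {0: [(0, 0), (1, 0), (0, 1)], 1: [(3, 3), (4, 3), (3, 4)]}
node_b = {0: [(1, 1), (0, 2)], 1: [(4, 4), (5, 3)]}
reference_w = global_direction([node_a, node_b])

# Node B later receives class-1 points from a shifted distribution.
node_b_drifted = {0: node_b[0], 1: node_b[1] + [(6, 0), (7, 0), (6, 1)]}
current_w = global_direction([node_a, node_b_drifted])

similarity = cosine(reference_w, current_w)
print("cosine similarity between reference and current direction:", similarity)
```

In this sketch every comparison requires merging statistics from all nodes, which is exactly the per-round synchronization cost that the abstract says the proposed method avoids by checking conditions locally.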

Original language: English
Pages (from-to): 156-167
Number of pages: 12
Journal: Journal of Parallel and Distributed Computing
Volume: 123
State: Published - Jan 2019

Bibliographical note

Publisher Copyright:
© 2018 Elsevier Inc.

Keywords

  • Distributed monitoring
  • Linear discriminant analysis

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
