Abstract
The empirical entropy is a key statistical measure of data frequency vectors, enabling one to estimate how diverse the data are. From the computational point of view, it is important to quickly compute, approximate, or bound the entropy. In a distributed system, the representative (“global”) frequency vector is the average of the “local” frequency vectors, each residing in a distinct node. Typically, the trivial solution of aggregating the local vectors and computing their average incurs a huge communication overhead. Hence, the challenge is to approximate, or bound, the entropy of the global vector, while reducing communication overhead. In this paper, we develop algorithms which achieve this goal.
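As a rough illustration of the setting described above (not the algorithms developed in the paper), the sketch below computes the exact entropy of the global vector by averaging full local vectors, and compares it with a generic lower bound that needs only one scalar per node. The bound follows from the concavity of Shannon entropy (Jensen's inequality): the entropy of the average is at least the average of the local entropies. The function names and the sample vectors are hypothetical, chosen only for this example.

```python
import math


def entropy(p):
    """Empirical (Shannon) entropy, in nats, of a frequency vector summing to 1."""
    return -sum(x * math.log(x) for x in p if x > 0)


def global_vector(local_vectors):
    """Coordinate-wise average of the local frequency vectors."""
    n = len(local_vectors)
    return [sum(col) / n for col in zip(*local_vectors)]


# Hypothetical local frequency vectors, one per node (illustrative data only).
local_vectors = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]

# Trivial solution: every node ships its whole vector (d numbers per node).
exact = entropy(global_vector(local_vectors))

# Cheap lower bound: every node ships a single number, its local entropy.
# Entropy is concave, so the mean of local entropies cannot exceed `exact`.
lower = sum(entropy(v) for v in local_vectors) / len(local_vectors)

print(f"exact global entropy: {exact:.4f}")
print(f"concavity lower bound: {lower:.4f}")
```

With d-dimensional vectors on n nodes, the exact computation moves n·d numbers while the bound moves only n, which is the kind of trade-off between accuracy and communication that the abstract refers to.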
Field | Value |
---|---|
Original language | English |
Article number | 1611 |
Journal | Entropy |
Volume | 24 |
Issue number | 11 |
DOIs | |
State | Published - 5 Nov 2022 |
Bibliographical note
Publisher Copyright: © 2022 by the authors.
Keywords
- distributed systems
- entropy
- entropy approximation
- entropy bounds
- sketches
ASJC Scopus subject areas
- Information Systems
- Mathematical Physics
- Physics and Astronomy (miscellaneous)
- Electrical and Electronic Engineering