Enforcing k-anonymity in web mail auditing

Dotan Di Castro, Liane Lewin-Eytan, Yoelle Maarek, Ran Wolff, Eyal Zohar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study the problem of k-anonymization of mail messages in the realistic scenario of auditing mail traffic in a major commercial Web mail service. Mail auditing is necessary in various Web mail debugging and quality assurance activities, such as anti-spam or the qualitative evaluation of novel mail features. It is conducted by trained professionals, often referred to as "auditors", who are shown messages that could expose personally identifiable information. We address here the challenge of k-anonymizing such messages, focusing on machine-generated mail messages, which represent more than 90% of today's mail traffic. We introduce a novel message signature, Mail-Hash, specifically tailored to identifying structurally similar messages, which allows us to put such messages in the same equivalence class. We then define a process that generates, for each class, masked mail samples that can be shown to auditors, while guaranteeing the k-anonymity of users. The productivity of auditors is measured by the amount of non-hidden mail content they can see every day under normal working conditions, which limit the number of mail samples they can review. In addition, we consider k-anonymity over time since, by definition of k-anonymity, every new release places additional constraints on the assignment of samples. We describe in detail the results we obtained over actual Yahoo mail traffic, and thus demonstrate that our methods are feasible at Web mail scale. Given the constantly growing concern of users over their email being scanned by others, we argue that it is critical to devise algorithms that guarantee k-anonymity, and to implement associated processes, in order to restore the trust of mail users.
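The general idea of grouping structurally similar messages into equivalence classes and showing only masked samples can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_signature` (a plain token count) is a hypothetical stand-in and is not the paper's Mail-Hash, and the masking rule (show a token only if every message in a class of at least k distinct users shares it at that position) is a simplification, not the authors' sample-generation process. It also ignores the over-time constraints the abstract mentions.

```python
from collections import defaultdict

MASK = "***"


def toy_signature(body):
    # Hypothetical stand-in for a structural signature: the token count.
    # This is NOT the paper's Mail-Hash; it only makes the sketch runnable.
    return len(body.split())


def masked_samples(messages, k):
    """Build one masked sample per equivalence class.

    messages: list of (user_id, body) pairs.
    Returns {signature: masked_text} for classes with >= k distinct users.
    """
    classes = defaultdict(list)
    for user, body in messages:
        classes[toy_signature(body)].append((user, body.split()))

    samples = {}
    for sig, members in classes.items():
        distinct_users = {user for user, _ in members}
        if len(distinct_users) < k:
            continue  # too few users in this class: release nothing
        first_tokens = members[0][1]
        # Keep a token only when all messages in the class agree on it,
        # so any visible token is shared by at least k users.
        template = [
            tok if all(tokens[pos] == tok for _, tokens in members) else MASK
            for pos, tok in enumerate(first_tokens)
        ]
        samples[sig] = " ".join(template)
    return samples


if __name__ == "__main__":
    mails = [
        ("u1", "Hello Ann your order 123 shipped"),
        ("u2", "Hello Bob your order 456 shipped"),
        ("u3", "Hello Cy your order 789 shipped"),
        ("u4", "Lunch tomorrow?"),
    ]
    print(masked_samples(mails, k=3))
    # {6: 'Hello *** your order *** shipped'}
```

In this toy run the three order notifications fall into one class and only their shared wording stays visible, while the lone personal message forms a class with a single user and yields no sample at all.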

Original language: English
Title of host publication: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 327-336
Number of pages: 10
ISBN (Electronic): 9781450337168
DOIs
State: Published - 8 Feb 2016
Externally published: Yes
Event: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016 - San Francisco, United States
Duration: 22 Feb 2016 to 25 Feb 2016

Publication series

Name: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining

Conference

Conference: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016
Country/Territory: United States
City: San Francisco
Period: 22/02/16 to 25/02/16

Bibliographical note

Publisher Copyright:
© 2016 ACM.

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Computer Networks and Communications
