Deterministic coresets for stochastic matrices with applications to scalable sparse pagerank

Harry Lang, Cenk Baykal, Najib Abu Samra, Tony Tannous, Dan Feldman, Daniela Rus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The PageRank algorithm is used by search engines to rank websites in their search results. The algorithm outputs a probability distribution that a person randomly clicking on links will arrive at any particular page. Intuitively, a node in the center of the network should be visited with high probability even if it has few edges, and an isolated node that has many (local) neighbours will be visited with low probability. The idea of PageRank is to rank nodes according to a stable state and not according to the previous local measurement of inner/outer edges from a node that may be manipulated more easily than the corresponding entry in the stable state. In this paper we present a deterministic and completely parallelizable algorithm for computing an ε-approximation to the PageRank of a graph of n nodes. Typical inputs consist of millions of pages, but the average number of links per page is less than ten. Our algorithm takes advantage of this sparsity, assuming the out-degree of each node is at most s, and terminates in O(ns/ε²) time. Beyond the input graph, which may be stored in read-only storage, our algorithm uses only O(n) memory. This is the first algorithm whose complexity takes advantage of sparsity. Real data exhibits an average out-degree of 7 while n is in the millions, so the advantage is immense. Moreover, our algorithm is simple and robust to floating-point precision issues. Our sparse solution (core-set) is based on reducing the PageRank problem to an ℓ2 approximation of the Carathéodory problem, which independently has many applications such as in machine learning and game theory. We hope that our approach will be useful for many other applications for learning sparse data and graphs. Algorithm, analysis, and open code with experimental results are provided.
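
For orientation, the "stable state" mentioned in the abstract can be illustrated with the textbook power-iteration baseline on a sparse adjacency list. This is a minimal sketch and not the coreset algorithm of the paper: the damping factor 0.85, the uniform handling of nodes without out-links, the function name `pagerank_power_iteration`, and the 4-node example graph are all assumptions introduced here for illustration. It does show why sparsity matters: one pass touches each of the at most n·s edges once and keeps only O(n) numbers in memory.

```python
def pagerank_power_iteration(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook power iteration on a sparse adjacency list (baseline sketch only)."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly (an assumption).
        dangling = sum(rank[u] for u in range(n) if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Each node passes its rank equally along its out-links: O(n * s) work per pass.
        for u, targets in enumerate(out_links):
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        # Stop once the l1 change between successive iterates is negligible.
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical 4-node graph: out_links[i] lists the pages that page i links to.
graph = [[1, 2], [2], [0], [2]]
ranks = pagerank_power_iteration(graph)
```

In this toy graph, node 2 is linked from every other node and ends up with the largest rank, while node 3 has no incoming links and ends up with the smallest.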

Original language: English
Title of host publication: Theory and Applications of Models of Computation - 15th Annual Conference, TAMC 2019, Proceedings
Editors: T. V. Gopal, Junzo Watada
Publisher: Springer Verlag
Pages: 410-423
Number of pages: 14
ISBN (Print): 9783030148119
DOIs
State: Published - 2019
Event: 15th Annual Conference on Theory and Applications of Models of Computation, TAMC 2019 - Kitakyushu, Japan
Duration: 13 Apr 2019 to 16 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11436 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Annual Conference on Theory and Applications of Models of Computation, TAMC 2019
Country/Territory: Japan
City: Kitakyushu
Period: 13/04/19 to 16/04/19

Bibliographical note

Funding Information:
Lang, Baykal, and Rus thank NSF 1723943, NSF 1526815, and The Boeing Company. This research was supported by Grant No. 2014627 from the United States-Israel Binational Science Foundation (BSF) and by Grant No. 1526815 from the United States National Science Foundation (NSF). Dan Feldman is grateful for the support of the Simons Foundation for part of this work that was done while he was visiting the Simons Institute for the Theory of Computing. H. Lang and C. Baykal contributed equally to this work.

Publisher Copyright:
© Springer Nature Switzerland AG 2019.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
