Outsourcing Data Processing Jobs with Lithops

Josep Sampé, Marc Sánchez-Artigas, Gil Vernik, Ido Yehekzel, Pedro García-López

Research output: Contribution to journal › Article › peer-review

Abstract

Unexpectedly, the rise of serverless computing has also started the 'democratization' of massive-scale data parallelism. This new trend, heralded by PyWren, aims to enable untrained users to execute single-machine code in the cloud at massive scale through platforms like AWS Lambda. Driven by this vision, this article presents Lithops, which carries forward the pioneering work of PyWren to better exploit the innate parallelism of MapReduce-style tasks atop several Functions-as-a-Service platforms such as AWS Lambda, IBM Cloud Functions, Google Cloud Functions, or Knative. Instead of waiting for a cluster to be up and running in the cloud, Lithops makes it easy to spawn hundreds or thousands of cloud functions to execute a large job within a few seconds of starting it. With Lithops, for instance, users can painlessly perform exploratory data analysis from within a Jupyter notebook, while the Lithops engine takes care of launching the parallel cloud functions, loading dependencies, automatically partitioning the data, and so on. In this article, we describe the design and innovative features of Lithops and evaluate it using several representative applications, including sentiment analysis, Monte Carlo simulations, and hyperparameter tuning. These applications demonstrate Lithops' ability to scale single-machine computations to thousands of cores and, importantly, to do so without booting a cold cluster or keeping a warm cluster for occasional tasks.
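To make the map-then-reduce pattern behind the Monte Carlo use case concrete, the following is a minimal sketch, not code from the article. It uses a local thread pool from the Python standard library as a stand-in for cloud functions. The names `count_hits` and `estimate_pi` are hypothetical and chosen only for illustration. Under Lithops, the fan-out step would presumably be handed to its executor rather than to a local pool.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task):
    """Map step: count random points that land inside the unit quarter-circle."""
    seed, samples = task
    rng = random.Random(seed)  # per-task generator keeps each task independent
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks=8, samples_per_task=50_000):
    """Fan out independent tasks, then reduce the partial counts to one estimate."""
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    # Assumption: this thread pool is only a local stand-in for the parallel
    # cloud functions described in the abstract; it is not the Lithops API.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partial_hits = list(pool.map(count_hits, tasks))
    total_hits = sum(partial_hits)  # reduce step
    return 4.0 * total_hits / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```

Each task carries its own seed and sample count, so the tasks share no state and could in principle run on separate workers. Only the small per-task hit counts travel back for the final sum.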

Original language: English
Pages (from-to): 1026-1037
Number of pages: 12
Journal: IEEE Transactions on Cloud Computing
Volume: 11
Issue number: 1
State: Published - 1 Jan 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • IBM cloud
  • PyWren
  • Serverless computing
  • cloud computing
  • distributed systems
  • lithops
  • multi-cloud

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications
