FHENDI: A Near-DRAM Accelerator for Compiler-Generated Fully Homomorphic Encryption Applications

Yongmo Park, Aporva Amarnath, Subhankar Pal, Karthik Swaminathan, Alper Buyuktosunoglu, Hayim Shaul, Ehud Aharoni, Nir Drucker, Wei D. Lu, Omri Soceanu, Pradip Bose

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Fully homomorphic encryption (FHE) is a powerful cryptographic technique that enables computation on encrypted data without decrypting it. It has broad applications wherever sensitive data must be processed in the cloud or in other untrusted environments. FHE applications are both compute- and memory-intensive because they perform expensive operations on large data. Prior work addresses efficient computation with dedicated hardware, but expensive memory transfers remain a major limiting factor. In this work, we propose FHENDI, a hierarchical near-DRAM processing (NDP) solution for FHE applications that harnesses the massive internal bandwidth of DRAM banks. We observe data access patterns in FHE that reveal distinct levels of parallelism: element-wise, limb-wise, coefficient-wise, and ciphertext-wise. FHENDI exploits these levels of parallelism to map FHE operations and data onto different hierarchies of our design, while addressing three major challenges of NDP for FHE: (i) the lack of bank-to-bank communication support, (ii) limited die-to-die bandwidth, and (iii) large memory access latencies. We resolve the first problem through a novel, conflict-free mapping algorithm built atop localized permutation networks that enables efficient element-wise and butterfly operations in FHE. We address the second problem by pipelining the execution of the parallel bootstrap operations observed in compiled FHE workloads. Finally, we hide memory access latency behind computation latency by exploiting a dual-banking scheme and the subarray-level parallelism (SLP) of the DRAM banks. We evaluate FHENDI on representative workloads from privacy-preserving machine learning inference on CNNs and Transformers, database range queries, and sorting, all generated with the HElayers compiler framework.
Compared with a server-class CPU and a server-class GPU, both running the state-of-the-art HEaaN library, and with an FHE accelerator ASIC, FHENDI achieves mean speedups of 2145.8×, 118.29×, and 2.45×, respectively.
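The abstract names limb-wise and coefficient-wise parallelism and butterfly operations without defining them. The following is a minimal sketch, assuming a generic RNS (residue number system) representation of polynomials in Z_Q[x]/(x^N + 1) as used by RLWE-based FHE schemes, that shows where these terms come from. It is a textbook negacyclic Cooley-Tukey NTT with toy parameters, not FHENDI's mapping algorithm or permutation-network design. The function names, the moduli 17, 97, and 113, and N = 8 are hypothetical choices for illustration only.

```python
import math


def find_psi(n, q):
    """Return a primitive 2n-th root of unity mod prime q (n a power of two, q = 1 mod 2n)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so the order of psi is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root found")


def ntt(a, omega, q):
    """Iterative radix-2 Cooley-Tukey NTT of length n; omega is a primitive n-th root mod q."""
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation of the input
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                # Butterfly: combine two elements that are `half` positions apart.
                u = a[start + k]
                v = a[start + k + half] * w % q
                a[start + k] = (u + v) % q
                a[start + k + half] = (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a


def negacyclic_mul(a, b, q):
    """Multiply two polynomials in Z_q[x]/(x^n + 1) via a psi-twisted NTT."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q
    at = [a[i] * pow(psi, i, q) % q for i in range(n)]  # twist turns cyclic into negacyclic
    bt = [b[i] * pow(psi, i, q) % q for i in range(n)]
    A, B = ntt(at, omega, q), ntt(bt, omega, q)
    C = [x * y % q for x, y in zip(A, B)]  # element-wise product, independent per coefficient
    c = ntt(C, pow(omega, -1, q), q)  # inverse NTT (still needs the 1/n scaling)
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [c[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]


def to_rns(poly, moduli):
    """Split each coefficient into residues: one limb (length-n vector) per modulus."""
    return [[c % q for c in poly] for q in moduli]


def rns_mul(a_limbs, b_limbs, moduli):
    """Limb-wise multiply: each limb uses only its own modulus, so limbs are independent."""
    return [negacyclic_mul(a, b, q) for a, b, q in zip(a_limbs, b_limbs, moduli)]


def from_rns(limbs, moduli):
    """Recombine limbs into coefficients mod Q = prod(moduli) with the CRT."""
    Q = math.prod(moduli)
    out = []
    for k in range(len(limbs[0])):
        x = 0
        for r, q in zip(limbs, moduli):
            m = Q // q
            x += r[k] * m * pow(m, -1, q)
        out.append(x % Q)
    return out


def schoolbook_negacyclic(a, b, q):
    """Reference O(n^2) product in Z_q[x]/(x^n + 1), where x^n wraps to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    moduli = [17, 97, 113]  # hypothetical toy primes, each = 1 mod 16 (2N with N = 8)
    a = [3, 1, 4, 1, 5, 9, 2, 6]
    b = [2, 7, 1, 8, 2, 8, 1, 8]
    c = from_rns(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
    print("RNS + NTT product matches schoolbook:", c == schoolbook_negacyclic(a, b, math.prod(moduli)))
```

In this sketch, each limb is processed with no reference to the other limbs, and the pointwise product after the forward NTT touches each coefficient independently. The butterflies, by contrast, pair elements that lie `half` positions apart, which hints at why, as the abstract notes, butterfly operations need a dedicated data mapping when banks cannot communicate directly.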

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Symposium on High Performance Computer Architecture, HPCA 2025
Publisher: IEEE Computer Society
Pages: 1127-1142
Number of pages: 16
ISBN (Electronic): 9798331506476
State: Published - 2025
Externally published: Yes
Event: 31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025 - Las Vegas, United States
Duration: 1 Mar 2025 → 5 Mar 2025

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print): 1530-0897

Conference

Conference: 31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025
Country/Territory: United States
City: Las Vegas
Period: 1/03/25 → 5/03/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • fully homomorphic encryption
  • hardware accelerator
  • high-bandwidth memory
  • near-DRAM processing
  • number theoretic transform

ASJC Scopus subject areas

  • Hardware and Architecture
