Speeding up CRC32C computations with Intel CRC32 instruction

Research output: Contribution to journal › Article › peer-review


Intel has recently introduced a new instruction, CRC32, to address a computational bottleneck in protocols such as iSCSI and RDMA, which use CRC32C for data integrity checks. The instruction accumulates the CRC32C value of a buffer of arbitrary length through a sequence of invocations, each consuming the next 8-byte chunk of the buffer. Because the instruction has a latency of 3 cycles and each invocation depends on the result of the previous one, using it serially lets software process data at a rate of ∼2.67 bytes per cycle (8 bytes every 3 cycles). We introduce here an alternative algorithm for computing the CRC32C value of a buffer with the same instruction. The algorithm converts the latency-bound computation into a throughput-oriented one and maximizes the utilization of the pipelined hardware that underlies the instruction, achieving a speedup of almost 3x.

Original language: English
Pages (from-to): 179-185
Number of pages: 7
Journal: Information Processing Letters
Issue number: 5
State: Published - 28 Feb 2012


Keywords

  • Algorithms
  • CRC
  • Software design and implementation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Signal Processing
  • Information Systems
  • Computer Science Applications

