This paper describes an algorithm for accelerating the computations of Davies-Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD) improves the overall algorithm's performance. Using this method, we obtain a new software implementation of SHA-256 that performs at 11. 47 Cycles/Byte on the second and 10. 18 Cycles/Byte (for an 8 KB message) on the third Generation Intel® CoreTM processors. We also show how to extend the method to the soon-to-come AVX2 architecture, which has wider registers. Since processors with AVX2 will be available only in 2013, exact performance reporting is not yet possible. Instead, we show that our resulting SHA-256 and SHA-512 implementations have a reduced number of instructions. Based on our findings, we make some observations on the SHA3 competition. We argue that if the prospective SHA3 standard is expected to be competitive against the performance of SHA-256 or SHA-512, on the high end platforms, then its performance should be well below 10 Cycles/Byte on the current, and certainly on the near future processors. Not all the SHA3 finalists have this performance. Furthermore, even the fastest finalists will probably offer only a small performance advantage over the current SHA-256 and SHA-512 implementations.
- Advanced vector extensions architectures
- SHA3 competition
- SIMD architecture
ASJC Scopus subject areas
- Computer Networks and Communications