This paper describes an algorithm for computing modular exponentiation using vector (SIMD) instructions. It demonstrates, for the first time, how such a software approach can outperform the classical scalar (ALU) implementations, on the high end x86-64 platforms, if they have a wide SIMD architecture. Here, we target speeding up RSA2048 on Intel's soon-to-arrive platforms that support the AVX2 instruction set. To this end, we applied our algorithm and generated an optimized AVX2-based software implementation of 1024-bit modular exponentiation. This implementation is seamlessly integrated into OpenSSL, by patching over OpenSSL 1.0.1. Our results show that our implementation requires 51% less instructions than the current OpenSSL 1.0.1 implementation. This illustrates the potential significant speedup in the RSA2048 performance, which is expected in the coming (2013) Intel processors. The impact of such speedup on servers is noticeable, especially since migration to RSA2048 is recommended by NIST, starting from 2013.