This paper deals with optimizations for big-number (multi-precision) squaring and their efficient implementation on x86-64 platforms. Such optimizations have various uses, the most prominent being RSA acceleration, where big-number squaring accounts for a significant portion of the computation. We introduce an algorithm for big-number squaring that reduces the number of single-precision add-with-carry operations and trades several additions for a single left-shift operation. When measured on the 2nd Generation Intel® Core™ processor, our algorithm is roughly 1.4 times faster than the GMP library 5.0.2 implementation for 512-bit operands, and 1.2 times faster for 1024-bit operands. Our optimization is used in a recently posted OpenSSL patch for accelerating modular exponentiation for RSA.
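To illustrate the general idea of trading additions for a shift, the following is a minimal Python sketch of a textbook limb-based squaring. It computes each off-diagonal product a[i]*a[j] (i < j) once, doubles the accumulated sum with a single one-bit left shift, and then adds the diagonal squares. It is an illustration under stated assumptions, not the paper's exact algorithm or its x86-64 implementation: the 64-bit limb size and the names `to_limbs`, `from_limbs`, and `square_limbs` are hypothetical choices made for this example.

```python
LIMB_BITS = 64                      # assumed limb width (one x86-64 register)
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian limbs into a Python integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def square_limbs(a):
    """Square an n-limb number, returning 2n limbs (illustrative sketch)."""
    n = len(a)
    r = [0] * (2 * n)

    # Step 1: accumulate each off-diagonal product a[i]*a[j], i < j, once.
    for i in range(n):
        carry = 0
        for j in range(i + 1, n):
            t = r[i + j] + a[i] * a[j] + carry
            r[i + j] = t & MASK
            carry = t >> LIMB_BITS
        r[i + n] = carry            # this limb has not been written yet

    # Step 2: double the off-diagonal sum with one left shift by one bit,
    # instead of adding every cross product a second time.
    top = 0
    for k in range(2 * n):
        v = (r[k] << 1) | top
        r[k] = v & MASK
        top = v >> LIMB_BITS

    # Step 3: add the diagonal squares a[i]^2 at limb position 2*i.
    carry = 0
    for i in range(n):
        sq = a[i] * a[i]
        t = r[2 * i] + (sq & MASK) + carry
        r[2 * i] = t & MASK
        t = r[2 * i + 1] + (sq >> LIMB_BITS) + (t >> LIMB_BITS)
        r[2 * i + 1] = t & MASK
        carry = t >> LIMB_BITS

    return r
```

For a 512-bit operand `x`, `from_limbs(square_limbs(to_limbs(x, 8)))` should equal `x * x`. In this sketch the cross products are multiplied and accumulated only once, so the doubling costs one pass of shifts rather than a second pass of add-with-carry operations.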