The need for end-to-end security on the Internet constantly increases the worldwide number (and percentage) of SSL/TLS connections. As a result, the cryptographic algorithms that support such secure communications have become a critical computational load for servers, and therefore an important target for optimization. Here we discuss techniques for speeding up the software performance of several important cryptographic primitives on the ubiquitous x86_64 architectures used in most server platforms, and report new and improved results. A few examples are the following performance numbers, measured on the 2
processor: an RSA1024/2048 implementation that is ~1.6x faster than the current OpenSSL version (1.0.0e), and SHA-1, SHA-256, and SHA-512 running at 5.75, 14, and 9.71 cycles per byte, respectively.