Benchmarking rustls 0.23.15 vs OpenSSL 3.3.2 vs BoringSSL on x86_64
2024-10-18
System configuration
We ran the benchmarks on a bare-metal server with the following characteristics:
- OS: Debian 12 (Bookworm).
- C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.
- CPU: Xeon E-2386G (supporting AVX-512).
- Memory: 32GB.
- Extra configuration: hyper-threading disabled, dynamic frequency scaling disabled, CPU scaling governor set to performance for all cores.
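The extra configuration listed above can be sanity-checked before a run. The following is a minimal sketch, not part of the benchmark tooling: it assumes a Linux host exposing the standard sysfs files `smt/control` and `cpuN/cpufreq/scaling_governor`, and the helper names `read_tuning_state` and `is_benchmark_ready` are hypothetical. It does not check whether dynamic frequency scaling itself is disabled, since the mechanism for that varies by driver and firmware.

```python
from pathlib import Path


def read_tuning_state(sysfs_root="/sys/devices/system/cpu"):
    """Return (smt_state, governors) as reported under sysfs_root.

    smt_state is the content of smt/control (for example "on", "off",
    "forceoff" or "notsupported"), or None if the file is missing.
    governors maps each cpuN directory name to its scaling governor.
    """
    root = Path(sysfs_root)

    smt_file = root / "smt" / "control"
    smt_state = smt_file.read_text().strip() if smt_file.exists() else None

    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return smt_state, governors


def is_benchmark_ready(smt_state, governors):
    """True if SMT is not active and every core uses 'performance'."""
    smt_ok = smt_state in ("off", "forceoff", "notsupported", "notimplemented")
    gov_ok = bool(governors) and all(
        g == "performance" for g in governors.values()
    )
    return smt_ok and gov_ok
```

Calling `is_benchmark_ready(*read_tuning_state())` on the benchmark host gives a single yes/no answer; passing a different `sysfs_root` makes the function easy to exercise against a fixture directory.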
Versions
The benchmarking tool used for both OpenSSL and BoringSSL was openssl-bench d5de57d9.
This was built from source with its makefile.
BoringSSL
The tested version of BoringSSL is 76968bb3d5, which was the most recent point on master when we started these measurements.
BoringSSL was built from source with this command:
$ CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release
clang is used here to rule out any performance deficit that building with GCC might introduce.
OpenSSL
The tested version of OpenSSL is 3.3.2, which was the latest release at the time of writing.
OpenSSL was built from source with this command:
$ ./Configure ; make -j12
Rustls
The tested version of rustls is 0.23.15, which was the latest release at the time of writing. It was used with aws-lc-rs 1.10.0 / aws-lc-sys 0.22.0.
Additionally the following two commits were included, which affect the benchmark tool but do not affect the core crate:
Measurements
BoringSSL was tested with this command:
~/bench/openssl-bench
$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
OpenSSL was tested with this command:
~/bench/openssl-bench
$ BENCH_MULTIPLIER=16 setarch -R make measure
rustls was tested with this command:
~/bench/rustls
$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
Results
Transfer measurements are in megabytes per second. Handshake units are handshakes per second.
Transfer (MB/s) | BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 |
---|---|---|---|
transfer, 1.2, aes-128-gcm, sending | 5043.04 | 6560.79 | 8154.27 |
transfer, 1.2, aes-128-gcm, receiving | 4429.26 | 7192.17 | 7436.88 |
transfer, 1.3, aes-256-gcm, sending | 4332.5 | 5982.18 | 7094.13 |
transfer, 1.3, aes-256-gcm, receiving | 3872.34 | 6521.2 | 7278.15 |

Full handshakes (handshakes/s) | BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 |
---|---|---|---|
full handshakes, 1.2, rsa, client | 5470.01 | 3201.92 | 8227.61 |
full handshakes, 1.2, rsa, server | 1449.65 | 2159.59 | 2829.04 |
full handshakes, 1.2, ecdsa, client | 3451.51 | 2071.74 | 4369.39 |
full handshakes, 1.2, ecdsa, server | 9115.04 | 5196.8 | 12921.68 |
full handshakes, 1.3, rsa, client | 4813.91 | 2788.76 | 6803.93 |
full handshakes, 1.3, rsa, server | 1386.06 | 1913.38 | 2544.31 |
full handshakes, 1.3, ecdsa, client | 3177.49 | 1859.77 | 3937.7 |
full handshakes, 1.3, ecdsa, server | 7107.86 | 3938.47 | 8325.74 |

Resumed handshakes (handshakes/s) | BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 |
---|---|---|---|
resumed handshakes, 1.2, client | 45547.6 | 20703.8 | 64722.55 |
resumed handshakes, 1.2, server | 43985.3 | 22268.1 | 71149.91 |
resumed handshakes, 1.3, client | 9818.4 | 5328.6 | 10912.87 |
resumed handshakes, 1.3, server | 8600.76 | 4866.2 | 9500.11 |
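To make the transfer figures easier to compare, they can be normalised against one implementation. The sketch below is illustrative only: the numbers are copied from the transfer rows above, while the `TRANSFER` table layout and the `relative_to` helper are hypothetical names introduced here.

```python
# Transfer results in MB/s, copied from the table above.
# Column order: BoringSSL, OpenSSL, rustls.
IMPLEMENTATIONS = ("BoringSSL", "OpenSSL", "rustls")

TRANSFER = {
    "1.2, aes-128-gcm, sending": (5043.04, 6560.79, 8154.27),
    "1.2, aes-128-gcm, receiving": (4429.26, 7192.17, 7436.88),
    "1.3, aes-256-gcm, sending": (4332.5, 5982.18, 7094.13),
    "1.3, aes-256-gcm, receiving": (3872.34, 6521.2, 7278.15),
}


def relative_to(baseline, row):
    """Express each value in row as a multiple of the baseline's value."""
    base = row[IMPLEMENTATIONS.index(baseline)]
    return tuple(round(value / base, 2) for value in row)


if __name__ == "__main__":
    for name, row in TRANSFER.items():
        ratios = relative_to("OpenSSL", row)
        print(name, dict(zip(IMPLEMENTATIONS, ratios)))
```

Choosing OpenSSL as the baseline is arbitrary; passing `"BoringSSL"` or `"rustls"` instead re-expresses the same rows against that implementation.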
Observations on results
AVX-512 support shows up twice in these results:
- The advantage of rustls/aws-lc and OpenSSL over BoringSSL in the throughput tests is due to their use of AVX-512F/VAES.
- The advantage of rustls/aws-lc and OpenSSL over BoringSSL in the server-side full handshake tests with RSA is due to their use of AVX-512IFMA-accelerated RSA.
This support was contributed to the respective projects by Intel.
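Because these observations depend on specific CPU features, it can be useful to confirm that a machine advertises them before comparing its numbers with the ones here. This is a small sketch assuming a Linux x86_64 host, where `/proc/cpuinfo` lists the features as `avx512f`, `vaes` and `avx512ifma`; the function names are hypothetical.

```python
# Map the feature names used in the text to their Linux cpuinfo flag names.
WANTED = {
    "AVX-512F": "avx512f",
    "VAES": "vaes",
    "AVX-512IFMA": "avx512ifma",
}


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_report(cpuinfo_text):
    """Report, per feature of interest, whether the CPU advertises it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in WANTED.items()}
```

On the benchmark host this would be called as `avx512_report(open("/proc/cpuinfo").read())`. A flag being present only shows that the CPU supports the feature; whether a given library build actually selects the accelerated code path is a separate question.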
TLS1.3 resumption is slower than TLS1.2 resumption because it includes a fresh key exchange.
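The size of that gap can be read off the resumed handshake rows. The following sketch simply divides the TLS1.2 rate by the TLS1.3 rate for each implementation; the figures are copied from the results above, and the `RESUMED` layout and `resumption_gap` helper are hypothetical names used for illustration.

```python
# Resumed handshakes per second, copied from the results above.
# Each entry maps implementation -> (TLS1.2 rate, TLS1.3 rate).
RESUMED = {
    "client": {
        "BoringSSL": (45547.6, 9818.4),
        "OpenSSL": (20703.8, 5328.6),
        "rustls": (64722.55, 10912.87),
    },
    "server": {
        "BoringSSL": (43985.3, 8600.76),
        "OpenSSL": (22268.1, 4866.2),
        "rustls": (71149.91, 9500.11),
    },
}


def resumption_gap(side):
    """Return how many times faster TLS1.2 resumption is than TLS1.3."""
    return {
        impl: round(tls12 / tls13, 1)
        for impl, (tls12, tls13) in RESUMED[side].items()
    }


if __name__ == "__main__":
    for side in RESUMED:
        print(side, resumption_gap(side))
```

For every implementation and both sides the ratio is well above 1, which is consistent with the extra key exchange dominating the cost of a TLS1.3 resumption.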