Benchmarking rustls 0.23.15 vs OpenSSL 3.3.2 vs BoringSSL on ARM64
2024-10-31
System configuration
We ran the benchmarks on a bare-metal server with the following characteristics:
- OS: Debian 12 (Bookworm).
- C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.
- CPU: Ampere Altra Q80-30, 80 cores.
- Memory: 128GB.
- All cores set to
performance
CPU frequency governor.
This is the Hetzner RX170.
Versions
The benchmarking tool used for both OpenSSL and BoringSSL was openssl-bench d5de57d9.
This was built from source with its makefile.
BoringSSL
The tested version of BoringSSL is 76968bb3d5, which was the most recent point on master when we started the previous measurements.
BoringSSL was built from source with CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release
.
clang is used here to avoid potential performance deficits to GCC
and for consistency with the x86 results.
OpenSSL
The tested version of OpenSSL is 3.3.2, which was the latest release at the time of writing.
OpenSSL was built from source with ./Configure ; make -j12
.
Rustls
The tested version of rustls was 0.23.15, which was the latest release at the time of writing. This was used with aws-lc-rs 1.10.0 / aws-lc-sys 0.22.0.
Additionally the following two commits were included, which affect the benchmark tool but do not affect the core crate:
Measurements
BoringSSL was tested with this command:
~/bench/openssl-bench
$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
OpenSSL was tested with this command:
~/bench/openssl-bench
$ BENCH_MULTIPLIER=16 setarch -R make measure
rustls was tested with this command:
~/bench/rustls
$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
Results
Transfer measurements are in megabytes per second. Handshake units are handshakes per second.
BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 | |
---|---|---|---|
transfer, 1.2, aes-128-gcm, sending | 2211.53 | 2101.23 | 2077.19 |
transfer, 1.2, aes-128-gcm, receiving | 2250.93 | 2344.94 | 2173.4 |
transfer, 1.3, aes-256-gcm, sending | 1886.17 | 1741.07 | 1809.8 |
transfer, 1.3, aes-256-gcm, receiving | 1899.72 | 1953.49 | 1935.8 |
BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 | |
full handshakes, 1.2, rsa, client | 1968.07 | 1588.54 | 4498.42 |
full handshakes, 1.2, rsa, server | 334.077 | 319.886 | 614.27 |
full handshakes, 1.2, ecdsa, client | 1527.73 | 1118.56 | 2154.06 |
full handshakes, 1.2, ecdsa, server | 3636.48 | 2950.54 | 8303.67 |
full handshakes, 1.3, rsa, client | 1861.15 | 1441.86 | 3986.81 |
full handshakes, 1.3, rsa, server | 330.484 | 312.446 | 599.39 |
full handshakes, 1.3, ecdsa, client | 1459.64 | 1045.98 | 2032.11 |
full handshakes, 1.3, ecdsa, server | 3252.58 | 2440.25 | 6212.45 |
BoringSSL 76968bb3 | OpenSSL 3.3.2 | rustls 0.23.15 | |
resumed handshakes, 1.2, client | 45452.2 | 18396.5 | 65267.61 |
resumed handshakes, 1.2, server | 43356.5 | 20426.4 | 65313.22 |
resumed handshakes, 1.3, client | 3969.88 | 3282.14 | 8443.11 |
resumed handshakes, 1.3, server | 3791.21 | 3071.61 | 7841.35 |
Observations on results
rustls trails a little in throughput tests. The three underlying cryptography libraries (BoringSSL, aws-lc, OpenSSL) have their own benchmarking tools which confirm that there is little variance between them:
~/bench/boringssl
$ LD_LIBRARY_PATH=. ./tool/bssl speed -filter AES-256-GCM
(...)
Did 139000 AES-256-GCM (16384 bytes) seal operations in 1004138us (138427.2 ops/sec): 2268.0 MB/s
(...)
~/bench/aws-lc
$ LD_LIBRARY_PATH=. ./tool/bssl speed -filter AES-256-GCM
(...)
Did 139000 EVP-AES-256-GCM encrypt (16384 bytes) operations in 1004522us (138374.3 ops/sec): 2267.1 MB/s
(...)
~/bench/openssl
$ LD_LIBRARY_PATH=. ./apps/openssl speed -aead -evp aes-256-gcm
(...)
Doing AES-256-GCM ops for 3s on 16384 size blocks: 434715 AES-256-GCM ops in 3.00s
(...)
The 'numbers' are in 1000s of bytes per second processed.
type 2 bytes 31 bytes 136 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-GCM 13570.29k 168865.38k 623050.41k 1766296.58k 2320760.83k 2374123.52k
That is 2268, 2267 and 2264 MB/s for BoringSSL, aws-lc and OpenSSL respectively. Given these project's shared lineage, it would not be surprising if the implementations are the same.