On Tue, Jun 10, 2025 at 10:21 PM Michał Trojnara via stunnel-users [email protected] wrote:
Hi Guys,
For the stunnel project, my priorities are:
- Security
- Reliability
- Scalability
- Performance
After approximately 27 years of active development, I believe stunnel performs well in all of these areas. However, I would never sacrifice security to improve reliability, never sacrifice reliability to improve scalability, and would be unlikely to sacrifice scalability to improve performance.
Security and reliability are hopefully self-explanatory. By scalability, I mean the ability to efficiently handle as many concurrent connections as possible on a given hardware platform—from resource-constrained IoT devices to high-end servers. This involves making effective use of available file descriptors and RAM, while minimizing thread synchronization overhead.
There are two primary sources of performance bottlenecks:
1. Connection rate – The number of new connections established per second. Establishing a new connection involves asymmetric cryptography, which is computationally expensive and therefore typically the main performance bottleneck.
2. Connection throughput – The data rate of a single connection, which—at least in user space—is handled by a single CPU thread. This is almost always faster than the network interface's throughput and thus rarely a limiting factor.
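As a rough illustration, the two can be benchmarked separately with OpenSSL's built-in speed tool (the exact algorithms depend on the negotiated ciphersuite, so these are only examples):

    # connection rate: dominated by asymmetric operations during the handshake
    openssl speed rsa2048
    # connection throughput: dominated by the symmetric cipher
    openssl speed -evp aes-256-gcm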
Hi,
Thank you for the quick response, and for stating the goals so clearly. I think my use-case fits within 'Performance bottlenecks on connection throughput on high-end servers':
The reason I started looking into stunnel performance is that it is using 96-100% CPU and is a bottleneck for the application I'm working on, even on relatively slow (by today's standards) 10Gbit/s networks. On faster networks with 25Gbit/s, 40Gbit/s or 100Gbit/s this limitation is even more serious. The bottleneck isn't in encryption/decryption, but in all the "overhead" around it (handling buffers, allocating/deallocating memory too often, making system calls too often, etc.), which was surprising: I would have expected encryption to take most of the time. In some sense this is actually good news, because all that "overhead" can be reduced, whereas reducing encryption/decryption time wouldn't really be possible (it already uses AES-NI CPU instructions).
I probably should've started my emails by explaining that.
This is probably more of a problem for servers and datacenters than for end users (who would rarely have an internet connection faster than 1Gbit/s), but I'm glad that high-end servers are also on your goals list.
The current OpenSSL performance settings were selected because they have been extensively tested by numerous users across many OpenSSL versions and network stacks. Any change would require a compelling reason.
The use case is live migrating VMs, where the connection between 2 hosts is encrypted using stunnel. Connection rate isn't very important for this use case, but connection throughput is. In some cases there might be a tradeoff (e.g. memory usage vs performance); it would be nice to have .conf flags to choose between the two. Although I think that when stunnel runs in *client* mode, throughput would probably be more important than using a little bit more memory.
Of course migrating more than one VM at a time (and thus using more than 1 HTTPS connection at a time) can work around this limitation up to a point, but it would be good to fix the performance bottlenecks that are easily fixable by tweaking OpenSSL settings and buffer sizes.
So far my tweaks achieve a 17% performance improvement (I don't know whether that meets the threshold for compelling), but it should be possible to gain even more.
I'll know more once I've finished the rest of my patches, but comparing nginx vs stunnel as a server (with curl as a client) shows that nginx can do 20Gbit/s on a single stream, while a patched stunnel can do ~18Gbit/s, so there is potentially more performance to be gained by improving stunnel (and with a now-patched curl client I should be able to achieve more on both).
(I'd prefer to improve what we already have, i.e. stunnel. I only used nginx as a performance reference because I wanted to see the fastest speed I can achieve using OpenSSL on my hardware, and nginx is currently the fastest among nginx, hitch, socat and stunnel. I don't see a compelling reason to switch away from stunnel, though.)
There are also other approaches my application could have taken, e.g. using WireGuard, but they seem considerably more risky (WireGuard runs in the kernel, so a bug there can crash the whole host). I agree with your priorities that reliability is more important than performance. That said, from a testing point of view I can try adding it to my comparison, at least to have a "max achievable performance on this HW" reference point to strive towards.
Best regards, --Edwin
Best regards, Mike
hshh wrote:
I tested the patch; it causes issues for some applications after connecting via stunnel. For example, when connecting to SSH via stunnel, the SSH client's display has problems.
Sounds like issues similar to what I've been seeing in curl (i.e. this change may expose latent bugs in OpenSSL, the application, or stunnel). I think the safest route would be to have this disabled by default, with a stunnel.conf flag that applications can enable if they know it works for their protocol/implementation.
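Something along these lines, with a purely hypothetical option name, so that services known to be compatible can opt in per service:

    [ssh-tunnel]
    client = yes
    accept = 127.0.0.1:2222
    connect = server.example.com:443
    ; hypothetical opt-in flag, off by default
    readahead = yes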
Thank you, --Edwin
On Tue, Jun 10, 2025 at 5:46 PM Edwin Torok via stunnel-users [email protected] wrote:
Hello,
Flamegraph profiling on stunnel has shown that most time is NOT spent in encryption/decryption, but in sending/receiving data.
Experimental setup:
CPU: 2x 18-core Intel Xeon Gold 6354
NIC: Intel E810-XXV, 100 Gbit/s
Kernel: 6.14.8-300.fc42.x86_64 x86_64
Mem: 251.32 GiB
OS: Fedora Linux 42
openssl version: OpenSSL 3.2.4 11 Feb 2025 (Library: OpenSSL 3.2.4 11 Feb 2025)
stunnel: 5.75 from git
Network, encryption and memory copy speeds converted into Gbit/s on the above HW:
iperf3: 40Gbit/s
openssl speed -evp aes-256-gcm: 72Gbit/s
perf bench memcpy: 100Gbit/s
Under ideal conditions we should be able to achieve at most ~20.5Gbit/s, however I only measured ~7.6Gbit/s.
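The ~20.5Gbit/s comes from treating the data path as serial (every byte crosses the NIC, gets encrypted or decrypted, and is copied at least once), so the best case is the combination of the three rates above:

    1 / (1/40 + 1/72 + 1/100) ≈ 20.5Gbit/s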
The scripts that I used to perform measurements are included in the attached patch 3.
See also related discussion on curl: https://github.com/curl/curl/pull/17548
Enabling readahead, avoiding excessive alloc/free of buffers, increasing buffer sizes, and improving buffer handling can all improve performance.
For now I'm sending just the first small patches as proofs of concept, each of them a one-liner. I hope you can include these in some form in the next release.
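In terms of the OpenSSL API, the two one-liners boil down to roughly the following (a sketch of the intent, not the literal diff against src/ctx.c):

    #include <openssl/ssl.h>

    /* sketch: applied to the service's SSL_CTX during initialization */
    static void tune_ctx(SSL_CTX *ctx) {
        /* let OpenSSL read as much ciphertext per syscall as its buffer allows */
        SSL_CTX_set_read_ahead(ctx, 1);
        /* keep per-connection buffers allocated instead of freeing them
           whenever they become empty */
        SSL_CTX_clear_mode(ctx, SSL_MODE_RELEASE_BUFFERS);
    }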
Eventually these should probably be exposed as configuration flags in stunnel.conf; I can help implement that, or leave it to you if you prefer.
In fact readahead was enabled in previous versions of stunnel, and then disabled again, so I assume you ran into some bugs with it on certain protocols?
Increasing the buffer size is not a clear win (as with curl), due to the excessive use of memmove/memset, and because SSL_read/SSL_write still only process one TLS record at a time. I have some further changes that can improve this, but the patches are larger and I haven't finished testing them for correctness yet. Please let me know if you'd want these as patches, or if you'd rather implement them yourself.
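To give an idea of the direction (a sketch of the general technique only, not what my pending patches literally do): with readahead enabled, OpenSSL can already hold further complete records in its user-space buffer, so the transfer loop has to keep reading while SSL_has_pending() reports buffered data instead of going straight back to poll():

    /* sketch; forward_to_peer() is a hypothetical stand-in for stunnel's
       transfer loop writing the plaintext onward */
    char buf[65536];
    int n;
    do {
        n = SSL_read(ssl, buf, sizeof buf);
        if (n > 0)
            forward_to_peer(buf, n);
    } while (n > 0 && SSL_has_pending(ssl));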
Edwin Török (3):
  openssl: enable readahead
  openssl: disable SSL_MODE_RELEASE_BUFFERS
  Benchmark scripts
 src/ctx.c                    |   5 +-
 tests/benchmark/benchmark.sh | 120 +++++++++++++++++++++++
 tests/benchmark/launch.sh    | 185 +++++++++++++++++++++++++++++++++++
 3 files changed, 309 insertions(+), 1 deletion(-)
 create mode 100644 tests/benchmark/benchmark.sh
 create mode 100755 tests/benchmark/launch.sh
--
2.43.5
_______________________________________________
stunnel-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]