Hi,
I've spent the better part of the day trying to find answers on this, and haven't had much luck.
I have two systems that talk to each other via stunnel. In the last week or so, I have noticed a number of errors in the application using the tunnel (a messaging application), requiring multiple re-sends, etc.
This is my setup.
Both server and client run CentOS 6.2 with the RPM version of stunnel. Though it isn't the latest version, I believe Red Hat backports critical bug fixes into their updated RPMs. I also went through the ChangeLog but couldn't find anything changed since 4.29 that seemed to deal with what I'm seeing.
stunnel 4.29 on x86_64-unknown-linux-gnu with OpenSSL 1.0.0-fips 29 Mar 2010
Threading:PTHREAD SSL:ENGINE Sockets:POLL,IPv6 Auth:LIBWRAP

Global options
debug = 5
pid = /var/run/stunnel.pid
RNDbytes = 64
RNDfile = /dev/urandom
RNDoverwrite = yes

Service-level options
cert = /etc/stunnel/stunnel.pem
ciphers = ALL:!aNULL:!eNULL:!SSLv2
key = /etc/stunnel/stunnel.pem
session = 300 seconds
stack = 65536 bytes
sslVersion = SSLv3 for client, all for server
TIMEOUTbusy = 300 seconds
TIMEOUTclose = 60 seconds
TIMEOUTconnect = 10 seconds
TIMEOUTidle = 43200 seconds
verify = none

Socket option defaults:
Option           Accept  Local  Remote  OS default
SO_DEBUG         --      --     --      0
SO_DONTROUTE     --      --     --      0
SO_KEEPALIVE     --      --     --      0
SO_LINGER        --      --     --      0:0
SO_OOBINLINE     --      --     --      0
SO_RCVBUF        --      --     --      87380
SO_SNDBUF        --      --     --      16384
SO_RCVLOWAT      --      --     --      1
SO_SNDLOWAT      --      --     --      1
SO_RCVTIMEO      --      --     --      0:0
SO_SNDTIMEO      --      --     --      0:0
SO_REUSEADDR     1       --     --      0
SO_BINDTODEVICE  --      --     --      --
TCP_KEEPCNT      --      --     --      9
TCP_KEEPIDLE     --      --     --      7200
TCP_KEEPINTVL    --      --     --      75
IP_TOS           --      --     --      0
IP_TTL           --      --     --      64
TCP_NODELAY      --      --     --      0

Common config (basically the default file shipped by Red Hat, with certs and ports configured):
----------------
; Protocol version (all, SSLv2, SSLv3, TLSv1)
sslVersion = SSLv3

; Some security enhancements for UNIX systems - comment them out on Win32
chroot = /var/run/stunnel/
setuid = stunnel
setgid = stunnel
; PID is created inside the chroot jail
pid = /stunnel.pid

; Some debugging stuff useful for troubleshooting
debug = 7
output = /var/log/stunnel.log
-------------------
When the connection breaks, the logs (debug=7) on both sides are:
Client:
2012.03.21 19:58:14 LOG7[30685:140185543952128]: SSL state (connect): before/connect initialization
2012.03.21 19:58:14 LOG7[30685:140185543952128]: SSL state (connect): SSLv3 write client hello A
2012.03.21 19:58:14 LOG3[30685:140185543952128]: SSL_connect: Peer suddenly disconnected
2012.03.21 19:58:14 LOG5[30685:140185543952128]: Connection reset: 0 bytes sent to SSL, 0 bytes sent to socket
Server:
2012.03.21 19:58:14 LOG3[22230:140462283343616]: SSL_accept: Peer suddenly disconnected
2012.03.21 19:58:14 LOG5[22230:140462283343616]: Connection reset: 0 bytes sent to SSL, 0 bytes sent to socket
As I mentioned, it happens intermittently: probably 50% of the connections work just fine, and the rest are disconnected. It ALWAYS seems to happen just after the client logs 'write client hello A', as opposed to later in the SSL handshake.
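To check the "always just after write client hello A" observation across a whole log rather than by eye, something like the following minimal Python sketch could be used. It assumes the log lines have the shape quoted above; the function name and regular expressions are my own illustration and not part of stunnel. It groups lines by the `[pid:thread]` tag and counts which SSL state was logged last before each "Peer suddenly disconnected".

```python
import re
from collections import Counter

# Shape of the stunnel log lines quoted above (an assumption based on this log only):
# "<date> <time> LOG<level>[<pid>:<thread>]: <message>"
LOG_RE = re.compile(r"LOG\d\[(?P<pid>\d+):(?P<tid>\d+)\]: (?P<msg>.*)")
STATE_RE = re.compile(r"SSL state \(\w+\): (?P<state>.*)")


def last_state_before_disconnect(log_text):
    """Count, per handshake state, how often it was the last state a
    thread logged before "Peer suddenly disconnected"."""
    last_state = {}  # (pid, tid) -> most recent SSL state message
    counts = Counter()
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if not m:
            continue
        key = (m.group("pid"), m.group("tid"))
        state = STATE_RE.match(m.group("msg"))
        if state:
            last_state[key] = state.group("state")
        elif "Peer suddenly disconnected" in m.group("msg"):
            # Threads that never logged a state are counted under a placeholder.
            counts[last_state.pop(key, "(no state logged)")] += 1
    return counts


# The client log lines from this message, copied verbatim.
client_log = """\
2012.03.21 19:58:14 LOG7[30685:140185543952128]: SSL state (connect): before/connect initialization
2012.03.21 19:58:14 LOG7[30685:140185543952128]: SSL state (connect): SSLv3 write client hello A
2012.03.21 19:58:14 LOG3[30685:140185543952128]: SSL_connect: Peer suddenly disconnected
2012.03.21 19:58:14 LOG5[30685:140185543952128]: Connection reset: 0 bytes sent to SSL, 0 bytes sent to socket
"""

print(last_state_before_disconnect(client_log))
```

Run against the full /var/log/stunnel.log, a single dominant key in the result would support the claim that the failure point is always the same step of the handshake.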
I ran tcpdump on both sides; the captures are below. Note that both the client and the server are NATd behind firewalls; on the server side, the port stunnel listens on (31112) is opened through the firewall.
CLIENT (192.168.22.120, NATd externally as 66.66.66.66):
18:21:57.009887 IP 192.168.22.120.55747 > 77.77.77.77.31112: Flags [S], seq 2556598106, win 14600, options [mss 1460,sackOK,TS val 88488311 ecr 0,nop,wscale 7], length 0
18:21:57.076130 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [S.], seq 351052925, ack 2556598107, win 14480, options [mss 1460,sackOK,TS val 646044220 ecr 88488311,nop,wscale 7], length 0
18:21:57.076195 IP 192.168.22.120.55747 > 77.77.77.77.31112: Flags [.], ack 1, win 115, options [nop,nop,TS val 88488377 ecr 646044220], length 0
18:21:57.077234 IP 192.168.22.120.55747 > 77.77.77.77.31112: Flags [P.], seq 1:140, ack 1, win 115, options [nop,nop,TS val 88488378 ecr 646044220], length 139
18:21:57.143582 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [.], ack 140, win 122, options [nop,nop,TS val 646044288 ecr 88488378], length 0
18:21:57.143982 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [RP.], seq 1, ack 140, win 122, length 0
SERVER (10.65.0.130, NATd externally as 77.77.77.77):
18:21:57.042122 IP 66.66.66.66.7760 > 10.65.0.130.http: Flags [S], seq 4074124673, win 14600, options [mss 1460,sackOK,TS val 88488311 ecr 0,nop,wscale 7], length 0
18:21:57.042161 IP 10.65.0.130.http > 66.66.66.66.7760: Flags [S.], seq 458906148, ack 4074124674, win 14480, options [mss 1460,sackOK,TS val 646044220 ecr 88488311,nop,wscale 7], length 0
18:21:57.108325 IP 66.66.66.66.7760 > 10.65.0.130.http: Flags [.], ack 1, win 115, options [nop,nop,TS val 88488377 ecr 646044220], length 0
18:21:57.109507 IP 66.66.66.66.7760 > 10.65.0.130.http: Flags [P.], seq 1:140, ack 1, win 115, options [nop,nop,TS val 88488378 ecr 646044220], length 139
18:21:57.109532 IP 10.65.0.130.http > 66.66.66.66.7760: Flags [.], ack 140, win 122, options [nop,nop,TS val 646044288 ecr 88488378], length 0
18:21:57.110092 IP 10.65.0.130.http > 66.66.66.66.7760: Flags [P.], seq 1:178, ack 140, win 122, options [nop,nop,TS val 646044288 ecr 88488378], length 177
18:21:57.175518 IP 66.66.66.66.7760 > 10.65.0.130.http: Flags [R.], seq 140, ack 1, win 115, length 0
Based on what I am seeing, each side receives an RST that the other side never appears to send, which causes both to terminate the session. Neither capture shows the local host originating an RST, and the server's 177-byte reply never shows up in the client capture. When the connection is successful, the tcpdump looks a lot more normal, with an appropriate FIN exchange closing the session once the data has been sent.
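To make that reading of the captures mechanical, here is a rough Python sketch that splits the RST segments in a tcpdump text capture into those sent by the local host and those received by it. The regular expression is an assumption based only on the output format shown above, and `rst_packets` is a hypothetical helper name, not part of tcpdump.

```python
import re

# Matches the tcpdump text lines shown above, e.g.
# "18:21:57.143982 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [RP.], ..."
# The last dot-separated field of each endpoint is treated as the port.
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>[^.\s]+) > "
    r"(?P<dst>\S+)\.(?P<dport>[^.\s:]+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def rst_packets(capture_text, local_ip):
    """Return (sent, received): timestamps of RST segments that local_ip
    sent and received, according to this capture."""
    sent, received = [], []
    for line in capture_text.splitlines():
        m = LINE_RE.search(line)
        if not m or "R" not in m.group("flags"):
            continue
        if m.group("src") == local_ip:
            sent.append(m.group("time"))
        else:
            received.append(m.group("time"))
    return sent, received


# Final packets of each capture, copied verbatim from this message.
client_capture = """\
18:21:57.143582 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [.], ack 140, win 122, options [nop,nop,TS val 646044288 ecr 88488378], length 0
18:21:57.143982 IP 77.77.77.77.31112 > 192.168.22.120.55747: Flags [RP.], seq 1, ack 140, win 122, length 0
"""
server_capture = """\
18:21:57.110092 IP 10.65.0.130.http > 66.66.66.66.7760: Flags [P.], seq 1:178, ack 140, win 122, options [nop,nop,TS val 646044288 ecr 88488378], length 177
18:21:57.175518 IP 66.66.66.66.7760 > 10.65.0.130.http: Flags [R.], seq 140, ack 1, win 115, length 0
"""

print("client:", rst_packets(client_capture, "192.168.22.120"))
print("server:", rst_packets(server_capture, "10.65.0.130"))
```

For both captures the "sent" list comes back empty while the "received" list has one entry, which is what leads me to think the resets are generated somewhere in between.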
My questions are: am I missing something obvious? Am I correct in reading the above as some firewall or router between the two servers breaking the TCP connection by sending TCP RSTs to both endpoints?
Any help is appreciated.
Jordan