Hi,

 

We’ve been running a rsync service over stunnel for some time now.  After upgrading our environment, which involved moving from an old version of Linux to FreeBSD 9.3, and from stunnel-4.29 to stunnel-5.06, we’ve had a problem pop up.

 

Ever since switching over, we see that rsync daemon processes slowly build up over time on our server, each consuming 100% CPU.  The vast majority of our clients are working just fine, and the problematic rsync processes just seem to have been dead/aborted connections from the clients (which are using wireless aircard, so connectivity can be shotty). 

 

A quick ‘truss’ on the process shows they are doing nothing but sitting there and read()’ing in a loop indefinitely:

# truss -p 14689 | head

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

read(6,0x7fffffff8240,1)                                                = 0 (0x0)

 

Furthermore, the connection appears to be closed:

# sockstat -4 | grep 14689

root     rsync      14689 3  tcp4   127.0.0.1:873     127.0.0.1:27724

netstat -an | grep 27724

netstat: kvm not available: /dev/mem: No such file or directory

tcp4       0      0 127.0.0.1.873      127.0.0.1.27724    CLOSED

 

Looking through the stunnel logs, it looks like no data was ever really passed (perhaps the client gave up?):

2014.10.22 10:45:52 LOG7[34384941056]: Service [rsync] started                                                                         

2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] accepted connection from 1.1.1.1:63095

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): before/accept initialization

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client hello A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server hello A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write certificate A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write server done A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read client key exchange A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 read finished A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write change cipher spec A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 write finished A

2014.10.22 10:45:52 LOG7[34384941056]: SSL state (accept): SSLv3 flush data

2014.10.22 10:45:52 LOG7[34384941056]:   41 items in the session cache

2014.10.22 10:45:52 LOG7[34384941056]:    0 client connects (SSL_connect())

2014.10.22 10:45:52 LOG7[34384941056]:    0 client connects that finished

2014.10.22 10:45:52 LOG7[34384941056]:    0 client renegotiations requested

2014.10.22 10:45:52 LOG7[34384941056]:  110 server connects (SSL_accept())

2014.10.22 10:45:52 LOG7[34384941056]:  107 server connects that finished

2014.10.22 10:45:52 LOG7[34384941056]:    0 server renegotiations requested

2014.10.22 10:45:52 LOG7[34384941056]:   66 session cache hits

2014.10.22 10:45:52 LOG7[34384941056]:    0 external session cache hits

2014.10.22 10:45:52 LOG7[34384941056]:   32 session cache misses

2014.10.22 10:45:52 LOG7[34384941056]:    0 session cache timeouts

2014.10.22 10:45:52 LOG6[34384941056]: No peer certificate received

2014.10.22 10:45:52 LOG6[34384941056]: SSL accepted: new session negotiated

2014.10.22 10:45:52 LOG6[34384941056]: Negotiated SSLv3 ciphersuite AES256-SHA (256-bit encryption)

2014.10.22 10:45:52 LOG6[34384941056]: Compression: null, expansion: null

2014.10.22 10:45:52 LOG6[34384941056]: s_connect: connecting 127.0.0.1:873

2014.10.22 10:45:52 LOG5[34384941056]: s_connect: connected 127.0.0.1:873

2014.10.22 10:45:52 LOG5[34384941056]: Service [rsync] connected remote server from 127.0.0.1:27724

2014.10.22 10:45:52 LOG7[34384941056]: Remote socket (FD=11) initialized

2014.10.22 11:01:12 LOG3[34384941056]: SSL_read: Connection reset by peer (54)

2014.10.22 11:01:12 LOG5[34384941056]: Connection reset: 56 byte(s) sent to SSL, 56 byte(s) sent to socket

2014.10.22 11:01:12 LOG7[34384941056]: Remote socket (FD=11) closed

2014.10.22 11:01:12 LOG7[34384941056]: Local socket (FD=10) closed

2014.10.22 11:01:12 LOG7[34384941056]: Service [rsync] finished (4 left)

 

The clients are generally field units running stunnel versions in the 4.x range.  Unfortunately, updating them is not an easy task and is not really under my control.  Either way, this seems like a server side issue, and possibly an issue with rsync (we’re running 3.1.0).

 

Has anyone run into anything like this before?  Does anyone have any suggestions for additional debugging we can do to try to pinpoint the problem?  We are going to try rolling back to stunnel-4.29 to see if the problem still occurs there on the new system.

 

Thanks,

 

Steve