The client was a 450 MHz Pentium III with 128MB RAM running Red Hat 7.0, using a vanilla 2.4.9 kernel. The server was a dual 650 MHz Pentium III running Red Hat 7.2 beta 2, using the 2.4.7-2-smp kernel that came with rh7.2b2. The client and server were connected using 100baseT through a Sohoware switch. tcpdump was run to verify that no DNS or Ident queries were generated by the ftp daemon. The systems were not running X, and
the commands

    /etc/init.d/crond stop
    /etc/init.d/anacron stop

were run to prevent cron from starting expensive housekeeping tasks during the run.
The commands
    ulimit -n 8192
    echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
    echo 8192 > /proc/sys/fs/file-max

were run on both client and server before starting.
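For daemons or benchmark clients that prefer to raise the descriptor limit themselves rather than relying on the shell, here is a minimal C sketch of the programmatic equivalent of ulimit -n. It is an illustration only, not code taken from any of the programs tested.

    /* Raise this process's soft fd limit toward the hard limit.
       The system-wide ceiling is still /proc/sys/fs/file-max. */
    #include <stdio.h>
    #include <sys/resource.h>

    int raise_fd_limit(rlim_t wanted)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        if (rl.rlim_cur < wanted) {
            rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        }
        printf("descriptor limit is now %lu\n", (unsigned long)rl.rlim_cur);
        return 0;
    }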
A 10 kilobyte file was fetched repeatedly by many simulated users, using the command
    time ./dkftpbench -hSERVER -n3000 -t180 -b1000 -f/pub/x10k.dat -sX

where X was p for poll or r for realtime signals. The results were as follows:
Results | wu-ftpd | ncftpd
---|---|---
Version | 2.6.1-18 | 2.7.0
Options | -s -A | minusers 3000
users (poll) | 730 | 3000
client CPU (poll) | 56-85% | 83-90%
users (realtime signals) | 741 | 3000
client CPU (realtime signals) | 3-7% | 28-40%
For tests above 730 users, ncftpd was used instead of wu-ftpd. Oddly, it made dkftpbench use less CPU; maybe it wrote out the data in fewer packets.
At 2500 users, F_SETSIG's advantage started to wilt a bit, and CPU usage climbed to 15%-20% (compared to poll's 70%).
At 3000 users (as far as I was able to get ncftpd to go), with /proc/sys/kernel/rtsig-max at its default value of 1024, F_SETSIG's CPU usage climbed as high as 40% (compared to poll's 90%).
The program had to fall back to poll() several times per second. This is a bit surprising, since at any one time there were only about 300 active fd's. It seems that when F_SETSIG is in effect, a signal is queued each time a packet is received -- even if that signal carries exactly the same information as the ten signals already in the queue.
I also tested with Luban's signal-per-fd patch (see my post on linux-kernel), which did reduce the number of redundant signals, and got rid of nearly all the SIGIO's, but did not seem to reduce CPU usage. Perhaps at higher numbers of users...
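For readers unfamiliar with the technique dkftpbench exercises under -sr, here is a minimal sketch of F_SETSIG-style event delivery, including the fallback to poll() when the realtime signal queue overflows. The signal number and the handle_fd() helper are made up for illustration; this is not dkftpbench's actual code.

    /* Arm sockets to deliver a queued realtime signal when readable, and
       collect those signals synchronously with sigwaitinfo(). */
    #define _GNU_SOURCE          /* for F_SETSIG on glibc */
    #include <fcntl.h>
    #include <poll.h>
    #include <signal.h>
    #include <unistd.h>

    #define IO_SIG (SIGRTMIN + 1)       /* any realtime signal will do */

    extern void handle_fd(int fd);      /* hypothetical per-socket handler */

    static void arm_socket(int fd)
    {
        fcntl(fd, F_SETOWN, getpid());  /* deliver signals to this process */
        fcntl(fd, F_SETSIG, IO_SIG);    /* use a queued realtime signal    */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
    }

    static void event_loop(struct pollfd *all_fds, int nfds)
    {
        sigset_t sigs;
        sigemptyset(&sigs);
        sigaddset(&sigs, IO_SIG);
        sigaddset(&sigs, SIGIO);
        sigprocmask(SIG_BLOCK, &sigs, NULL);  /* collect signals synchronously */

        for (;;) {
            siginfo_t info;
            int sig = sigwaitinfo(&sigs, &info);

            if (sig == IO_SIG) {
                handle_fd(info.si_fd);        /* kernel says which socket fired */
            } else if (sig == SIGIO) {
                /* The realtime signal queue overflowed (its size is
                   /proc/sys/kernel/rtsig-max on 2.4 kernels), so the kernel
                   sent a plain SIGIO with no fd attached; the only recovery
                   is to rescan every descriptor with poll(). */
                poll(all_fds, nfds, 0);
                /* ... walk all_fds[].revents and service ready sockets ... */
            }
        }
    }

Blocking the signals and reading them with sigwaitinfo() is what makes the per-fd siginfo (si_fd) usable; a queue overflow degrades the scheme back to a full poll() scan, which matches the behavior observed above at 3000 users.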
Version 0.7 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 1 megabyte file (x1000k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 75% of 28.8kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.
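As a rough illustration of the drop-out rule (an assumption about the mechanics, not dkftpbench's actual implementation), a simulated user only stays in the run while its measured throughput meets the required fraction of the nominal modem speed:

    #define LINE_BITS_PER_SEC  28800.0   /* nominal modem speed                 */
    #define MIN_FRACTION       0.75      /* this run required 75% of that speed */

    /* Returns 1 if the user may stay in the run, 0 if it must drop out. */
    int rate_ok(long bytes_received, double elapsed_sec)
    {
        double bits_per_sec = bytes_received * 8.0 / elapsed_sec;
        return bits_per_sec >= MIN_FRACTION * LINE_BITS_PER_SEC;
    }

For example, a user that has received 100 kilobytes of the file after 35 seconds is running at about 23.4 kbit/sec, above the 21.6 kbit/sec (75%) threshold, so it stays in the run.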
wu-ftpd was run standalone with argument -s.
Tuning suggestions for wu-ftpd were received but not used. Next time...
ncftpd was set with minusers 50 or 100, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet. Next time...
betaftpd was compiled with -g -O2 instead of -g, and the check for too many clients was fixed (should be just "is fd > 1023?").
proftpd-1.2.0pre10 was compiled from source for standalone without any special options, and MaxInstances was set to 500.
The commands
    ulimit -n 4096
    echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
    echo 4096 > /proc/sys/fs/file-max

were run on both client and server before starting.
Here's a summary of the software used, and the results from dkftpbench:
 | wu-ftpd | proftpd | ncftpd | betaftpd
---|---|---|---|---
Version | wu-2.5.0(1) | 1.2.0pre10 | 2.5.0/391 | 0.0.8pre11
Patches | | | | gcc -O2, fixed max fd check
Options | -s | | |
users | 49 | 82 | 194 | 751
load avg | 8 | 7 | 6 | 1.0
CPU | 90% | 100% | 50% | 100%
users | 120 | 151 | 134 | 258
load avg | 50 | 60 | 5 | 0.25
CPU | 80%? | 80% | 25% | 20%
Note that betaftpd was limited by the 1024 file descriptor limit of select in the run with 751 clients. (Oddly enough, you don't always need two sockets per client; during part of the transfer, the data socket is closed, and doesn't count against the 1024 fd per process limit.)
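For reference, here is a minimal sketch of the select() ceiling involved, and of the corrected check mentioned earlier for betaftpd; the function name is made up for illustration.

    /* An fd_set can only describe descriptors 0..FD_SETSIZE-1 (1024 on Linux),
       so a select()-based server must refuse any higher-numbered socket before
       calling FD_SET() on it -- hence the "is fd > 1023?" check. */
    #include <sys/select.h>

    int fd_is_selectable(int fd)
    {
        return fd >= 0 && fd < FD_SETSIZE;   /* rejects fd > 1023 */
    }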
dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline
    bench -hp90 -fx100k.dat -n300 -t180

except that the number of users was adjusted downwards if too many clients failed, and upwards if none failed.
The server was nearly unusable when wu-ftpd was running, and really bogged down when proftpd was running. ncftpd and betaftpd felt very light.
Version 0.4 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 100 kilobyte file (x100k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 80% of 28.8 kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.
The test severely taxed the 10baseT connection; it should be redone with full duplex 100baseT.
wu-ftpd was run in two ways: started from inetd untuned, and (after staring at the man page for a while) standalone with argument -s. (The -w and -Q options were also tried, but had little effect on this benchmark).
Tuning suggestions for wu-ftpd were received after the test, and might be used in a future test.
ncftpd was set with minusers 50, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet; I believe they will make more of a difference in larger runs or runs with smaller files, so I'll use them when I do a 100baseT test.
betaftpd was compiled with -O6 instead of -g, and a small bug in its PASV command was fixed (thanks to the author for the patch).
Here's a summary of the software used, and the results, given as (# of users started) ... (# of users left at end of run):
 | wu-ftpd (inetd) | wu-ftpd (standalone) | ncftpd | betaftpd
---|---|---|---|---
Version | wu-2.5.0(1) | wu-2.5.0(1) | 2.5.0/391 | betaftpd 0.0.8pre10
Patches | | | | gcc -O6, PASV fix
Options | | -s | |
30 users | 30 | 30 | |
50 users | 39 | 50 | 50 | 50
100 users | | 100 | 100 | 98
150 users | | 115 | 150 | 145
200 users | | | 155 | 192
300 users | | | 184 | 249
dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline
    bench -hp90 -fx100k.dat -n$n -t180 >> $n.log
Here are the system parameters and various system limits measured by 'dklimits' and 'free' before the benchmark run. (ulimit -n was not changed on either the client or the server, so only 1024 file handles were available per process.) Note that since each of the 250 users took about 30 seconds per fetch, the test only needed about 5 new ports per second, so the limit of 3900 local ports on the client was not an issue. (A sketch of the kind of descriptor probing dklimits does appears after the table.)
 | Server | Client
---|---|---
CPU | 90 MHz Pentium | 450 MHz Pentium III
OS | Red Hat 6.0 | Red Hat 6.1
Total mem | 30756KB | 128012KB
Free mem | 8408KB | 76616KB
/proc/sys/net/ipv4/ip_local_port_range | 1024 32767 | 1024 4999
/proc/sys/fs/file-max | 4096 | 4096
/proc/sys/fs/inode-max | 4096 | 16384
available fd's | 1021 | 1021
available explicit ports | 60413 | 60355
available ephemeral ports | 1021 | 1021
available nonblocking connect()'s | 1021 | 1021
poll() limit | 2048 | 2048
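As a rough illustration of how the "available fd's" row can be measured (this is only a sketch of the idea, not dklimits itself):

    /* Keep duplicating a descriptor until the kernel refuses, then report
       how many were obtained before the per-process limit was reached. */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int count_available_fds(void)
    {
        static int fds[65536];
        int n = 0;

        while (n < 65536) {
            int fd = dup(0);          /* consume one descriptor slot        */
            if (fd < 0) {
                if (errno != EMFILE)  /* EMFILE: per-process fd limit hit   */
                    perror("dup");
                break;
            }
            fds[n++] = fd;
        }
        for (int i = 0; i < n; i++)   /* release everything we grabbed      */
            close(fds[i]);
        return n;
    }

The figure comes out as 1021 rather than 1024 because stdin, stdout, and stderr are already open.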
The actual conditions varied slightly from the above description, but not enough to affect the results (e.g. a hub was used initially and was swapped out for a crossover cable when collisions were noticed; the results changed by less than 1%).