dkftpbench results

This page lists preliminary results of informal performance comparisons of various ftp daemons, using dkftpbench as the workload and measuring tool.

14 Sept 2001

Version 0.30 of the dkftpbench benchmark allows choosing between poll() and realtime signals (a la F_SETSIG), so I ran a test to compare the performance of the two methods.

The client was a 450 MHz Pentium III with 128MB RAM running Red Hat 7.0, using a vanilla 2.4.9 kernel. The server was a dual 650 MHz Pentium III running Red Hat 7.2 beta 2, using the 2.4.7-2-smp kernel that came with rh7.2b2. The client and server were connected using 100baseT through a Sohoware switch. tcpdump was run to verify that no DNS or Ident queries were generated by the ftp daemon. The systems were not running X, and

/etc/init.d/crond stop
/etc/init.d/anacron stop
were run to prevent cron from starting expensive housekeeping tasks during the run.

The commands

ulimit -n 8192
echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
echo 8192 > /proc/sys/fs/file-max
were run on both client and server before starting.

A 10 kilobyte file was fetched repeatedly by many simulated users, using the command

time ./dkftpbench -hSERVER -n3000 -t180 -b1000 -f/pub/x10k.dat -sX
where X was p for poll or r for realtime signals, and
SERVER was the hostname of the machine running the ftp server.
The -b1000 option selects a client download speed of 1000 bytes / sec; clients that can't maintain that rate cause the test to fail.
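
As a rough sanity check on what these options imply, the following Python sketch works out the per-fetch time and aggregate payload rate. It assumes "10 kilobyte" means 10240 bytes and ignores login, connection setup, and protocol overhead, so treat the numbers as approximate.

```python
# Back-of-the-envelope load implied by -n3000 -b1000 with a 10 KB file.
FILE_BYTES = 10 * 1024       # assumption: "10 kilobyte" means 10240 bytes
RATE_BYTES_PER_SEC = 1000    # from -b1000
USERS = 3000                 # from -n3000

# Time one simulated user needs to pull the file at the throttled rate.
seconds_per_fetch = FILE_BYTES / RATE_BYTES_PER_SEC

# If every user fetches back to back, this many fetches start each second.
fetches_per_sec = USERS / seconds_per_fetch

# Payload bandwidth needed when all users run at exactly the target rate.
aggregate_mbit_per_sec = USERS * RATE_BYTES_PER_SEC * 8 / 1e6

print(f"{seconds_per_fetch:.2f} s per fetch")
print(f"{fetches_per_sec:.0f} fetches started per second")
print(f"{aggregate_mbit_per_sec:.1f} Mbit/s of payload")
```

Under these assumptions the payload rate at 3000 users is about a quarter of the nominal 100baseT capacity.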

Results
                 wu-ftpd      ncftpd
Version          2.6.1-18     2.7.0
Options          -s -A        minusers 3000
10KB file, 1kbyte/sec, poll()
  users          730          3000
  client CPU     56-85%       83-90%
10KB file, 1kbyte/sec, F_SETSIG, rtsig-max=1024
  users          741          3000
  client CPU     3-7%         28-40%
At 730 users (that's all wu-ftpd could handle), F_SETSIG used only one tenth as much CPU time as poll() (7% vs. 70%) according to top.

For tests above 730 users, ncftpd was used instead of wu-ftpd. Oddly, it made dkftpbench use less CPU; maybe it wrote out the data in fewer packets.

At 2500 users, F_SETSIG's advantage started to wilt a bit, and CPU usage climbed to 15%-20% (compared to poll's 70%).

At 3000 users (as far as I was able to get ncftpd to go), with /proc/sys/kernel/rtsig-max at its default value of 1024, F_SETSIG's CPU usage climbed as high as 40% (compared to poll's 90%).

The program had to fall back to poll() several times per second. This is a bit surprising, since at any one time, there were only about 300 active fd's. It seems that when F_SETSIG is on, a signal is queued each time a packet is received -- even if that signal carries exactly the same information as the ten signals already in the queue.

I also tested with Luban's signal-per-fd patch (see my post on linux-kernel), which did reduce the number of redundant signals, and got rid of nearly all the SIGIO's, but did not seem to reduce CPU usage. Perhaps at higher numbers of users...
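
The queueing behavior described above can be illustrated with a toy model in Python. This is not kernel code and does not reproduce the benchmark: the function name, the `drain_every` parameter, and the traffic pattern are all made up for illustration. It only shows why a queue of 1024 entries can overflow with about 300 active fd's when every packet queues its own signal, and why coalescing to one pending signal per fd avoids that.

```python
import random

def count_overflows(arrivals, queue_limit, drain_every, coalesce_per_fd):
    """Count packets that find the signal queue full.

    arrivals        -- sequence of fd numbers, one per received packet
    queue_limit     -- maximum queued signals (like rtsig-max)
    drain_every     -- hypothetical: the reader empties the queue after
                       this many packets have arrived
    coalesce_per_fd -- if True, keep at most one queued signal per fd
    """
    pending = []          # one entry per queued signal
    pending_fds = set()   # fds that already have a signal queued
    overflows = 0
    for i, fd in enumerate(arrivals, start=1):
        if coalesce_per_fd and fd in pending_fds:
            pass  # redundant: this fd is already known to be readable
        elif len(pending) >= queue_limit:
            overflows += 1  # a real program would now fall back to poll()
        else:
            pending.append(fd)
            pending_fds.add(fd)
        if i % drain_every == 0:
            pending.clear()
            pending_fds.clear()
    return overflows

rng = random.Random(0)
arrivals = [rng.randrange(300) for _ in range(100_000)]  # about 300 active fds

plain = count_overflows(arrivals, 1024, 2000, coalesce_per_fd=False)
per_fd = count_overflows(arrivals, 1024, 2000, coalesce_per_fd=True)
print("overflows, one signal per packet:", plain)
print("overflows, one signal per fd:    ", per_fd)
```

With these invented parameters the per-packet queue overflows in every drain cycle, while the per-fd queue never holds more than 300 entries. The model says nothing about CPU cost, which is what the real test measured.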

24-25 Jan 2000: wu-ftpd 120, proftpd 151, ncftpd 134, betaftpd 258

I ran a quick comparison of wu-ftpd, proftpd, ncftpd, and betaftpd on a 100baseT connection between a fast, big client and a slow, small server.
In all cases, communication was done via 100baseT using a crossover cable (no hub, to avoid collisions). X was running on the client system. Very little else was active on either machine.

Version 0.7 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 1 megabyte file (x1000k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 75% of 28.8kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.
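
For reference, the drop-out threshold can be worked out as follows. This Python sketch assumes 28.8kbits/sec means 28800 bits per second of payload with no framing overhead, and that x1000k.dat is 1000 * 1024 bytes; both are assumptions, not facts taken from the benchmark source.

```python
MODEM_BITS_PER_SEC = 28_800  # assumption: 28.8 kbit/s, payload only

def min_bytes_per_sec(fraction, link_bits_per_sec=MODEM_BITS_PER_SEC):
    """Lowest sustained rate a simulated user may have before dropping out."""
    return fraction * link_bits_per_sec / 8

floor = min_bytes_per_sec(0.75)          # threshold used in this run
file_bytes = 1000 * 1024                 # assumption about x1000k.dat's size
slowest_fetch_sec = file_bytes / floor   # longest fetch that still passes

print(f"minimum rate: {floor:.0f} bytes/sec")
print(f"slowest passing fetch: about {slowest_fetch_sec:.0f} s")
```

So a user that stays just above the threshold needs a little over six minutes per fetch of the 1 megabyte file.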

wu-ftpd was run standalone with argument -s.

Tuning suggestions for wu-ftpd were received but not used. Next time...

ncftpd was set with minusers 50 or 100, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet. Next time...

betaftpd was compiled with -g -O2 instead of -g, and the check for too many clients was fixed (should be just "is fd > 1023?").

proftpd-1.2.0pre10 was compiled from source for standalone without any special options, and MaxInstances was set to 500.

The commands

ulimit -n 4096
echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
echo 4096 > /proc/sys/fs/file-max
were run on both client and server before starting.

Here's a summary of the software used, and the results from dkftpbench:

              wu-ftpd       proftpd      ncftpd      betaftpd
Version       wu-2.5.0(1)   1.2.0pre10   2.5.0/391   0.0.8pre11
Patches                                              gcc -O2, fixed max fd check
Options       -s
10KB file
  users       49            82           194         751
  load avg    8             7            6           1.0
  CPU         90%           100%         50%         100%
1000KB file
  users       120           151          134         258
  load avg    50            60           5           0.25
  CPU         80%?          80%          25%         20%

Note that betaftpd was limited by the 1024 file descriptor limit of select in the run with 751 clients. (Oddly enough, you don't always need two sockets per client; during part of the transfer, the data socket is closed, and doesn't count against the 1024 fd per process limit.)

dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline

bench -hp90 -fx100k.dat -n300 -t180
except that the number of users was adjusted downwards if too many clients failed, and upwards if none failed.

The server was nearly unusable when wu-ftpd was running, and really bogged down when proftpd was running. ncftpd and betaftpd felt very light.

13/14 Jan 2000: wu-ftpd 39 or 115, ncftpd 184, betaftpd 249

I ran a quick comparison of wu-ftpd, ncftpd, and betaftpd on a 10baseT connection between a fast, big client and a slow, small server.
In all cases, communication was done via 10baseT using a crossover cable (no hub, to avoid collisions). X was not running on either system.

Version 0.4 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 100kilobyte file (x100k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 80% of 28.8kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.

The test severely taxed the 10baseT connection; it should be redone with full duplex 100baseT.
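
A quick estimate shows why the link was the bottleneck. The Python sketch below assumes each simulated user pulls data at the full 28.8kbits/sec and counts payload only, so real wire usage (TCP/IP and Ethernet headers, ACKs, half-duplex contention) would be higher.

```python
MODEM_BITS_PER_SEC = 28_800      # assumption: each user runs at the full rate
LINK_BITS_PER_SEC = 10_000_000   # nominal 10baseT capacity

def link_share(users):
    """Fraction of nominal link capacity consumed by payload alone."""
    return users * MODEM_BITS_PER_SEC / LINK_BITS_PER_SEC

for users in (100, 200, 249, 300):
    print(f"{users:3d} users: {link_share(users):.0%} of nominal 10baseT")
```

At the 249 users betaftpd reached, payload alone would fill roughly 70% of the nominal capacity, which is already a heavy load for 10baseT.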

wu-ftpd was run in two ways: started from inetd untuned, and (after staring at the man page for a while) standalone with argument -s. (The -w and -Q options were also tried, but had little effect on this benchmark).

Tuning suggestions for wu-ftpd were received after the test, and might be used in a future test.

ncftpd was set with minusers 50, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet; I believe they will make more of a difference in larger runs or runs with smaller files, so I'll use them when I do a 100baseT test.

betaftpd was compiled with -O6 instead of -g, and a small bug in its PASV command was fixed (thanks to the author for the patch).

Here's a summary of the software used, and the results, given as (# of users started) ... (# of users left at end of run):
              wu-ftpd       wu-ftpd       ncftpd      betaftpd
Version       wu-2.5.0(1)   wu-2.5.0(1)   2.5.0/391   betaftpd 0.0.8pre10
Patches                                               gcc -O6, PASV fix
Options                     -s
30 users      30            30
50 users      39            50            50          50
100 users                   100           100         98
150 users                   115           150         145
200 users                                 155         192
300 users                                 184         249

dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline

bench -hp90 -fx100k.dat -n$n -t180 >> $n.log

Here are system parameters and various system limits measured by 'dklimits' and 'free' before the benchmark run. (ulimit -n was not changed for either client or server, so only 1024 filehandles were available per process.) Note that since each of the 250 users took about 30 seconds for each fetch, the test only needed about 5 new ports per second, so the limit of 3900 local ports on the client was not an issue.
                                          Server           Client
CPU                                       90 MHz Pentium   450 MHz Pentium III
OS                                        Red Hat 6.0      Red Hat 6.1
Total mem                                 30756KB          128012KB
Free mem                                  8408KB           76616KB
/proc/sys/net/ipv4/ip_local_port_range    1024 32767       1024 4999
/proc/sys/fs/file-max                     4096             4096
/proc/sys/fs/inode-max                    4096             16384
available fd's                            1021             1021
    "     explicit ports                  60413            60355
    "     ephemeral ports                 1021             1021
    "     nonblocking connect()'s         1021             1021
poll() limit                              2048             2048

The actual conditions varied slightly from the above description, but not enough to affect results (e.g. a hub was used initially, was swapped out for a crossover cable when collisions were noticed, but the results changed by less than 1%).


Copyright 2000-2001, Dan Kegel