The client was a 450 MHz Pentium III with 128MB RAM running Red Hat 7.0, using a vanilla 2.4.9 kernel. The server was a dual 650 MHz Pentium III running Red Hat 7.2 beta 2, using the 2.4.7-2-smp kernel that came with rh7.2b2. The client and server were connected using 100baseT through a Sohoware switch. tcpdump was run to verify that no DNS or Ident queries were generated by the ftp daemon. The systems were not running X, and
the commands

    /etc/init.d/crond stop
    /etc/init.d/anacron stop

were run to prevent cron from starting expensive housekeeping tasks during the run.
The commands
    ulimit -n 8192
    echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
    echo 8192 > /proc/sys/fs/file-max

were run on both client and server before starting.
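For daemons or benchmark clients that prefer to raise the descriptor limit themselves rather than relying on the shell, here is a minimal C sketch of the programmatic equivalent of ulimit -n. It is an illustration only, not code taken from any of the programs tested.

    /* Raise this process's soft fd limit toward the hard limit.
       The system-wide ceiling is still /proc/sys/fs/file-max. */
    #include <stdio.h>
    #include <sys/resource.h>

    int raise_fd_limit(rlim_t wanted)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        if (rl.rlim_cur < wanted) {
            rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        }
        printf("descriptor limit is now %lu\n", (unsigned long)rl.rlim_cur);
        return 0;
    }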
A 10 kilobyte file was fetched repeatedly by many simulated users, using the command
    time ./dkftpbench -hSERVER -n3000 -t180 -b1000 -f/pub/x10k.dat -sX

where X was p for poll or r for realtime signals. The results were as follows:
Results | wu-ftpd | ncftpd
---|---|---
Version | 2.6.1-18 | 2.7.0
Options | -s -A | minusers 3000
users (poll) | 730 | 3000
client CPU (poll) | 56-85% | 83-90%
users (realtime signals) | 741 | 3000
client CPU (realtime signals) | 3-7% | 28-40%
For tests above 730 users, ncftpd was used instead of wu-ftpd. Oddly, it made dkftpbench use less CPU; maybe it wrote out the data in fewer packets.
At 2500 users, F_SETSIG's advantage started to wilt a bit, and CPU usage climbed to 15%-20% (compared to poll's 70%).
At 3000 users (as far as I was able to get ncftpd to go), with /proc/sys/kernel/rtsig-max at its default value of 1024, F_SETSIG's CPU usage climbed as high as 40% (compared to poll's 90%).
The program had to fall back to poll() several times per second. This is a bit surprising, since at any one time there were only about 300 active fd's. It seems that when F_SETSIG is in effect, a signal is queued each time a packet is received -- even if that signal carries exactly the same information as the ten signals already in the queue.
I also tested with Luban's signal-per-fd patch (see my post on linux-kernel), which did reduce the number of redundant signals, and got rid of nearly all the SIGIO's, but did not seem to reduce CPU usage. Perhaps at higher numbers of users...
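For readers unfamiliar with the technique dkftpbench exercises under -sr, here is a minimal sketch of F_SETSIG-style event delivery, including the fallback to poll() when the realtime signal queue overflows. The signal number and the handle_fd() helper are made up for illustration; this is not dkftpbench's actual code.

    /* Arm sockets to deliver a queued realtime signal when readable, and
       collect those signals synchronously with sigwaitinfo(). */
    #define _GNU_SOURCE          /* for F_SETSIG on glibc */
    #include <fcntl.h>
    #include <poll.h>
    #include <signal.h>
    #include <unistd.h>

    #define IO_SIG (SIGRTMIN + 1)       /* any realtime signal will do */

    extern void handle_fd(int fd);      /* hypothetical per-socket handler */

    static void arm_socket(int fd)
    {
        fcntl(fd, F_SETOWN, getpid());  /* deliver signals to this process */
        fcntl(fd, F_SETSIG, IO_SIG);    /* use a queued realtime signal    */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
    }

    static void event_loop(struct pollfd *all_fds, int nfds)
    {
        sigset_t sigs;
        sigemptyset(&sigs);
        sigaddset(&sigs, IO_SIG);
        sigaddset(&sigs, SIGIO);
        sigprocmask(SIG_BLOCK, &sigs, NULL);  /* collect signals synchronously */

        for (;;) {
            siginfo_t info;
            int sig = sigwaitinfo(&sigs, &info);

            if (sig == IO_SIG) {
                handle_fd(info.si_fd);        /* kernel says which socket fired */
            } else if (sig == SIGIO) {
                /* The realtime signal queue overflowed (its size is
                   /proc/sys/kernel/rtsig-max on 2.4 kernels), so the kernel
                   sent a plain SIGIO with no fd attached; the only recovery
                   is to rescan every descriptor with poll(). */
                poll(all_fds, nfds, 0);
                /* ... walk all_fds[].revents and service ready sockets ... */
            }
        }
    }

Blocking the signals and reading them with sigwaitinfo() is what makes the per-fd siginfo (si_fd) usable; a queue overflow degrades the scheme back to a full poll() scan, which matches the behavior observed above at 3000 users.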
Version 0.7 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 1 megabyte file (x1000k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 75% of 28.8kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.
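As a rough illustration of the drop-out rule (an assumption about the mechanics, not dkftpbench's actual implementation), a simulated user only stays in the run while its measured throughput meets the required fraction of the nominal modem speed:

    #define LINE_BITS_PER_SEC  28800.0   /* nominal modem speed                 */
    #define MIN_FRACTION       0.75      /* this run required 75% of that speed */

    /* Returns 1 if the user may stay in the run, 0 if it must drop out. */
    int rate_ok(long bytes_received, double elapsed_sec)
    {
        double bits_per_sec = bytes_received * 8.0 / elapsed_sec;
        return bits_per_sec >= MIN_FRACTION * LINE_BITS_PER_SEC;
    }

For example, a user that has received 100 kilobytes of the file after 35 seconds is running at about 23.4 kbit/sec, above the 21.6 kbit/sec (75%) threshold, so it stays in the run.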
wu-ftpd was run standalone with argument -s.
Tuning suggestions for wu-ftpd were received but not used. Next time...
ncftpd was set with minusers 50 or 100, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet. Next time...
betaftpd was compiled with -g -O2 instead of -g, and the check for too many clients was fixed (should be just "is fd > 1023?").
proftpd-1.2.0pre10 was compiled from source for standalone without any special options, and MaxInstances was set to 500.
The commands
    ulimit -n 4096
    echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
    echo 4096 > /proc/sys/fs/file-max

were run on both client and server before starting.
Here's a summary of the software used, and the results from dkftpbench:
 | wu-ftpd | proftpd | ncftpd | betaftpd
---|---|---|---|---
Version | wu-2.5.0(1) | 1.2.0pre10 | 2.5.0/391 | 0.0.8pre11
Patches | | | | gcc -O2, fixed max fd check
Options | -s | | |
users | 49 | 82 | 194 | 751
load avg | 8 | 7 | 6 | 1.0
CPU | 90% | 100% | 50% | 100%
users | 120 | 151 | 134 | 258
load avg | 50 | 60 | 5 | 0.25
CPU | 80%? | 80% | 25% | 20%
Note that betaftpd was limited by the 1024 file descriptor limit of select in the run with 751 clients. (Oddly enough, you don't always need two sockets per client; during part of the transfer, the data socket is closed, and doesn't count against the 1024 fd per process limit.)
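For reference, here is a minimal sketch of the select() ceiling involved, and of the corrected check mentioned earlier for betaftpd; the function name is made up for illustration.

    /* An fd_set can only describe descriptors 0..FD_SETSIZE-1 (1024 on Linux),
       so a select()-based server must refuse any higher-numbered socket before
       calling FD_SET() on it -- hence the "is fd > 1023?" check. */
    #include <sys/select.h>

    int fd_is_selectable(int fd)
    {
        return fd >= 0 && fd < FD_SETSIZE;   /* rejects fd > 1023 */
    }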
dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline
    bench -hp90 -fx100k.dat -n300 -t180

except that the number of users was adjusted downwards if too many clients failed, and upwards if none failed.
The server was nearly unusable when wu-ftpd was running, and really bogged down when proftpd was running. ncftpd and betaftpd felt very light.
Version 0.4 of the dkftpbench benchmark started a number of simulated dialup users which logged in and then fetched a single binary 100 kilobyte file (x100k.dat) over and over without logging out; no directory listings were requested. If any user didn't get 80% of 28.8 kbits/sec, it dropped out. The run continued until the number of users was stable for 3 minutes.
The test severely taxed the 10baseT connection; it should be redone with full duplex 100baseT.
wu-ftpd was run in two ways: started from inetd untuned, and (after staring at the man page for a while) standalone with argument -s. (The -w and -Q options were also tried, but had little effect on this benchmark).
Tuning suggestions for wu-ftpd were received after the test, and might be used in a future test.
ncftpd was set with minusers 50, maxusers 300. The author kindly provided me with tuning notes, but I have not used them yet; I believe they will make more of a difference in larger runs or runs with smaller files, so I'll use them when I do a 100baseT test.
betaftpd was compiled with -O6 instead of -g, and a small bug in its PASV command was fixed (thanks to the author for the patch).
Here's a summary of the software used, and the results, given as (# of users started) ... (# of users left at end of run):
 | wu-ftpd (inetd) | wu-ftpd (standalone) | ncftpd | betaftpd
---|---|---|---|---
Version | wu-2.5.0(1) | wu-2.5.0(1) | 2.5.0/391 | betaftpd 0.0.8pre10
Patches | | | | gcc -O6, PASV fix
Options | | -s | |
30 users | 30 | 30 | |
50 users | 39 | 50 | 50 | 50
100 users | | 100 | 100 | 98
150 users | | 115 | 150 | 145
200 users | | | 155 | 192
300 users | | | 184 | 249
dkftpbench was compiled using the fastest options listed in the Makefile, and run with the commandline
    bench -hp90 -fx100k.dat -n$n -t180 >> $n.log
Here are the system parameters and various system limits measured by 'dklimits' and 'free' before the benchmark run. (ulimit -n was not changed on either the client or the server, so only 1024 file handles were available per process.) Note that since each of the 250 users took about 30 seconds per fetch, the test only needed about 5 new ports per second, so the limit of 3900 local ports on the client was not an issue. (A sketch of the kind of descriptor probing dklimits does appears after the table.)
 | Server | Client
---|---|---
CPU | 90 MHz Pentium | 450 MHz Pentium III
OS | Red Hat 6.0 | Red Hat 6.1
Total mem | 30756KB | 128012KB
Free mem | 8408KB | 76616KB
/proc/sys/net/ipv4/ip_local_port_range | 1024 32767 | 1024 4999
/proc/sys/fs/file-max | 4096 | 4096
/proc/sys/fs/inode-max | 4096 | 16384
available fd's | 1021 | 1021
available explicit ports | 60413 | 60355
available ephemeral ports | 1021 | 1021
available nonblocking connect()'s | 1021 | 1021
poll() limit | 2048 | 2048
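As a rough illustration of how the "available fd's" row can be measured (this is only a sketch of the idea, not dklimits itself):

    /* Keep duplicating a descriptor until the kernel refuses, then report
       how many were obtained before the per-process limit was reached. */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int count_available_fds(void)
    {
        static int fds[65536];
        int n = 0;

        while (n < 65536) {
            int fd = dup(0);          /* consume one descriptor slot        */
            if (fd < 0) {
                if (errno != EMFILE)  /* EMFILE: per-process fd limit hit   */
                    perror("dup");
                break;
            }
            fds[n++] = fd;
        }
        for (int i = 0; i < n; i++)   /* release everything we grabbed      */
            close(fds[i]);
        return n;
    }

The figure comes out as 1021 rather than 1024 because stdin, stdout, and stderr are already open.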
The actual conditions varied slightly from the above description, but not enough to affect the results (e.g. a hub was used initially and was swapped out for a crossover cable when collisions were noticed; the results changed by less than 1%).