1 Abstract
Large software systems tend to have a large number of bugs because writing bug-free code is
hard, and finding bugs is perhaps even harder. A common method to detect bugs is to write
a suite of unit tests. A technique we call differential code coverage analysis can help by
making it easier to decide which parts of the system need additional unit tests.
2 Introduction
Large software systems often have internal regression test suites. If a test that used to pass
starts failing, the system is said to have regressed, and the developers know to look at their
recent changes for the problem. Ideally, test suites made up of “unit tests” that test small
parts of the system (typically a small group of related functions) would test all paths through
the source code of the project, and would be quick enough that developers could run all of
them before committing each change to the source tree. In practice, there are so many code
paths that it is economically infeasible to test them all, and developers usually consider
test coverage of 50% to 80% to be sufficient.
To catch the bugs missed by unit tests, developers can use what are called system tests.
These might be as simple as “fire up the application and play with it for a while”. This is
a great way to find bugs, but depends on luck and the skill of the user. Furthermore, these
tests might not find the same bugs on repeated runs. Some system tests can be automated,
which makes them much more repeatable. However, they are still much less likely to be run
before each checkin by the average developer, which means developers often check in code
with bugs that could have been caught by a system test.
If there were a tool that could watch the execution of a system test, and output a
corresponding set of fully automated unit tests, it might greatly improve the quality of the
project. Unfortunately, such a tool is impractical because most functions in complex systems
are stateful, meaning each call depends heavily on the ones that came before it.
A tool that made it easier to figure out which unit tests need to be written or improved,
however, might well be practical. For instance, a tool that finds lines of the project’s source
code exercised by a failing system test, but not by the unit tests, could be used by the
programmer to decide which parts of the unit test suite need improving to catch problems
currently only caught by the system test. We call this technique “differential code coverage
analysis”.
3 Differential Code Coverage
Regular code coverage refers to the process of tracking which lines of code are executed while
a program runs. However, for large projects, a description of everything that is executed
will not be specific enough to be truly useful. The developer will be able to see what has
and has not been run, but in a 100,000-line project, this is of only limited help. To
further hone the search for where tests are most needed, the developer may choose to focus
on the code that is most often executed by a user and not yet tested by the test suite.
To this end, we introduce the idea of differential code coverage, which compares two
different runs of the same source code: a run of the unit test suite and a run of a system
test. Only those lines executed by the system test and not by the test suite appear in the
report. The lines highlighted by differential code coverage correspond to the red region
in Figure 1.
Figure 1:
The area marked in red represents the lines of code executed by a system test that were untested
by a unit test suite. This is the area where we want to concentrate test writing efforts.
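As a minimal sketch of the idea (not the implementation of any real coverage tool), the
heart of differential code coverage is a per-line set difference between two runs. The
line data below is hypothetical; a real tool would read it from gcov output.

    /* Differential coverage as a set difference: report lines hit by the
     * system test but not by the unit test suite. */
    #include <stdio.h>

    #define NLINES 10

    int main(void)
    {
        /* 1 = executed during that run, 0 = not executed (made-up data) */
        int system_test[NLINES] = {1, 1, 1, 0, 1, 1, 0, 1, 1, 1};
        int unit_tests[NLINES]  = {1, 1, 0, 0, 1, 0, 0, 1, 0, 1};
        int line;

        for (line = 0; line < NLINES; line++)
            if (system_test[line] && !unit_tests[line])
                printf("line %d needs a unit test\n", line + 1);
        return 0;
    }

This prints lines 3, 6, and 9: the analogue of the red region in Figure 1.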
4 Methods
To see if this idea worked in practice, we tested it in the context of the Wine project.
Wine is an open-source implementation of the Windows API; for each function defined by
the Windows API, Wine in principle provides an equivalent Linux implementation. Ideally
each of these functions also has unit tests, although in practice many have yet to be
written. The unit tests are run periodically on Windows to verify that the tests themselves are
correct, and run periodically on Wine to verify that Wine is correct. (Wine calls these unit
tests ‘conformance tests’ to emphasize that they test conformance with the Windows API.)
System testing in Wine is done by running Windows applications on Wine and checking that
they behave properly. If an application works on Windows but misbehaves on Wine, the
problem must be a bug in Wine.
The first step was to gain experience with the Wine conformance test suite. Wine’s source
consists mainly of source code for several hundred dynamic link libraries; each such library
is supposed to have a conformance test. One such library which did not yet have a test was
LZExpand, which implements the Windows API functions LZInit, LZCopy, LZOpenFile,
LZSeek, LZRead, LZClose, and GetExpandedName. These functions are mostly used by
older installers and are needed in Wine for backwards compatibility. The functions of the
LZExpand DLL are documented by Microsoft at http://msdn.microsoft.com/library/
en-us/fileio/fs/file_management_functions.asp. We wrote a small set of tests that
exercise those APIs and fail if anything unexpected happens. We ran the tests on Windows
and adjusted them until they ran correctly. Then we ran them on Wine, and discovered
a small bug in Wine: the behavior when reading past the end of a file with LZRead is not
the same under Wine as it is under real Windows. Finally, we made sure the tests followed
the Wine programming style and used the Wine test suite harness properly. We submitted
the tests as a patch to the Wine maintainers, who promptly accepted it into the Wine source
tree.
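To give a flavor of what such a test looks like, here is a hedged sketch in the style of a
Wine conformance test, which uses the ok() macro and START_TEST() entry point from
wine/test.h. The file name, file contents, and expected return value are illustrative
assumptions, not the actual test we submitted.

    #include <stdarg.h>
    #include "windef.h"
    #include "winbase.h"
    #include "wine/test.h"

    static void test_lzread_past_eof(void)
    {
        static char name[] = "lztest.bin";   /* illustrative file name */
        DWORD written;
        OFSTRUCT ofs;
        HANDLE h;
        INT file, ret;
        char buf[64];

        /* Create a small uncompressed file to read back. */
        h = CreateFileA(name, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                        FILE_ATTRIBUTE_NORMAL, NULL);
        ok(h != INVALID_HANDLE_VALUE, "CreateFileA failed\n");
        WriteFile(h, "0123456789", 10, &written, NULL);
        CloseHandle(h);

        file = LZOpenFileA(name, &ofs, OF_READ);
        ok(file >= 0, "LZOpenFileA failed: %d\n", file);

        /* Ask for more bytes than the file holds. On Windows, LZRead
         * returns the number of bytes actually read; this read-past-EOF
         * behavior is where we found Wine diverging (expected value
         * shown here is illustrative). */
        ret = LZRead(file, buf, (INT)sizeof(buf));
        ok(ret == 10, "LZRead returned %d, expected 10\n", ret);

        LZClose(file);
        DeleteFileA(name);
    }

    START_TEST(lzexpand)
    {
        test_lzread_past_eof();
    }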
The next step was to enhance Wine to support measurement of code coverage. We
chose to use the standard tool ‘gcov’ to measure code coverage, since it is free and ships
with gcc on Linux. Adding support and documentation for gcov to Wine was
straightforward, though we did have to rework our patch several times before coming up with
something acceptable to the Wine maintainers. Our documentation on how to use gcov with
Wine is online at http://www.winehq.org/site/docs/wine-devel/x1356.
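For reference, gcov instrumentation comes from two standard gcc flags; under the
assumption of an autoconf-style build (the exact incantation is described in the
documentation linked above), enabling it looks roughly like this:

    ./configure CFLAGS="-fprofile-arcs -ftest-coverage"
    make

The instrumented binaries then write coverage data files as they run, which gcov turns
into per-line execution counts.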
The third step in trying out the idea was to develop a tool that compared the code
coverage of two different runs of the same source code. We chose the program ‘lcov’
from the Linux Test Project (LTP) as a basis. When we started, lcov could already
compare two coverage runs, but not quite in the way we needed. We added the ability to
perform differential code coverage, as previously discussed, and also added a legend to make
the output of LCOV easier to understand. We submitted these changes as patches to the
LTP maintainers. At this writing, most of our enhancements have been accepted into the
LTP CVS tree; the remaining enhancement is being reviewed, but it can be downloaded
from http://sourceforge.net/mailarchive/message.php?msg_id=11451719.
To use lcov, one executes the application in question, uses gcov to extract coverage
info, uses lcov to combine all the gcov output files into a single info file, and finally uses
genhtml to generate a nice HTML report. The report has one top-level page (see Figure
2), one page for each source directory (see Figure 3), and one page for each source file (see
Figure 4). (The source code in these figures is from Peter H. Froehlich’s gcov tutorial,
http://www.cs.ucr.edu/phf/tutorials.html.)
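Concretely, a session looks something like the following sketch; the short flag spellings
below are assumptions that may vary between lcov versions, and the build must already be
instrumented with the gcov flags shown earlier.

    # run the instrumented application or test suite first, then:
    lcov -c -d . -o run.info      # capture all gcov output into one info file
    genhtml -o report run.info    # write the HTML report to ./report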
5 An Example Application to Picasa 2
Having prepared the needed tools, we needed to find a misbehaving Windows application,
and see if our idea in fact helps go from a misbehaving system test to a targeted unit test.
We chose the Windows application ‘Picasa 2’, a free download from picasa.com. Picasa 2 is
meant to find and manage pictures on the user’s local machine.
Picasa 2 ran fairly well on Wine; however, we discovered several bugs while testing it,
only one of which is discussed here. Picasa 2 can compose selected pictures into a
slideshow, which it then runs. This feature works on Windows but hangs and crashes on
Wine. When we discovered this bug, we compared the code exercised during the slideshow
with the code exercised by the Wine test suite. We found 600 lines that were untested by
the test suite but exercised by the slideshow, none of which seemed of consequence to
the bug.
Figure 2: LCOV top-level view, showing all directories
Figure 3: LCOV directory view, showing all files in one directory
Figure 4: LCOV individual source file view, showing all lines in one file
We then used a debugging tool supplied by Wine known as winedbg, which is very
similar to gdb. This provided little additional insight. Finally, we resorted to printing
out all function calls and their arguments while running the slideshow code. During this
process, we noticed several calls to AdvApi32 registry functions related to slideshow
variables, which suggested a possible bug there. We inspected the code implementing these
functions and found it apparently well tested (73% coverage, with several fairly extensive
unit tests). This aroused suspicion that perhaps something was not right with the unit
tests themselves. When we ran the Wine test suite code for these functions on Windows, we
received 28 failures, suggesting that the Wine tests were incorrect. We corrected the tests
and submitted a patch to the Wine developers.
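For reference, the call tracing described above corresponds to Wine’s ‘relay’ debug
channel. A hedged sketch of how one might capture such a trace (the executable name is
illustrative, and the option spelling has varied across Wine versions):

    WINEDEBUG=+relay wine picasa2.exe 2> relay.log

Every call through a DLL entry point is then logged to relay.log, with its arguments and
return value.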
We still hadn’t found the actual cause of the slideshow bug; however, after we mentioned it
on the Wine developers mailing list, it was found to be a problem with YUV overlays. The
bug has since been patched, and the Picasa 2 slideshow now works on Wine as well as it
does on Windows.
6 Conclusion
Our first test, for the LZExpand library, was written because basic code coverage showed
that the library was entirely untested. In the Picasa 2 example, our tool did not find the
bug itself, but it did help us find a related bug. In general, we find that the following
scenarios naturally lend themselves to differential code coverage analysis:
New Testers When we began writing tests for Wine, we had no idea where to start. The
Wine source code is over half a million lines, making testing a rather daunting
task. Once we ran LCOV to see which parts of Wine had already been tested, we had
a much better idea of where our efforts would help the most.
Faulty Tests We did not find the AdvApi32 DLL bug using code coverage techniques. We
realized there was a bug when Wine hung, and we used Wine debugging methods to
track it down. However, once we discovered that this code had been executed by the test
suite, we knew that either the tests were not comprehensive or the tests were incorrect.
Thus, LCOV yielded additional helpful information.
Motivation While code coverage cannot measure the quality of a test, it can encourage
more tests to be written. It is rewarding to see a number increase (in our case, the
percentage of code covered in a given module) and know that you caused that change.
This concrete, rewarding measure of contribution can motivate people to write tests
for previously uncovered areas, and makes writing tests, a sometimes less than
thrilling task, a bit more exciting.
Increased Focus A program can never be considered fully tested. There are many criteria
a company or open-source project may use to decide when a program has been well
tested; however, there is no sure way to know that every angle has been covered and
every screw turned. A tester can only hope to get the most value from the testing
effort. In our case, we did so by looking for areas used by Picasa 2 and making sure
those areas had unit tests that verify the current code is correct and may one day
keep a regression from seeping into Wine.
Differential code coverage has assisted us in writing unit tests. It is a far cry from the
desired goal of deriving unit tests from system tests, but it is a pleasant start.