1 Abstract
Large software systems tend to have a large number of bugs because writing bug-free code is
hard, and finding bugs is perhaps even harder. A common method to detect bugs is to write
a suite of unit tests. A technique we call differential code coverage analysis can help by
making it easier to decide which parts of the system need additional unit tests.
2 Introduction
Large software systems often have internal regression test suites. If a test that used to pass
starts failing, the system is said to have regressed, and the developers know to look at their
recent changes for the problem. Ideally, test suites made up of “unit tests” that test small
parts of the system (typically a small group of related functions) would test all paths through
the source code of the project, and would be quick enough that developers could run all of
them before committing each change to the source tree. In practice, there are so many code
paths that it is economically infeasible to test them all, and developers usually consider
test coverage of 50% to 80% to be sufficient.
To catch the bugs missed by unit tests, developers can use what are called system tests.
These might be as simple as “fire up the application and play with it for a while”. This is
a great way to find bugs, but depends on luck and the skill of the user. Furthermore, these
tests might not find the same bugs on repeated runs. Some system tests can be automated,
which makes them much more repeatable. However, they are still much less likely to be run
before each checkin by the average developer, which means developers often check in code
with bugs that could have been caught by a system test.
If there were a tool that could watch the execution of a system test, and output a
corresponding set of fully automated unit tests, it might greatly improve the quality of the
project. Unfortunately, such a tool is impractical because most functions in complex systems
are stateful, meaning each call depends heavily on the ones that came before it.
A tool that made it easier to figure out which unit tests need to be written or improved,
however, might well be practical. For instance, a tool that finds lines of the project’s source
code exercised by a failing system test, but not by the unit tests, could be used by the
programmer to decide which parts of the unit test suite need improving to catch problems
currently only caught by the system test. We call this technique “differential code coverage
analysis”.
3 Differential Code Coverage
Regular code coverage refers to the process of tracking which lines of code are executed while
a program runs. However, for large projects, a description of everything that is executed
will not be specific enough to be truly useful. The developer will be able to see what has
and has not been run, but in a 100,000-line project, this is of only limited help. To
further hone the search for where tests are most needed, the developer may choose to focus
on the code that is most often executed by a user and not yet tested by the test suite.
To this end, we introduce the idea of differential code coverage, which compares two
different runs of the same source code: a run of the unit test suite and a run of a system
test. Only those lines executed by the system test and not by the test suite appear in the
report. The lines highlighted by differential code coverage correspond to the red region
in Figure 1.
Figure 1:
The area marked in red represents the lines of code executed by a system test that were untested
by a unit test suite. This is the area where we want to concentrate test writing efforts.
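As a minimal sketch of the idea (not the implementation of any real coverage tool), the
heart of differential code coverage is a per-line set difference between two runs. The
line data below is hypothetical; a real tool would read it from gcov output.

    /* Differential coverage as a set difference: report lines hit by the
     * system test but not by the unit test suite. */
    #include <stdio.h>

    #define NLINES 10

    int main(void)
    {
        /* 1 = executed during that run, 0 = not executed (made-up data) */
        int system_test[NLINES] = {1, 1, 1, 0, 1, 1, 0, 1, 1, 1};
        int unit_tests[NLINES]  = {1, 1, 0, 0, 1, 0, 0, 1, 0, 1};
        int line;

        for (line = 0; line < NLINES; line++)
            if (system_test[line] && !unit_tests[line])
                printf("line %d needs a unit test\n", line + 1);
        return 0;
    }

This prints lines 3, 6, and 9: the analogue of the red region in Figure 1.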
4 Methods
To see if this idea worked in practice, we tested it in the context of the Wine project.
Wine is an open-source implementation of the Windows API; for each function defined by
the Windows API, Wine in principle provides an equivalent Linux implementation. Ideally
each of these functions also has unit tests, although in practice many have yet to be
written. The unit tests are run periodically on Windows to verify that the tests themselves are
correct, and run periodically on Wine to verify that Wine is correct. (Wine calls these unit
tests ‘conformance tests’ to emphasize that they test conformance with the Windows API.)
System testing in Wine is done by running Windows applications on Wine and checking that
they behave properly. If an application works on Windows but misbehaves on Wine, the
problem must be a bug in Wine.
The first step was to gain experience with the Wine conformance test suite. Wine’s source
consists mainly of source code for several hundred dynamic link libraries; each such library
is supposed to have a conformance test. One such library which did not yet have a test was
LZExpand, which implements the Windows API functions LZInit, LZCopy, LZOpenFile,
LZSeek, LZRead, LZClose, and GetExpandedName. These functions are mostly used by
older installers and are needed in Wine for backwards compatibility. The functions of the
LZExpand DLL are documented by Microsoft at http://msdn.microsoft.com/library/
en-us/fileio/fs/file_management_functions.asp. We wrote a small set of tests that
exercise those APIs and fail if anything unexpected happens. We ran the tests on Windows
and adjusted them until they ran correctly. Then we ran them on Wine, and discovered
a small bug in Wine: the behavior when reading past the end of a file with LZRead is not
the same under Wine as it is under real Windows. Finally, we made sure the tests followed
the Wine programming style and used the Wine test suite harness properly. We submitted
the tests as a patch to the Wine maintainers, who promptly accepted it into the Wine source
tree.
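To give a flavor of what such a test looks like, here is a hedged sketch in the style of a
Wine conformance test, which uses the ok() macro and START_TEST() entry point from
wine/test.h. The file name, file contents, and expected return value are illustrative
assumptions, not the actual test we submitted.

    #include <stdarg.h>
    #include "windef.h"
    #include "winbase.h"
    #include "wine/test.h"

    static void test_lzread_past_eof(void)
    {
        static char name[] = "lztest.bin";   /* illustrative file name */
        DWORD written;
        OFSTRUCT ofs;
        HANDLE h;
        INT file, ret;
        char buf[64];

        /* Create a small uncompressed file to read back. */
        h = CreateFileA(name, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                        FILE_ATTRIBUTE_NORMAL, NULL);
        ok(h != INVALID_HANDLE_VALUE, "CreateFileA failed\n");
        WriteFile(h, "0123456789", 10, &written, NULL);
        CloseHandle(h);

        file = LZOpenFileA(name, &ofs, OF_READ);
        ok(file >= 0, "LZOpenFileA failed: %d\n", file);

        /* Ask for more bytes than the file holds. On Windows, LZRead
         * returns the number of bytes actually read; this read-past-EOF
         * behavior is where we found Wine diverging (expected value
         * shown here is illustrative). */
        ret = LZRead(file, buf, (INT)sizeof(buf));
        ok(ret == 10, "LZRead returned %d, expected 10\n", ret);

        LZClose(file);
        DeleteFileA(name);
    }

    START_TEST(lzexpand)
    {
        test_lzread_past_eof();
    }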
The next step was to enhance Wine to support measurement of code coverage. We
chose to use the standard tool ‘gcov’ to measure code coverage, since it is free and ships
with gcc on Linux. Adding support and documentation for gcov to Wine was
straightforward, though we did have to rework our patch several times before coming up with
something acceptable to the Wine maintainers. Our documentation on how to use gcov with
Wine is online at http://www.winehq.org/site/docs/wine-devel/x1356.
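For reference, gcov instrumentation comes from two standard gcc flags; under the
assumption of an autoconf-style build (the exact incantation is described in the
documentation linked above), enabling it looks roughly like this:

    ./configure CFLAGS="-fprofile-arcs -ftest-coverage"
    make

The instrumented binaries then write coverage data files as they run, which gcov turns
into per-line execution counts.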
The third step in trying out the idea was to develop a tool that compared the code
coverage of two different runs of the same source code. We chose the program ‘lcov’
from the Linux Test Project (LTP) as a basis. When we started, lcov could already
compare two coverage runs, but not quite in the way we needed. We added the ability to
perform differential code coverage, as previously discussed, and also added a legend to make
the output of LCOV easier to understand. We submitted these changes as patches to the
LTP maintainers. At this writing, most of our enhancements have been accepted into the
LTP CVS tree; the remaining enhancement is being reviewed, but it can be downloaded
from http://sourceforge.net/mailarchive/message.php?msg_id=11451719.
To use lcov, one executes the application in question, uses gcov to extract coverage
info, uses lcov to combine all the gcov output files into a single info file, and finally uses
genhtml to generate a nice HTML report. The report has one top-level page (see Figure
2), one page for each source directory (see Figure 3), and one page for each source file (see
Figure 4). (The source code in these figures is from Peter H. Froehlich’s gcov tutorial,
http://www.cs.ucr.edu/phf/tutorials.html.)
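Concretely, a session looks something like the following sketch; the short flag spellings
below are assumptions that may vary between lcov versions, and the build must already be
instrumented with the gcov flags shown earlier.

    # run the instrumented application or test suite first, then:
    lcov -c -d . -o run.info      # capture all gcov output into one info file
    genhtml -o report run.info    # write the HTML report to ./report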
5 An Example Application to Picasa 2
Having prepared the needed tools, we needed to find a misbehaving Windows application,
and see if our idea in fact helps go from a misbehaving system test to a targeted unit test.
We chose the Windows application ‘Picasa 2’, a free download from picasa.com. Picasa 2 is
meant to find and manage pictures on the user’s local machine.
Picasa 2 ran fairly well on Wine; however, we discovered several bugs while testing it,
only one of which is discussed here. Picasa 2 can compose selected pictures into a
slideshow, which it then runs. This feature works on Windows but hangs and crashes on
Wine. When we discovered this bug, we compared the code exercised during the slideshow
with the code exercised by the Wine test suite. We found 600 lines that were untested by
the test suite but exercised by the slideshow, none of which seemed of consequence to
the bug.
Figure 2: LCOV top-level view, showing all directories
Figure 3: LCOV directory view, showing all files in one directory
Figure 4: LCOV individual source file view, showing all lines in one file
We then used a debugging tool supplied by Wine known as winedbg, which is very
similar to gdb. This provided little additional insight. Finally, we resorted to printing
out all function calls and their arguments while running the slideshow code. During this
process, we noticed several calls to AdvApi32 registry functions related to slideshow
variables, which suggested a possible bug there. We inspected the code implementing these
functions and found it apparently well tested (73% coverage, with several fairly extensive
unit tests). This aroused suspicion that perhaps something was not right with the unit
tests themselves. When we ran the Wine test suite code for these functions on Windows, we
received 28 failures, suggesting that the Wine tests were incorrect. We corrected the tests
and submitted a patch to the Wine developers.
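For reference, the call tracing described above corresponds to Wine’s ‘relay’ debug
channel. A hedged sketch of how one might capture such a trace (the executable name is
illustrative, and the option spelling has varied across Wine versions):

    WINEDEBUG=+relay wine picasa2.exe 2> relay.log

Every call through a DLL entry point is then logged to relay.log, with its arguments and
return value.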
We still hadn’t found the actual cause of the slideshow bug; however, after we mentioned it
on the Wine developers mailing list, it was found to be a problem with YUV overlays. The
bug has since been patched, and the Picasa 2 slideshow now works on Wine as well as it
does on Windows.
6 Conclusion
Our first test, for the LZExpand library, was written because basic code coverage showed
that the library was entirely untested. In the Picasa 2 example, our tool did not find the
bug itself, but it did help us find a related bug. In general, we find that the following
scenarios naturally lend themselves to differential code coverage analysis:
New Testers When we began writing tests for Wine, we had no idea where to start. The
Wine source code is over half a million lines, making testing a rather daunting
task. Once we ran LCOV to see which parts of Wine had already been tested, we had
a much better idea of where our efforts would help the most.
Faulty Tests We did not find the AdvApi32 DLL bug using code coverage techniques. We
realized there was a bug when Wine hung, and we used Wine debugging methods to
track it down. However, once we discovered that this code had been executed by the test
suite, we knew that either the tests were not comprehensive or the tests were incorrect.
Thus, LCOV yielded additional helpful information.
Motivation While code coverage cannot measure the quality of a test, it can encourage
more tests to be written. It is rewarding to see a number increase (in our case, the
percentage of code covered in a given module) and know that you caused that change.
This concrete, rewarding measure of contribution can motivate people to write tests
for previously uncovered areas, and makes writing tests, a sometimes less than
thrilling task, a bit more exciting.
Increased Focus A program can never be considered fully tested. There are many criteria
a company or open-source project may use to decide when a program has been well
tested; however, there is no sure way to know that every angle has been covered and
every screw turned. A tester can only hope to get the most value from the testing
effort. In our case, we did so by looking for areas used by Picasa 2 and making sure
those areas had unit tests that verify the current code is correct and may one day
keep a regression from seeping into Wine.
Differential code coverage has assisted us in writing unit tests. It is a far cry from the
desired goal of deriving unit tests from system tests, but it is a pleasant start.