Validating and debugging OpenOffice.org with Valgrind

By Jens-Heiner Rechtien, Sun Microsystems

See also the official copy of this document, available in .sxw format only, at openoffice.org.

(See also issue 20184. The first snapshot with these fixes is 680m17.)

Valgrind is a highly effective memory management and threading debugging tool for Linux x86 written by Julian Seward and others (http://valgrind.kde.org). We have used it with good success on OOo and found a bunch of potential problems and outright bugs just by executing the standard OOo smoke test.

Classes of detected problems

The detected problems can be classified as in Table 1. Please note that “distinct occurrences” means “problems detected by distinct debugging activities”; similar problems in neighboring code are not counted separately. In one case there were no fewer than 62 uninitialized instance data members in one file. They are counted as one occurrence in this table.

Type of error                                                     Number of distinct occurrences
----------------------------------------------------------------  ------------------------------
not initialized instance data member                              10
not initialized local variables                                   5
not initialized variable used as in/out parameter in method call  3
overlapping buffers in strncpy()                                  1
off by one error                                                  1
unchecked return value of system call                             1
partly initialized struct used in inappropriate ways              1
no check for potentially invalidated index                        1
use of buffer size instead of content size for writing out data   1
write not initialized buffer into stream                          1
feed unterminated buffer into method expecting a C-style string   1

Table 1: Problems found by valgrind

A rough evaluation of the consequences of the detected problems showed that about one third would never show up as a program failure for the user; another third are bugs which have no consequences yet but might lead to regressions later if the code is changed; and the last third are plain bugs which might crash the application or lead to malfunction at any time.

All these problems have been found by just running the smoke test document with the Java tests disabled. No doubt, a full automated test run will reveal more problems. Fixes for the above bugs have been integrated into the SRC680 m16 milestone, which should be reasonably “valgrind clean”. Be aware that new valgrind-detectable problems have already found their way into the code.
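To make a few of the categories in Table 1 more concrete, here is a minimal C++ sketch of what the corrected patterns typically look like. It is not taken from the OOo sources: the class Cursor and the functions flush_buffer() and drop_prefix() are hypothetical names invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical class: every instance data member gets a defined value in
// the constructor, so no later conditional depends on indeterminate memory.
class Cursor {
public:
    Cursor() : m_pos(0), m_visible(false) {}
    int  pos() const     { return m_pos; }
    bool visible() const { return m_visible; }
private:
    int  m_pos;
    bool m_visible;
};

// Hypothetical helper: write out only the bytes that were actually filled
// in (content size), not the full capacity of the buffer.
std::string flush_buffer(const char* buf, std::size_t used) {
    assert(buf != nullptr);
    return std::string(buf, used);
}

// Hypothetical helper: remove the first n characters of a C string in place.
// Source and destination overlap, so memmove() is used; strncpy() and
// memcpy() have undefined behaviour for overlapping ranges.
void drop_prefix(char* s, std::size_t n) {
    assert(s != nullptr);
    std::size_t len = std::strlen(s);
    if (n > len) n = len;
    std::memmove(s, s + n, len - n + 1);  // +1 copies the terminating '\0'
}
```

The point of the sketch is only the shape of the fix: initialize in the constructor, pass the content length explicitly, and choose a copy routine that tolerates overlap.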

Using valgrind for validating and debugging OpenOffice.org

Hardware

Like any memory debugging tool, valgrind demands a certain amount of understanding of what is going on and requires reasonable hardware. After all, OOo isn't exactly a lightweight application. I found that with an 1800 MHz Pentium IV with 512 MBytes of memory it's possible to start using valgrind on OOo, but having more memory is a big plus, especially if you want to break into gdb and have a lot of libraries with debug information. 1 GByte of memory allows comfortable work without too much paging. After running the smoke test document the valgrind+soffice.bin process is about 400 MBytes in size. Be aware that gdb alone can eat more than 700 MBytes if you have all applications (writer, calc, impress) compiled with debug information. The speed of OOo running under valgrind is impressive compared with other memory debuggers such as purify: the smoke test document needs about 10 minutes to complete on the machine mentioned above.

Preparations

Compiling and installing valgrind is easy. Just download it from http://valgrind.kde.org, configure it and run “make install”.

Valgrind can be used without instrumenting the code. But it needs to intercept all calls into the memory management, so it works nicely if the application uses malloc/free for obtaining and freeing heap memory. OOo uses its own memory management routines, but this can be overridden by compiling sal/rtl/alloc.c with FORCE_SYSALLOC defined. Copy the relinked sal library into the OOo installation and start valgrinding OOo.

$ cd <OOo-installation>/program
$ valgrind --gdb-attach=yes ./soffice.bin

With the --gdb-attach switch, valgrind lets you break into gdb when a problem shows up.
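The idea behind a compile-time switch like FORCE_SYSALLOC can be pictured with the following C++ sketch. This is not the code in sal/rtl/alloc.c: the namespace sketch, the functions allocate(), deallocate() and uses_system_allocator(), and the simple arena are all assumptions made for illustration. With the macro defined, every request goes straight to malloc/free, which valgrind can intercept; without it, blocks are carved out of one large region that valgrind sees only as a whole.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Normally passed on the compiler command line (for example -DFORCE_SYSALLOC);
// defined here only so that the sketch is self-contained.
#define FORCE_SYSALLOC 1

namespace sketch {  // hypothetical names, not the sal API

#ifdef FORCE_SYSALLOC
// Forward directly to the system allocator so a memory debugger sees
// every single allocation and release.
inline void* allocate(std::size_t n) { return std::malloc(n); }
inline void  deallocate(void* p)     { std::free(p); }
inline bool  uses_system_allocator() { return true; }
#else
// Stand-in for a custom allocator: a trivial bump allocator over one
// static arena. Individual blocks are invisible to malloc interception.
alignas(std::max_align_t) static unsigned char g_arena[1 << 16];
static std::size_t g_used = 0;

inline void* allocate(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    n = (n + a - 1) / a * a;                       // keep blocks aligned
    if (n > sizeof(g_arena) - g_used) return nullptr;
    void* p = g_arena + g_used;
    g_used += n;
    return p;
}
inline void deallocate(void*) {}                   // arena is never returned
inline bool uses_system_allocator() { return false; }
#endif

}  // namespace sketch
```

Callers use allocate() and deallocate() in both configurations, which is why only the one library has to be rebuilt and swapped in.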

It's possible to use valgrind on a plain optimized product version if you've got the right libsal.so. But in this case there will be a few false positives (for example in libsw680li.so, libcppu.so and libvcl680li.so). In my experience valgrind never shows false positives if everything is compiled without any optimization. In any case, compiling without optimizations will give you more precise line information after breaking into gdb. If you compile the code with debug information, valgrind will print line numbers with the stack information.

Debugging OpenOffice.org with valgrind

Running OOo under valgrind control on a libc-2.3.2 system will yield at least six diagnostics, four in /lib/ld-2.3.2.so and two in libICE.so. These are valid diagnostics but not our concern. More libc and X11 diagnostics are automatically suppressed by the default valgrind suppression files. It's of course possible to write our own suppression files (valgrind even lends a hand with the --gen-suppressions switch), possibly including the six diagnostics mentioned above.

For understanding the diagnostics it's important to know how valgrind works. It keeps track of memory by using valid bits for every piece of memory (stack or heap). Initially these bits are cleared. Initializing a piece of memory sets the valid bits. It's possible to copy uninitialized memory around, and even do arithmetic operations on it, and valgrind will not complain. It will just mark the result of such an operation with cleared valid bits. Only when the observable outcome of the program depends on a piece of memory with cleared valid bits will valgrind issue a diagnostic. When does the observable outcome change? Well, if the piece of memory is used in a conditional, or is written to the system via write(). This means an uninitialized variable can have undergone a lot of operations before it triggers a diagnostic. Often it's easy to see what is wrong; sometimes the initialization status needs to be tracked with if() clauses to find out what is wrong (you can also use the valgrind macro defined in valgrind.h for that, please see the documentation).
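The propagation rule described above can be illustrated with a small toy model in C++. This is emphatically not how valgrind is implemented: the type Tracked, the counter g_diagnostics and the functions below are hypothetical, and the sketch keeps a single flag per value where the real tool tracks validity at a much finer granularity. It only shows the principle that copies and arithmetic carry the "invalid" state along silently, while a conditional on such a value is what triggers the report.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: a value together with one "valid" flag.
struct Tracked {
    std::uint32_t value;
    bool          valid;
};

static int g_diagnostics = 0;  // diagnostics issued by the toy checker

Tracked make_uninitialized()              { return Tracked{0u, false}; }
Tracked make_initialized(std::uint32_t v) { return Tracked{v, true}; }

// Arithmetic never complains; the result is valid only if both inputs are.
Tracked add(Tracked a, Tracked b) {
    return Tracked{a.value + b.value, a.valid && b.valid};
}

// A conditional makes the observable behaviour depend on the value, so
// this is the point where the toy checker reports a problem.
bool is_nonzero(Tracked t) {
    if (!t.valid) ++g_diagnostics;
    return t.value != 0u;
}

// Usage example: returns the number of diagnostics issued.
int demo() {
    g_diagnostics = 0;
    Tracked a = make_uninitialized();
    Tracked b = a;                            // copy: no diagnostic
    Tracked c = add(b, make_initialized(1));  // arithmetic: no diagnostic
    assert(g_diagnostics == 0);
    is_nonzero(make_initialized(7));          // valid data: no diagnostic
    is_nonzero(c);                            // branch on invalid data: diagnostic
    return g_diagnostics;
}
```

In the sketch the report appears at is_nonzero(c), two steps away from the place where the uninitialized value was created, which mirrors why the origin of a real diagnostic can take some tracking down.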

It's not possible to analyze calls into the Java virtual machine. The JVM does its own memory management, including garbage collection, and this prevents valgrind from keeping an eye on each piece of memory.

Summary

Advantages of using valgrind
Restrictions