gprof and glibc
C/C++ Unix programmers can traditionally obtain a runtime execution
profile of their programs by compiling them with the -pg option,
running them, then using the program gprof
to analyze the log file gmon.out written by the profiling code.
This works under Linux with glibc, but there are two problems.
I've fixed one, and it looks like fixing the other might have to wait until
the current (Aug 2002) flurry of LinuxThreads work is finished.
glibc does not support profiling multithreaded programs
glibc cannot profile programs with more than 64000 or so symbols
glibc bug #4379. This bug is now closed; my fix will appear in glibc 2.3.
Jim Panetta at SLAC
reported in July 2001 on bug-glibc that gprof cannot handle 65535 symbols,
and provided a nice test case.
I believe the problem was a thinko confusing the types used to represent
the sampling counters and indices into the call graph table.
Here's a patch I wrote on June 30th 2002 that seems to fix the problem.
It adds a defined type, ARCINDEX, for the latter, and uses it uniformly.
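To illustrate the kind of mix-up described above, here is a minimal sketch. It is not the patch itself: it assumes the sampling counters are 16 bits wide, and the names hist_counter_t, arc_index_t, store_index_narrow and store_index_wide are hypothetical, with arc_index_t standing in for the role ARCINDEX plays.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 16-bit type suits a sampling counter,
   but an index into a large call graph table needs a wider type. */
typedef uint16_t hist_counter_t;    /* assumed width of a sampling counter */
typedef unsigned long arc_index_t;  /* plays the role of ARCINDEX */

/* The mistake: a table index stored in the counter type.
   Any index above 65535 silently wraps modulo 65536. */
unsigned long store_index_narrow(unsigned long index)
{
    hist_counter_t slot = (hist_counter_t)index;
    return slot;
}

/* The fix: indices get their own, wider type, used uniformly. */
unsigned long store_index_wide(unsigned long index)
{
    /* The index type must be wider than the counter type. */
    assert(sizeof(arc_index_t) > sizeof(hist_counter_t));
    arc_index_t slot = (arc_index_t)index;
    return slot;
}
```

With these definitions, store_index_narrow(70000) comes back as 4464 (70000 - 65536), while store_index_wide(70000) comes back unchanged.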
The heuristic used to size the call graph table (tos)
might be a surprise to some users, so I've added a line to
output a message to stderr when the heuristic picks too low a
value, causing table overflow.
The heuristic does pick too low a value when running Jim's regression
test; to work around this, tweak the constant ARCDENSITY up to 3 from 2,
and link the test program statically (which just happens to
make the heuristic allocate a much larger call graph table, due to the
increased text size).
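A rough sketch of how a density-based sizing rule and an overflow warning could fit together is shown here. It assumes the table size is taken as a percentage of the program's text size and clamped to a minimum and maximum; that reading, the function names size_arc_table and reserve_arc, and the wording of the message are all assumptions for illustration, not glibc's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of a sizing heuristic: allow `density` table entries per
   100 bytes of program text, clamped to [min_arcs, max_arcs].
   All parameter names are illustrative. */
unsigned long size_arc_table(unsigned long text_size, unsigned long density,
                             unsigned long min_arcs, unsigned long max_arcs)
{
    assert(min_arcs <= max_arcs);
    unsigned long limit = text_size * density / 100;
    if (limit < min_arcs)
        limit = min_arcs;
    else if (limit > max_arcs)
        limit = max_arcs;
    return limit;
}

/* Claim one more table entry. Returns 1 on success; if the table is
   already full, warns on stderr and returns 0 instead of failing silently. */
int reserve_arc(unsigned long *used, unsigned long limit)
{
    if (*used >= limit) {
        fprintf(stderr, "call graph table overflow: limit of %lu entries reached\n",
                limit);
        return 0;
    }
    ++*used;
    return 1;
}
```

Under this model, a program with 1,000,000 bytes of text gets 20,000 entries at a density of 2 and 30,000 at a density of 3, and a larger text size (as with static linking) raises the limit in the same proportion.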
I must say, the table sizing and indexing in gmon are a bit tricky to
understand. Thus I don't trust my patch yet. There may be some table
sizing or indexing bug lurking still.
I have yet to use this on the real program that caused me to look
into the problem, but it does seem to let Jim's regression test pass.
system call in Solaris; this simply takes a sample of the program counter
on each of the next N "clock ticks". It claims to be superior to
the old profil() call as it does not assume that the code to be profiled
is in a contiguous region. However, its documentation does not describe
how it behaves on an SMP system (presumably it does the right thing,
and counts activity on all CPUs).
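The difference between the two sampling styles can be sketched as follows. This is a simplified model, not either system's real interface: hist_record mimics a profil()-style histogram over one contiguous address range, while raw_record simply stores each sampled program counter. All struct and function names, and the fixed bucket width, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Histogram model: one table covering [base, base + nbuckets * width).
   A sample outside that single contiguous range is lost. */
struct hist {
    uintptr_t base;          /* lowest profiled address */
    size_t width;            /* bytes of text per bucket (assumed fixed) */
    size_t nbuckets;
    unsigned short *counts;  /* one counter per bucket */
};

int hist_record(struct hist *h, uintptr_t pc)
{
    assert(h->width > 0);
    if (pc < h->base)
        return 0;
    size_t bucket = (pc - h->base) / h->width;
    if (bucket >= h->nbuckets)
        return 0;
    h->counts[bucket]++;
    return 1;
}

/* Raw-sample model: keep up to `cap` program counter values verbatim,
   so the profiled code may live anywhere in the address space. */
struct raw {
    uintptr_t *samples;
    size_t cap;
    size_t used;
};

int raw_record(struct raw *r, uintptr_t pc)
{
    if (r->used >= r->cap)
        return 0;
    r->samples[r->used++] = pc;
    return 1;
}
```

In the histogram model a sample taken far from base (say, in a shared library mapped elsewhere) is dropped, whereas the raw model records it and leaves the mapping from address to function for later analysis.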