I am very interested in the statistical analysis of source code. One of the tools I use for this is the analysis of the binary code generated by the compiler.

Function code size

A very simple thing to measure is the distribution of function sizes in a program.
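This page does not say how the function sizes were extracted or what exactly the .distrib files contain. The following is a minimal sketch, not necessarily the method used here. It assumes GNU binutils `nm`, an unstripped binary, and a two-column "size count" file that gnuplot can plot directly. `nm --print-size --radix=d --defined-only` lists each defined symbol with its size in bytes, printed in decimal. The hypothetical `size_histogram` function keeps only code symbols (types `T` and `t`) and counts how many functions have each size.

```shell
# Hypothetical helper (not from the original page): turn the output of
#   nm --print-size --radix=d --defined-only BINARY
# into a "size count" histogram, one line per distinct function size.
# Expected input lines: ADDRESS SIZE TYPE NAME (decimal, zero-padded).
size_histogram() {
    # NF == 4 skips symbols that nm printed without a size column;
    # T/t are global/local symbols in the text (code) section.
    # "$2 + 0" turns the zero-padded size into a plain number.
    awk 'NF == 4 && ($3 == "T" || $3 == "t") { print $2 + 0 }' |
        sort -n |               # bring identical sizes together
        uniq -c |               # prefix each distinct size with its count
        awk '{ print $2, $1 }'  # reorder to "size count" for plotting
}
```

For example, `nm --print-size --radix=d --defined-only ./program | size_histogram > program.distrib`. Several binaries (say, a program and its libraries) can be combined by concatenating their `nm` output before the pipe. C++ inline and template functions often appear as weak symbols (`W`), which this filter ignores.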

Program Name            Version               Platform  Contributor   Distribution          Graph
Nautilus (plus libs)    cvs 2005-01-25        x86       Mathieu       nautilus.distrib      nautilus.png
Evince (plus libs)      cvs 2005-01-25        x86       Mathieu       evince.distrib        evince.png
Mozilla (plus libs)     cvs 1.7.6 2005-02-18  x86       Mathieu       mozilla.distrib       mozilla.png
Linux (default config)  2.6.12                x86       Mathieu       linux-2.6.12.distrib  linux-2.6.12.png
Emacs                   cvs 2005-10-14        ppc32     Sean Neakums  emacs.distrib         emacs.png
Apache (with apr/ssl)   2.0.55                x86       Mathieu       apache.distrib        apache.png

If you are interested in this sort of thing, I would be more than happy to gather other datasets.

The PNG graphs on this page were generated with the following shell loop:

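# Plot every *.distrib file as a log-log graph in the matching .png file.
for f in *.distrib; do
    png="${f%.distrib}.png"    # e.g. nautilus.distrib -> nautilus.png
    # Send the plotting commands to gnuplot on stdin instead of a temp file.
    gnuplot <<EOF
set logscale xy
set terminal png
set output '$png'
plot '$f'
EOF
done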