Actual software in use today is not known to follow any uniform normal distribution, whether syntactically—in the language of all programs described by the grammar of a given programming language, or semantically—for example, in the set of reachable states. Hence claims deduced from any given set of benchmarks need not extend to real-world software systems.
When building software analysis tools, this affects all aspects of tool construction: starting from language front ends not being able to parse and process real-world programs, over inappropriate assumptions about (non-)scalability, to failing to meet actual needs of software developers.
To narrow the gap between real-world software demands and software analysis tool construction, an experiment using the Debian Linux distribution has been set up. The Debian distribution presently comprises of more than
source software packages. Focussing on C source code, more than
million lines of code are automatically analysed in this experiment, resulting in a number of improvements in analysis tools on the one hand, but also more than
public bug reports to date.