Various online studies on the prevalence of spyware report that an overwhelming share (up to 80%) of home computers is infected. However, the term spyware is ambiguous and can refer to anything from plug-ins that display advertisements to software that records and leaks user input. To shed light on the true nature of the spyware problem, a recent measurement paper attempted to quantify the extent of spyware on the Internet. More precisely, the authors crawled the web and analyzed the executables that they downloaded. For this analysis, only a single anti-spyware tool was used. This is a major shortcoming: the results from a single tool neither capture the actual extent of the threat nor, in many cases, appropriately classify the functionality of suspicious executables.
For our analysis, we developed a fully automated infrastructure to collect and install executables from the web. We use three different techniques to analyze these programs: an online database of spyware-related identifiers, signature-based scanners, and a behavior-based malware detection technique. We present the results of a measurement study that lasted about ten months. During this time, we crawled over 15 million URLs and downloaded 35,853 executables. Almost half of the spyware samples we found were not recognized by the tool used in previous work. Moreover, a significant fraction of the analyzed programs (more than 80%) was incorrectly classified by that tool. These findings indicate that our measurement results are more comprehensive and precise than those of previous approaches, allowing us to draw a more accurate picture of the spyware threat.