ABSTRACT
Unlike benchmarks that focus on performance or reliability evaluations, a benchmark for computer security must necessarily include sensitive code and data. Because these artifacts could damage systems or reveal personally identifiable information about the users affected by cyber attacks, publicly disseminating such a benchmark raises several scientific, ethical, and legal challenges. We propose the Worldwide Intelligence Network Environment (WINE), a security-benchmarking approach based on rigorous experimental methods. WINE includes representative field data, collected worldwide from 240,000 sensors, for new empirical studies, and it will enable the validation of research on all phases in the lifecycle of security threats. We tackle the key challenges of security benchmarking by designing a platform for repeatable experimentation on the WINE data sets and by collecting the metadata required for understanding the results. In this paper, we review the unique characteristics of the WINE data, discuss why rigorous benchmarking will provide fresh insights into the security arms race, and propose a research agenda for this area.
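To illustrate what repeatable experimentation on a shared security data set might involve, the sketch below records the metadata an independent rerun would need: the data-set snapshot, the exact analysis query, and the execution environment. This is a minimal, hypothetical example under our own assumptions; the names (ExperimentRecord, run_experiment) and the snapshot/query conventions are illustrative and are not part of the actual WINE platform.

```python
# Hypothetical sketch of experiment provenance for a shared security
# data set. All identifiers here are illustrative assumptions.
import hashlib
import json
import platform
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    dataset: str       # which data set was analyzed (e.g., a telemetry feed)
    snapshot_id: str   # immutable snapshot, so reruns see identical data
    query: str         # the exact analysis query that was executed
    started_at: str    # wall-clock start time, for provenance
    environment: str   # platform details that may affect the computation


def run_experiment(dataset: str, snapshot_id: str, query: str) -> ExperimentRecord:
    """Execute `query` against a fixed data-set snapshot and log enough
    metadata that an independent researcher can repeat the run."""
    record = ExperimentRecord(
        dataset=dataset,
        snapshot_id=snapshot_id,
        query=query,
        started_at=datetime.now(timezone.utc).isoformat(),
        environment=platform.platform(),
    )
    # ... execute the query against the snapshot here ...
    # Persist the record alongside the results; hashing it yields a
    # compact identifier for this run that can be cited with the results.
    blob = json.dumps(asdict(record), sort_keys=True).encode()
    print("experiment id:", hashlib.sha256(blob).hexdigest()[:12])
    return record


if __name__ == "__main__":
    run_experiment(
        dataset="binary-reputation",
        snapshot_id="2011-03-01",
        query="SELECT COUNT(*) FROM submissions WHERE first_seen >= '2011-01-01'",
    )
```

Pinning each query to an immutable snapshot is what makes a rerun meaningful: two researchers who share the record should obtain identical inputs, even as the live data set continues to grow.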