ABSTRACT
Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by clustering network traces. Such approaches are limited by the lack of semantic information in network traces. In this paper we propose a new approach that uses program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message formats included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually arise because different implementations handle fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
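The intuition behind shadowing can be illustrated with a toy sketch. This is not Polyglot's implementation, which applies dynamic analysis to program binaries; it is a hand-instrumented Python stand-in. The names `ShadowedInput`, `toy_parse`, and `infer_fields` are hypothetical and chosen only for illustration. The sketch assumes a parser that scans the received bytes for a space delimiter: logging which input offsets it compares against which constant is enough to recover the field boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowedInput:
    """Received message bytes plus a log of how the parser touched them."""
    data: bytes
    # Each entry is (offset, constant) for one comparison made by the parser.
    comparisons: list = field(default_factory=list)

    def equals(self, offset: int, constant: bytes) -> bool:
        """Compare one input byte to a constant and record the comparison."""
        self.comparisons.append((offset, constant))
        return self.data[offset:offset + 1] == constant


def toy_parse(msg: ShadowedInput) -> list:
    """Hypothetical parser: split the message into space-separated tokens."""
    tokens, start = [], 0
    for i in range(len(msg.data)):
        if msg.equals(i, b" "):
            tokens.append(msg.data[start:i])
            start = i + 1
    tokens.append(msg.data[start:])
    return tokens


def infer_fields(msg: ShadowedInput, delimiter: bytes) -> list:
    """Derive (start, end) field ranges from the comparison log alone.

    An offset counts as a delimiter position when the parser compared it
    against `delimiter` and the comparison succeeded.
    """
    hits = sorted({off for off, const in msg.comparisons
                   if const == delimiter
                   and msg.data[off:off + 1] == delimiter})
    fields, start = [], 0
    for off in hits:
        fields.append((start, off))
        start = off + 1
    fields.append((start, len(msg.data)))
    return fields


# Example message chosen for illustration only.
msg = ShadowedInput(b"GET /index.html HTTP/1.1")
tokens = toy_parse(msg)
print(tokens)
print(infer_fields(msg, b" "))
```

The field ranges here are derived from the recorded comparisons, not from knowledge of the message syntax, which is the sense in which the processing of the data reveals the format. A real system would have to obtain such a log from an unmodified binary.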
- How Samba Was Written. http://samba.org/ftp/tridge/misc/french_cafe.txt.
- Icqlib: The ICQ Library. http://kicq.sourceforge.net/icqlib.shtml.
- Libyahoo2: A C Library for Yahoo! Messenger. http://libyahoo2.sourceforge.net.
- MSN Messenger Protocol. http://www.hypothetic.org/docs/msn/index.php.
- Qemu: Open Source Processor Emulator. http://fabrice.bellard.free.fr/qemu/.
- Tcpdump. http://www.tcpdump.org/.
- The UnOfficial AIM/OSCAR Protocol Specification. http://www.oilcan.org/oscar/.
- Wireshark, Network Protocol Analyzer. http://www.wireshark.org.
- M. A. Beddoe. Network Protocol Analysis Using Bioinformatics Algorithms. http://www.baselineresearch.net/PI/.
- N. Borisov, D. J. Brumley, H. J. Wang, and C. Guo. Generic Application-Level Protocol Analyzer and Its Language. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
- D. Brumley, J. Caballero, Z. Liang, J. Newsome, and D. Song. Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation. USENIX Security Symposium, Boston, MA, August 2007.
- J. Caballero, S. Venkataraman, P. Poosankam, M. G. Kang, D. Song, and A. Blum. FiG: Automatic Fingerprint Generation. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
- J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding Data Lifetime Via Whole System Simulation. USENIX Security Symposium, San Diego, CA, August 2004.
- M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-End Containment of Internet Worms. Symposium on Operating Systems Principles, Brighton, United Kingdom, October 2005.
- J. R. Crandall, S. F. Wu, and F. T. Chong. Minos: Architectural Support for Protecting Control Data. ACM Transactions on Architecture and Code Optimization, December 2006.
- D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. RFC 4234 (Draft Standard), October 2005.
- W. Cui, J. Kannan, and H. J. Wang. Discoverer: Automatic Protocol Description Generation from Network Traces. USENIX Security Symposium, Boston, MA, August 2007.
- W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz. Protocol-Independent Adaptive Replay of Application Dialog. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
- H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Sommer. Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection. USENIX Security Symposium, Vancouver, Canada, July 2006.
- C. D. Grosso, G. Antoniol, M. D. Penta, P. Galinier, and E. Merlo. Improving Network Applications Security: A New Heuristic to Generate Stress Testing Data. Genetic and Evolutionary Computation Conference, June 2005.
- P. Haffner, S. Sen, O. Spatscheck, and D. Wang. ACAS: Automated Construction of Application Signatures. ACM SIGCOMM Workshop on Mining Network Data, Philadelphia, PA, October 2005.
- J. Kannan, J. Jung, V. Paxson, and C. E. Koksal. Semi-Automated Discovery of Application Session Structure. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
- C. Leita, K. Mermoud, and M. Dacier. ScriptGen: An Automated Script Generation Tool for Honeyd. Annual Computer Security Applications Conference, Tucson, AZ, December 2005.
- J. Lim, T. Reps, and B. Liblit. Extracting Output Formats from Executables. Working Conference on Reverse Engineering, Benevento, Italy, October 2006.
- J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M. Voelker. Unexpected Means of Protocol Inference. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
- P. McMinn, M. Harman, D. Binkley, and P. Tonella. The Species Per Path Approach to Search-Based Test Data Generation. International Symposium on Software Testing and Analysis, July 2006.
- P. V. Mockapetris. Domain Names - Implementation and Specification. RFC 1035 (Standard), November 1987.
- J. Newsome and D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2005.
- J. Newsome, D. Brumley, and D. Song. Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
- J. Newsome, D. Brumley, J. Franklin, and D. Song. Replayer: Automatic Protocol Replay By Binary Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2006.
- P. Oehlert. Violating Assumptions with Fuzzing. IEEE Security and Privacy, 3(2), March 2005.
- R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney. A First Look At Modern Enterprise Traffic. Internet Measurement Conference, Berkeley, CA, October 2005.
- R. Pang, V. Paxson, R. Sommer, and L. Peterson. Binpac: A Yacc for Writing Application Protocol Parsers. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
- G. Portokalidis, A. Slowinska, and H. Bos. Argos: An Emulator for Fingerprinting Zero-Day Attacks for Advertised Honeypots with Automatic Signature Generation. ACM SIGOPS Operating Systems Review, 40(4), October 2006.
- G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure Program Execution Via Dynamic Information Flow Tracking. International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, October 2004.
- P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Cross-Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
- H. Yin, D. Song, E. Manuel, C. Kruegel, and E. Kirda. Panorama: Capturing System-Wide Information Flow for Malware Detection and Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2007.
Index Terms
- Polyglot: automatic extraction of protocol message format using dynamic binary analysis