Article

Polyglot: automatic extraction of protocol message format using dynamic binary analysis

Authors:
Juan Caballero (Carnegie Mellon University, Pittsburgh, PA)
Heng Yin (Carnegie Mellon University, Pittsburgh, PA & College of William and Mary, Williamsburg, VA)
Zhenkai Liang (Carnegie Mellon University, Pittsburgh, PA)
Dawn Song (Carnegie Mellon University, Pittsburgh, PA & UC Berkeley, Berkeley, CA)

CCS '07: Proceedings of the 14th ACM conference on Computer and communications security, October 2007, Pages 317–329
https://doi.org/10.1145/1315245.1315286

Published: 28 October 2007

ABSTRACT

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by clustering network traces. That kind of approach is limited by the lack of semantic information in network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message formats included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.

References

[1] How Samba Was Written. http://samba.org/ftp/tridge/misc/french_cafe.txt.
[2] Icqlib: The ICQ Library. http://kicq.sourceforge.net/icqlib.shtml.
[3] Libyahoo2: A C Library for Yahoo! Messenger. http://libyahoo2.sourceforge.net.
[4] MSN Messenger Protocol. http://www.hypothetic.org/docs/msn/index.php.
[5] Qemu: Open Source Processor Emulator. http://fabrice.bellard.free.fr/qemu/.
[6] Tcpdump. http://www.tcpdump.org/.
[7] The UnOfficial AIM/OSCAR Protocol Specification. http://www.oilcan.org/oscar/.
[8] Wireshark, Network Protocol Analyzer. http://www.wireshark.org.
[9] M. A. Beddoe. Network Protocol Analysis Using Bioinformatics Algorithms. http://www.baselineresearch.net/PI/.
[10] N. Borisov, D. J. Brumley, H. J. Wang, and C. Guo. Generic Application-Level Protocol Analyzer and Its Language. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[11] D. Brumley, J. Caballero, Z. Liang, J. Newsome, and D. Song. Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation. USENIX Security Symposium, Boston, MA, August 2007.
[12] J. Caballero, S. Venkataraman, P. Poosankam, M. G. Kang, D. Song, and A. Blum. FiG: Automatic Fingerprint Generation. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[13] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding Data Lifetime Via Whole System Simulation. USENIX Security Symposium, San Diego, CA, August 2004.
[14] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-End Containment of Internet Worms. Symposium on Operating Systems Principles, Brighton, United Kingdom, October 2005.
[15] J. R. Crandall, S. F. Wu, and F. T. Chong. Minos: Architectural Support for Protecting Control Data. ACM Transactions on Architecture and Code Optimization, December 2006.
[16] D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. RFC 4234 (Draft Standard), 4234, October 2005.
[17] W. Cui, J. Kannan, and H. J. Wang. Discoverer: Automatic Protocol Description Generation from Network Traces. USENIX Security Symposium, Boston, MA, August 2007.
[18] W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz. Protocol-Independent Adaptive Replay of Application Dialog. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[19] H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Sommer. Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection. USENIX Security Symposium, Vancouver, Canada, July 2006.
[20] C. D. Grosso, G. Antoniol, M. D. Penta, P. Galinier, and E. Merlo. Improving Network Applications Security: A New Heuristic to Generate Stress Testing Data. Genetic and Evolutionary Computation Conference, June 2005.
[21] P. Haffner, S. Sen, O. Spatscheck, and D. Wang. ACAS: Automated Construction of Application Signatures. ACM SIGCOMM, Workshop on Mining network data, Philadelphia, PA, October 2005.
[22] J. Kannan, J. Jung, V. Paxson, and C. E. Koksal. Semi-Automated Discovery of Application Session Structure. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[23] C. Leita, K. Mermoud, and M. Dacier. ScriptGen: An Automated Script Generation Tool for Honeyd. Annual Computer Security Applications Conference, Tucson, AZ, December 2005.
[24] J. Lim, T. Reps, and B. Liblit. Extracting Output Formats from Executables. Working Conference on Reverse Engineering, Benevento, Italy, October 2006.
[25] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M. Voelker. Unexpected Means of Protocol Inference. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[26] P. McMinn, M. Harman, D. Binkley, and P. Tonella. The Species Per Path Approach to Search-Based Test Data Generation. International Symposium on Software Testing and Analysis, July 2006.
[27] P. V. Mockapetris. Domain Names - Implementation and Specification. RFC 1035 (Standard), IETF Request for Comments 1035, November 1987.
[28] J. Newsome and D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2005.
[29] J. Newsome, D. Brumley, and D. Song. Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[30] J. Newsome, D. Brumley, J. Franklin, and D. Song. Replayer: Automatic Protocol Replay By Binary Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2006.
[31] P. Oehlert. Violating Assumptions with Fuzzing. IEEE Security and Privacy, 3(2), March 2005.
[32] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney. A First Look At Modern Enterprise Traffic. Internet Measurement Conference, Berkeley, CA, October 2005.
[33] R. Pang, V. Paxson, R. Sommer, and L. Peterson. Binpac: A Yacc for Writing Application Protocol Parsers. Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.
[34] G. Portokalidis, A. Slowinska, and H. Bos. Argos: An Emulator for Fingerprinting Zero-Day Attacks for Advertised Honeypots with Automatic Signature Generation. ACM SIGOPS Operating Systems Review, 40(4), October 2006.
[35] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure Program Execution Via Dynamic Information Flow Tracking. International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, October 2004.
[36] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Cross-Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[37] H. Yin, D. Song, E. Manuel, C. Kruegel, and E. Kirda. Panorama: Capturing System-Wide Information Flow for Malware Detection and Analysis. ACM Conference on Computer and Communications Security, Alexandria, VA, October 2007.

Index Terms

1. Hardware
  1. Communication hardware, interfaces and storage
2. Networks



Published in
CCS '07: Proceedings of the 14th ACM conference on Computer and communications security
October 2007
628 pages
ISBN: 9781595937032
DOI: 10.1145/1315245
General Chair: Peng Ning (NC State University, USA)
Program Chairs: Sabrina De Capitani di Vimercati (University of Milan, Italy), Paul Syverson (Naval Research Laboratory, USA)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binary analysis
protocol reverse engineering
Qualifiers
- Article
Acceptance Rates
CCS '07 paper acceptance rate: 55 of 302 submissions (18%)
Overall acceptance rate: 1,261 of 6,999 submissions (18%)