survey

A Survey of Automatic Protocol Reverse Engineering Tools

Authors:
John Narayan (Virginia Tech, Arlington, VA),
Sandeep K. Shukla (Indian Institute of Technology, Kanpur, Uttar Pradesh, India),
T. Charles Clancy (Virginia Tech, Arlington, VA)

ACM Computing Surveys, Volume 48, Issue 3, Article No. 40, pp. 1–26. https://doi.org/10.1145/2840724

Published: 09 December 2015

Abstract

Computer network protocols define the rules by which two entities communicate over a network of unique hosts. Many protocol specifications are unknown, unavailable, or minimally documented, which prevents thorough analysis of the protocol for security purposes. For example, modern botnets often use undocumented and unique application-layer communication protocols to maintain command and control over numerous distributed hosts. Inferring the specification of closed protocols has numerous advantages, such as intelligent deep packet inspection, enhanced intrusion detection system algorithms for communications, and integration with legacy software packages. The multitude of closed protocols, coupled with existing time-intensive reverse engineering methodologies, has spawned investigation into automated approaches for reverse engineering of closed protocols. This article summarizes and organizes previously presented automatic protocol reverse engineering tools by approach. Approaches that focus on reverse engineering the finite state machine of a target protocol are separated from those that focus on reverse engineering the protocol format.

Index Terms

A Survey of Automatic Protocol Reverse Engineering Tools
1. Computing methodologies
  1. Machine learning
2. Networks
  1. Network protocols
    1. Application layer protocols

Published in
ACM Computing Surveys Volume 48, Issue 3
February 2016
619 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2856149
Editor: Sartaj Sahni, Department of Computer and Information Science and Engineering, University of Florida, Gainesville
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 December 2015
- Accepted: 1 September 2015
- Revised: 1 July 2015
- Received: 1 August 2014

Author Tags
Protocol reverse engineering
communication security
Qualifiers
- survey
- Research
- Refereed