A Survey of Automatic Protocol Reverse Engineering Tools

Published: 09 December 2015

Abstract

Computer network protocols define the rules by which two entities communicate over a network of unique hosts. Many protocol specifications are unknown, unavailable, or minimally documented, which prevents thorough analysis of the protocol for security purposes. For example, modern botnets often use undocumented and unique application-layer communication protocols to maintain command and control over numerous distributed hosts. Inferring the specification of closed protocols has numerous advantages, such as intelligent deep packet inspection, enhanced intrusion detection system algorithms for communications, and integration with legacy software packages. The multitude of closed protocols, coupled with existing time-intensive reverse engineering methodologies, has spawned investigation into automated approaches for reverse engineering closed protocols. This article summarizes and organizes previously presented automatic protocol reverse engineering tools by approach. Approaches that focus on reverse engineering the finite state machine of a target protocol are separated from those that focus on reverse engineering the protocol format.

      • Published in

        ACM Computing Surveys, Volume 48, Issue 3
        February 2016
        619 pages
        ISSN: 0360-0300
        EISSN: 1557-7341
        DOI: 10.1145/2856149
        • Editor: Sartaj Sahni

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 December 2015
        • Accepted: 1 September 2015
        • Revised: 1 July 2015
        • Received: 1 August 2014

        Qualifiers

        • survey
        • Research
        • Refereed
