ABSTRACT
Regular expression matching is the heart of many data processing routines, such as string search, network traffic filtering, etc. The traditional way of regexp matching is building and execution of a deterministic finite automaton (DFA), that provides O(1) processing time per 1 input symbol for any regular expression. But this technique almost always forces many modern SIMD-processors to perform regexp search in scalar mode, thus it doesn't use the most part of their computational power.
This paper represents traditional straightforward DFA along with another regexp implementation, based on nondeterministic finite automata (NFA) SIMD-simulation on Cell Broadband Engine processor. Software implementation of NFA-based SIMD algorithm achieves as much as 10 Gbit/s per one Cell BE processor with 512 NFA states, thus it is feasible for preliminary network traffic filtering of suspicious objects, while DFA-based scalar one gains up to 60 Gbit/s with 60-state automaton. One Cell BE processor can maintain NFA of 6000..7000 overall states simultaneously, so if one wants to use signatures with more that 512 states, it's possible with linear performance-to-signatures tradeoff.
- Aho A. V., Corasick M. J. 1975 Efficient string matching: An aid to bibliographic search, Comm. of ACM, 18(6):333--340, 1975. Google ScholarDigital Library
- Thompson K. 1968 Regular expression search algorithm. Communications of the ACM 11(6) (June 1968), pp. 419--422. Google ScholarDigital Library
- AbuHmed T. 2007 Deep Packet Inspection for Intrusion Detection Systems: A Survey. Technical Report. Information Security Research Laboratory, Inha University, Incheon 402--751, Korea.Google Scholar
- Cox R. 2007 Regular Expression Matching Can Be Simple And Fast. http://swtch.com/~rsc/regexp/Google Scholar
- Kumar S., Yu F., Crowley P., Turner J. 2006. Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection. In proc. of SIGCOMM, 2006. Google ScholarDigital Library
- Kumar S., Turner J., Williams J. 2006 Advanced Algorithms for Fast and Scalable Deep Packet Inspection. In proc. of ANCS-2006, San Jose, California, USA. Google ScholarDigital Library
- Kumar S., Turner J., Varghese G. 2007 Curing Regular Expressions Matching Algorithms from Insomnia. Amnesia, and Acalculia. In proc. of ANCS, 2007. Google ScholarDigital Library
- Smith R., Estan C., Jha S. 2008 XFA: Faster Signature Matching With Extended Automata. In IEEE Simposium on Security and Privacy, 2008. Google ScholarDigital Library
- Goyal N., Ormont J., Smith R., Estan C. 2008 Signature Matching in Network Processing Using SIMD/GPU Architectures. Technical Report #1628. University of Wisconsin, Madison.Google Scholar
- Ficara D., Giordano S., Procissi G. 2008 An Improved DFA for Fast Regular Expression Matching.--ACM SIGCOMM Computer Communication Review, Vol. 38, Number 5, 2008 Google ScholarDigital Library
- 2007 Cell Broadband Engine Programming Handbook (version 1.1). www.ibm.comGoogle Scholar
- 2007 Synergistic Processor Unit Instruction Set Architecture (version1.2). www.ibm.comGoogle Scholar
- Arevalo A. et al. 2008 Programming the Cell Broadband Engine Examples and Best Practices. IBM Redbook SG24-7575-00. www.ibm.comGoogle Scholar
- Yu F., Chen Z., Diao Y. 2006 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection. Technical report, University of California at BerkeleyGoogle Scholar
- Aho A. V., Sethi, R., Ullman, J. D. "Compilers : Principles, Techniques, and Tools". Addison-Wesley, 1985. Google ScholarDigital Library
- Scarpazza D. P., Villa O., Petrini F. 2008 Exact Multi-pattern String Matching on the Cell/B.E. Processor. CF'08, May 5--7, 2008, Ischia, Italy Google ScholarDigital Library
- Navarro G. 2003 Pattern Matching. Technical Report. Department of Computer Science, University of Chile.Google Scholar
- Baeza-Yates R., Gonnet R. 1992 A new approach to text searching. Communications of the ACM, 35(10):74{82), 1992 Google ScholarDigital Library
- Snort--the open source IDS, www.snort.orgGoogle Scholar
Index Terms
- DFA-based and SIMD NFA-based regular expression matching on cell BE for fast network traffic filtering
Recommendations
An improved DFA for fast regular expression matching
Modern network devices need to perform deep packet inspection at high speed for security and application-specific services. Finite Automata (FAs) are used to implement regular expressions matching, but they require a large amount of memory. Many recent ...
A regular expression matching circuit: Decomposed non-deterministic realization with prefix sharing and multi-character transition
This paper shows a compact realization of regular expression matching circuits on FPGAs. First, the given regular expression is converted into a non-deterministic finite automaton (NFA) by the modified McNaughton-Yamada method. Second, to reduce the ...
Fast, memory-efficient regular expression matching with NFA-OBDDs
Modern network intrusion detection systems (NIDS) employ regular expressions as attack signatures. Internally, NIDS represent and operate these regular expressions as finite automata. However, finite automata present a well-known time/space tradeoff. ...
Comments