ABSTRACT
A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns making them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores. For instance, on benchmarks from three FSM applications: regular expressions, Huffman decoding, and HTML tokenization, the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups up to 21x on 8 cores.
- G. E. Blelloch. Prefix sums and their applications. Technical Report Carnegie Mellon University-CS-90-190, School of Computer Science, Carnegie Mellon University, Nov. 1990.Google Scholar
- J. A. Brzozowski. Canonical regular expressions and minimal state graphs for definite events. Mathematical Theory of Automata, 12: 529--561, 1962.Google Scholar
- R. D. Cameron, E. Amiri, K. S. Herdy, D. Lin, T. C. Shermer, and F. Popowich. Parallel scanning with bitstream addition: An XML case study. In European Conference on Parallel and Distributed Computing, Part II, pages 2--13, 2011. Google ScholarDigital Library
- B. Catanzaro, N. Sundaram, and K. Keutzer. A map reduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.Google Scholar
- R. L. Cloud, M. L. Curry, H. L. Ward, A. Skjellum, and P. Bangalore. Accelerating lossless data compression with GPUs. Computing Research Repository (CoRR), abs/1107.1525, 2011.Google Scholar
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51 (1): 107--113, Jan. 2008. Google ScholarDigital Library
- B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: A MapReduce framework on graphics processors. In International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 260--269, 2008. Google ScholarDigital Library
- W. D. Hillis and G. L. Steele. Data parallel algorithms. In Commun. ACM, volume 29, pages 1170--1183, Dec 1986. Google ScholarDigital Library
- kr(2009)}Holub:2009J. Holub and S.vStekr. On parallel implementations of deterministic finite automata. In Implementation and Application of Automata, CIAA '09, pages 54--64, 2009. Google ScholarDigital Library
- J. E. Hopcroft and J. D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1969. Google ScholarDigital Library
- P. Howard and J. Vitter. Parallel lossless image compression using Huffman and arithmetic coding. In Data Compression Conference, 1992. DCC '92., pages 299 --308, March 1992.Google ScholarCross Ref
- 013)}haswellIntel Haswell Microarchitecture, 2013. URL http://software.intel.com/en-us/haswell.Google Scholar
- C. G. Jones, R. Liu, L. Meyerovich, K. Asanović, and R. Bodík. Parallelizing the web browser. In Hot Topics in Parallelism (HotPar), pages 7--7, 2009. Google ScholarDigital Library
- S. T. Klein, Y. Wiseman, S. T. Klein, and Y. Wiseman. Parallel Huffman decoding with applications to JPEG files. The Computer Journal, 46: 487--497, 2003.Google ScholarCross Ref
- R. E. Ladner and M. J. Fischer. Parallel prefix computation. Journal of the ACM, 27 (4): 831--838, 1980. Google ScholarDigital Library
- }libhuffmanlibhuffman. URL http://huffman.sourceforge.net/.Google Scholar
- C.-H. Lin and C.-W. Jen. Low power parallel Huffman decoding. Electronics Letters, 34 (3): 240 --241, Feb 1998.Google ScholarCross Ref
- D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. D. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In High Performance Computer Architecture (HPCA), pages 373--384, 2012. Google ScholarDigital Library
- D. Luchaup, R. Smith, C. Estan, and S. Jha. Speculative parallel pattern matching. IEEE Transactions on Information Forensics and Security, 6 (2): 438--451, June 2011. Google ScholarDigital Library
- G. H. Mealy. A method for synthesizing sequential circuits. Bell System Technical Journal, 34 (5): 1045--1079, 1955.Google ScholarCross Ref
- G. Navarro. NR-grep: A fast and flexible pattern matching tool. Software: Practice and Experience, 31 (13): 1265--1312, 2001. Google ScholarDigital Library
- Y. Pan, Y. Zhang, and K. Chiu. Simultaneous transducers for data-parallel XML parsing. International Symposium on Parallel and Distributed Processing, pages 1--12, 2008.Google Scholar
- V. Paxson. flex - fast lexical analyzer generator, 1988.Google Scholar
- G. R. Prakash Prabhu and K. Vaswani. Safe programmable speculative parallelism. In Programming Languages Design and Implementation (PLDI), June 2010. Google ScholarDigital Library
- }gutenbergProject Gutenberg. URL http://www.gutenberg.org/.Google Scholar
- G. Ren, P. Wu, and D. Padua. Optimizing data permutations for SIMD devices. In Programming Languages Design and Implementation (PLDI), pages 118--131, 2006. Google ScholarDigital Library
- M. Roesch. Snort - Lightweight intrusion detection for networks. In Proceedings of the 13th USENIX conference on System administration, LISA '99, pages 229--238, 1999. Google ScholarDigital Library
- D. P. Scarpazza and G. F. Russell. High-performance regular expression scanning on the Cell/B.E. processor. In International Conf. on Supercomputing, ICS '09, pages 14--25, 2009. Google ScholarDigital Library
- S. Sengupta, M. Harris, Y. Zhang, and J. Owens. Scan primitives for GPU computing. In SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pages 97--106. Eurographics Association, 2007. Google ScholarDigital Library
- P. Sutton. Partial character decoding for improved regular expression matching in FPGAs. In Field-Programmable Technology, pages 25 -- 32, Dec. 2004.Google ScholarCross Ref
- J. Talbot, R. M. Yoo, and C. Kozyrakis. PhoenixGoogle Scholar
- : Modular MapReduce for shared-memory systems. In Workshop on MapReduce and its Applications, pages 9--16, 2011. Google ScholarDigital Library
- G. Vasiliadis, M. Polychronakis, S. Antonatos, E. Markatos, and S. Ioannidis. Regular expression matching on graphics hardware for intrusion detection. In Recent Advances in Intrusion Detection, volume 5758, pages 265--283. 2009. Google ScholarDigital Library
- B. Wei and T. Meng. A parallel decoder of programmable Huffman codes. Circuits and Systems for Video Technology, 5 (2): 175 --178, Apr 1995. Google ScholarDigital Library
- Y.-H. E. Yang, W. Jiang, and V. K. Prasanna. Compact architecture for high-throughput regular expression matching on FPGA. In Architectures for Networking and Communications Systems, ANCS '08, pages 30--39, 2008. Google ScholarDigital Library
- F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In Architecture for Networking and Communications Systems, ANCS '06, pages 93--102, 2006. Google ScholarDigital Library
- Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Operating Systems Design and Implementation, OSDI'08, pages 1--14, 2008. Google ScholarDigital Library
Index Terms
- Data-parallel finite-state machines
Recommendations
Data-parallel finite-state machines
ASPLOS '14A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-...
Data-parallel finite-state machines
ASPLOS '14A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-...
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniquesHeterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code ...
Comments