skip to main content
10.1145/2541940.2541988acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Data-parallel finite-state machines

Published:24 February 2014Publication History

ABSTRACT

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns making them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores. For instance, on benchmarks from three FSM applications: regular expressions, Huffman decoding, and HTML tokenization, the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups up to 21x on 8 cores.

References

  1. G. E. Blelloch. Prefix sums and their applications. Technical Report Carnegie Mellon University-CS-90-190, School of Computer Science, Carnegie Mellon University, Nov. 1990.Google ScholarGoogle Scholar
  2. J. A. Brzozowski. Canonical regular expressions and minimal state graphs for definite events. Mathematical Theory of Automata, 12: 529--561, 1962.Google ScholarGoogle Scholar
  3. R. D. Cameron, E. Amiri, K. S. Herdy, D. Lin, T. C. Shermer, and F. Popowich. Parallel scanning with bitstream addition: An XML case study. In European Conference on Parallel and Distributed Computing, Part II, pages 2--13, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Catanzaro, N. Sundaram, and K. Keutzer. A map reduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.Google ScholarGoogle Scholar
  5. R. L. Cloud, M. L. Curry, H. L. Ward, A. Skjellum, and P. Bangalore. Accelerating lossless data compression with GPUs. Computing Research Repository (CoRR), abs/1107.1525, 2011.Google ScholarGoogle Scholar
  6. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51 (1): 107--113, Jan. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: A MapReduce framework on graphics processors. In International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 260--269, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. W. D. Hillis and G. L. Steele. Data parallel algorithms. In Commun. ACM, volume 29, pages 1170--1183, Dec 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. kr(2009)}Holub:2009J. Holub and S.vStekr. On parallel implementations of deterministic finite automata. In Implementation and Application of Automata, CIAA '09, pages 54--64, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. E. Hopcroft and J. D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1969. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Howard and J. Vitter. Parallel lossless image compression using Huffman and arithmetic coding. In Data Compression Conference, 1992. DCC '92., pages 299 --308, March 1992.Google ScholarGoogle ScholarCross RefCross Ref
  12. 013)}haswellIntel Haswell Microarchitecture, 2013. URL http://software.intel.com/en-us/haswell.Google ScholarGoogle Scholar
  13. C. G. Jones, R. Liu, L. Meyerovich, K. Asanović, and R. Bodík. Parallelizing the web browser. In Hot Topics in Parallelism (HotPar), pages 7--7, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. T. Klein, Y. Wiseman, S. T. Klein, and Y. Wiseman. Parallel Huffman decoding with applications to JPEG files. The Computer Journal, 46: 487--497, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  15. R. E. Ladner and M. J. Fischer. Parallel prefix computation. Journal of the ACM, 27 (4): 831--838, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. }libhuffmanlibhuffman. URL http://huffman.sourceforge.net/.Google ScholarGoogle Scholar
  17. C.-H. Lin and C.-W. Jen. Low power parallel Huffman decoding. Electronics Letters, 34 (3): 240 --241, Feb 1998.Google ScholarGoogle ScholarCross RefCross Ref
  18. D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. D. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In High Performance Computer Architecture (HPCA), pages 373--384, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Luchaup, R. Smith, C. Estan, and S. Jha. Speculative parallel pattern matching. IEEE Transactions on Information Forensics and Security, 6 (2): 438--451, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. H. Mealy. A method for synthesizing sequential circuits. Bell System Technical Journal, 34 (5): 1045--1079, 1955.Google ScholarGoogle ScholarCross RefCross Ref
  21. G. Navarro. NR-grep: A fast and flexible pattern matching tool. Software: Practice and Experience, 31 (13): 1265--1312, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Pan, Y. Zhang, and K. Chiu. Simultaneous transducers for data-parallel XML parsing. International Symposium on Parallel and Distributed Processing, pages 1--12, 2008.Google ScholarGoogle Scholar
  23. V. Paxson. flex - fast lexical analyzer generator, 1988.Google ScholarGoogle Scholar
  24. G. R. Prakash Prabhu and K. Vaswani. Safe programmable speculative parallelism. In Programming Languages Design and Implementation (PLDI), June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. }gutenbergProject Gutenberg. URL http://www.gutenberg.org/.Google ScholarGoogle Scholar
  26. G. Ren, P. Wu, and D. Padua. Optimizing data permutations for SIMD devices. In Programming Languages Design and Implementation (PLDI), pages 118--131, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. Roesch. Snort - Lightweight intrusion detection for networks. In Proceedings of the 13th USENIX conference on System administration, LISA '99, pages 229--238, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. P. Scarpazza and G. F. Russell. High-performance regular expression scanning on the Cell/B.E. processor. In International Conf. on Supercomputing, ICS '09, pages 14--25, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Sengupta, M. Harris, Y. Zhang, and J. Owens. Scan primitives for GPU computing. In SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pages 97--106. Eurographics Association, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. P. Sutton. Partial character decoding for improved regular expression matching in FPGAs. In Field-Programmable Technology, pages 25 -- 32, Dec. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  31. J. Talbot, R. M. Yoo, and C. Kozyrakis. PhoenixGoogle ScholarGoogle Scholar
  32. : Modular MapReduce for shared-memory systems. In Workshop on MapReduce and its Applications, pages 9--16, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. G. Vasiliadis, M. Polychronakis, S. Antonatos, E. Markatos, and S. Ioannidis. Regular expression matching on graphics hardware for intrusion detection. In Recent Advances in Intrusion Detection, volume 5758, pages 265--283. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. Wei and T. Meng. A parallel decoder of programmable Huffman codes. Circuits and Systems for Video Technology, 5 (2): 175 --178, Apr 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Y.-H. E. Yang, W. Jiang, and V. K. Prasanna. Compact architecture for high-throughput regular expression matching on FPGA. In Architectures for Networking and Communications Systems, ANCS '08, pages 30--39, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In Architecture for Networking and Communications Systems, ANCS '06, pages 93--102, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Operating Systems Design and Implementation, OSDI'08, pages 1--14, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data-parallel finite-state machines

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
          February 2014
          780 pages
          ISBN:9781450323055
          DOI:10.1145/2541940

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 February 2014

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          ASPLOS '14 Paper Acceptance Rate49of217submissions,23%Overall Acceptance Rate535of2,713submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader