research-article

Data-parallel finite-state machines

Authors:
Todd Mytkowicz

Microsoft Research, Redmond, USA

Microsoft Research, Redmond, USA
View Profile

,
Madanlal Musuvathi

Microsoft Research, Redmond, USA

Microsoft Research, Redmond, USA
View Profile

,
Wolfram Schulte

Microsoft, Redmond, USA

Microsoft, Redmond, USA
View Profile

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systemsFebruary 2014Pages 529–542https://doi.org/10.1145/2541940.2541988

Published:24 February 2014Publication History

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Pages 529–542

ABSTRACT

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns making them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores. For instance, on benchmarks from three FSM applications: regular expressions, Huffman decoding, and HTML tokenization, the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups up to 21x on 8 cores.

References

G. E. Blelloch. Prefix sums and their applications. Technical Report Carnegie Mellon University-CS-90-190, School of Computer Science, Carnegie Mellon University, Nov. 1990.Google Scholar
J. A. Brzozowski. Canonical regular expressions and minimal state graphs for definite events. Mathematical Theory of Automata, 12: 529--561, 1962.Google Scholar
R. D. Cameron, E. Amiri, K. S. Herdy, D. Lin, T. C. Shermer, and F. Popowich. Parallel scanning with bitstream addition: An XML case study. In European Conference on Parallel and Distributed Computing, Part II, pages 2--13, 2011. Google ScholarDigital Library
B. Catanzaro, N. Sundaram, and K. Keutzer. A map reduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.Google Scholar
R. L. Cloud, M. L. Curry, H. L. Ward, A. Skjellum, and P. Bangalore. Accelerating lossless data compression with GPUs. Computing Research Repository (CoRR), abs/1107.1525, 2011.Google Scholar
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51 (1): 107--113, Jan. 2008. Google ScholarDigital Library
B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: A MapReduce framework on graphics processors. In International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 260--269, 2008. Google ScholarDigital Library
W. D. Hillis and G. L. Steele. Data parallel algorithms. In Commun. ACM, volume 29, pages 1170--1183, Dec 1986. Google ScholarDigital Library
kr(2009)}Holub:2009J. Holub and S.vStekr. On parallel implementations of deterministic finite automata. In Implementation and Application of Automata, CIAA '09, pages 54--64, 2009. Google ScholarDigital Library
J. E. Hopcroft and J. D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1969. Google ScholarDigital Library
P. Howard and J. Vitter. Parallel lossless image compression using Huffman and arithmetic coding. In Data Compression Conference, 1992. DCC '92., pages 299 --308, March 1992.Google ScholarCross Ref
013)}haswellIntel Haswell Microarchitecture, 2013. URL http://software.intel.com/en-us/haswell.Google Scholar
C. G. Jones, R. Liu, L. Meyerovich, K. Asanović, and R. Bodík. Parallelizing the web browser. In Hot Topics in Parallelism (HotPar), pages 7--7, 2009. Google ScholarDigital Library
S. T. Klein, Y. Wiseman, S. T. Klein, and Y. Wiseman. Parallel Huffman decoding with applications to JPEG files. The Computer Journal, 46: 487--497, 2003.Google ScholarCross Ref
R. E. Ladner and M. J. Fischer. Parallel prefix computation. Journal of the ACM, 27 (4): 831--838, 1980. Google ScholarDigital Library
}libhuffmanlibhuffman. URL http://huffman.sourceforge.net/.Google Scholar
C.-H. Lin and C.-W. Jen. Low power parallel Huffman decoding. Electronics Letters, 34 (3): 240 --241, Feb 1998.Google ScholarCross Ref
D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. D. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In High Performance Computer Architecture (HPCA), pages 373--384, 2012. Google ScholarDigital Library
D. Luchaup, R. Smith, C. Estan, and S. Jha. Speculative parallel pattern matching. IEEE Transactions on Information Forensics and Security, 6 (2): 438--451, June 2011. Google ScholarDigital Library
G. H. Mealy. A method for synthesizing sequential circuits. Bell System Technical Journal, 34 (5): 1045--1079, 1955.Google ScholarCross Ref
G. Navarro. NR-grep: A fast and flexible pattern matching tool. Software: Practice and Experience, 31 (13): 1265--1312, 2001. Google ScholarDigital Library
Y. Pan, Y. Zhang, and K. Chiu. Simultaneous transducers for data-parallel XML parsing. International Symposium on Parallel and Distributed Processing, pages 1--12, 2008.Google Scholar
V. Paxson. flex - fast lexical analyzer generator, 1988.Google Scholar
G. R. Prakash Prabhu and K. Vaswani. Safe programmable speculative parallelism. In Programming Languages Design and Implementation (PLDI), June 2010. Google ScholarDigital Library
}gutenbergProject Gutenberg. URL http://www.gutenberg.org/.Google Scholar
G. Ren, P. Wu, and D. Padua. Optimizing data permutations for SIMD devices. In Programming Languages Design and Implementation (PLDI), pages 118--131, 2006. Google ScholarDigital Library
M. Roesch. Snort - Lightweight intrusion detection for networks. In Proceedings of the 13th USENIX conference on System administration, LISA '99, pages 229--238, 1999. Google ScholarDigital Library
D. P. Scarpazza and G. F. Russell. High-performance regular expression scanning on the Cell/B.E. processor. In International Conf. on Supercomputing, ICS '09, pages 14--25, 2009. Google ScholarDigital Library
S. Sengupta, M. Harris, Y. Zhang, and J. Owens. Scan primitives for GPU computing. In SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pages 97--106. Eurographics Association, 2007. Google ScholarDigital Library
P. Sutton. Partial character decoding for improved regular expression matching in FPGAs. In Field-Programmable Technology, pages 25 -- 32, Dec. 2004.Google ScholarCross Ref
J. Talbot, R. M. Yoo, and C. Kozyrakis. PhoenixGoogle Scholar
: Modular MapReduce for shared-memory systems. In Workshop on MapReduce and its Applications, pages 9--16, 2011. Google ScholarDigital Library
G. Vasiliadis, M. Polychronakis, S. Antonatos, E. Markatos, and S. Ioannidis. Regular expression matching on graphics hardware for intrusion detection. In Recent Advances in Intrusion Detection, volume 5758, pages 265--283. 2009. Google ScholarDigital Library
B. Wei and T. Meng. A parallel decoder of programmable Huffman codes. Circuits and Systems for Video Technology, 5 (2): 175 --178, Apr 1995. Google ScholarDigital Library
Y.-H. E. Yang, W. Jiang, and V. K. Prasanna. Compact architecture for high-throughput regular expression matching on FPGA. In Architectures for Networking and Communications Systems, ANCS '08, pages 30--39, 2008. Google ScholarDigital Library
F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In Architecture for Networking and Communications Systems, ANCS '06, pages 93--102, 2006. Google ScholarDigital Library
Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Operating Systems Design and Implementation, OSDI'08, pages 1--14, 2008. Google ScholarDigital Library

Index Terms

Data-parallel finite-state machines

Recommendations

Data-parallel finite-state machines
ASPLOS '14

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-...
Read More
Data-parallel finite-state machines
ASPLOS '14

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-...
Read More
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
February 2014
780 pages
ISBN:9781450323055
DOI:10.1145/2541940
General Chairs:
Rajeev Balasubramonian
University of Utah
,
Al Davis
University of Utah
,
Program Chair:
Sarita Adve
University of Illinois at Urbana-Champ
ACM SIGARCH Computer Architecture News Volume 42, Issue 1
ASPLOS '14
March 2014
729 pages
ISSN:0163-5964
DOI:10.1145/2654822
Editor:
Doug DeGroot
acm dot org
Issue’s Table of Contents
ACM SIGPLAN Notices Volume 49, Issue 4
ASPLOS '14
April 2014
729 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2644865
Editors:
Mark W. Bailey
Hamilton College, Clinton, NY
,
Rajeev Balasubramonian
University of Utah
,
Al Davis
University of Utah
,
Sarita Adve
University of Illinois at Urbana-Champ
Issue’s Table of Contents
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data parallel
finite state machine
regular expression
Qualifiers
- research-article
Conference

Acceptance Rates
ASPLOS '14 Paper Acceptance Rate49of217submissions,23%Overall Acceptance Rate535of2,713submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 102
  Total Citations
  View Citations
- 867
  Total Downloads
- Downloads (Last 12 months)58
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Data-parallel finite-state machines

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Data-parallel finite-state machines

Data-parallel finite-state machines

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems