Abstract
We present a general automata processing framework on FPGAs, which generates an RTL kernel for automata processing together with an AXI and PCIe based I/O circuitry. We implement the framework on both local nodes and cloud platforms (Amazon AWS and Nimbix) with novel features. A full performance comparison of the proposed framework is conducted against state-of-the-art automata processing engines on CPUs, GPUs, and Micron’s Automata Processor using the ANMLZoo benchmark suite and some real-world datasets. Results show that FPGAs enable extremely high-throughput automata processing compared to von Neumann architectures. We also collect the resource utilization and power consumption on the two cloud platforms, and find that the I/O circuitry consumes most of the hardware resources and power. Furthermore, we propose a fast, symbol-only reconfiguration mechanism based on the framework for large pattern sets that cannot fit on a single device and need to be partitioned. The proposed method supports multiple passes of the input stream and reduces the re-compilation cost from hours to seconds.
- Kevin Angstadt, Jack Wadden, Vinh Dang, Ted Xie, Dan Kramp, Westley Weimer, Mircea Stan, and Kevin Skadron. 2018. MNCaRT: An open-source, multi-architecture automata-processing research and execution ecosystem. IEEE Computer Architecture Letters 17, 1 (2018), 84--87. Google ScholarDigital Library
- Kevin Angstadt, Westley Weimer, and Kevin Skadron. 2016. RAPID programming of pattern-recognition processors. ACM SIGOPS Operating Systems Review 50, 2 (2016), 593--605. Google ScholarDigital Library
- AWS. 2017. Amazon EC2 F1 Instances. Retrieved October 2017 from https://aws.amazon.com/ec2/instance-types/f1/.Google Scholar
- AWS-FPGA. 2017. Official Repository of the AWS EC2 FPGA Hardware and Software Development Kit. Retrieved October 2017 from https://github.com/aws/aws-fpga.Google Scholar
- Michela Becci and Patrick Crowley. 2007. A hybrid finite automaton for practical deep packet inspection. In ACM CoNEXT Conference. Google ScholarDigital Library
- Chunkun Bo. 2018a. Identify gRNA Off-Targets for CRISPR/Cas9 using Automata Processing. Retrieved February 2018a from https://github.com/chunkunbo/CRISPR.Google Scholar
- Chunkun Bo. 2018b. REAPR on F1. Retrieved January 2018b from https://github.com/chunkunbo/REARP-on-Amazon-F1.Google Scholar
- Chunkun Bo, Dang Vinh, Elaheh Sadredini, and Kevin Skadron. 2018. Searching for potential gRNA off-target sites for CRISPR/Cas9 using automata processing across different platforms. In Proceedings of IEEE International Symposium on High Performance Computer Architecture (HPCA’18).Google ScholarCross Ref
- Chunkun Bo, Ke Wang, Jefferey J. Fox, and Kevin Skadron. 2015a. Entity resolution acceleration using Micron’s automata processor. In Proceedings of Architectures and Systems for Big Data (ASBD’15), in conjunction with ISCA (2015).Google Scholar
- Chunkun Bo, Ke Wang, Jefferey J. Fox, and Kevin Skadron. 2016. Entity resolution acceleration using the automata processor. In Proceedings of IEEE International Conference on Big Data (Big Data’16).Google ScholarCross Ref
- Chunkun Bo, Ke Wang, Yanjun Qi, and Kevin Skadron. 2015b. String kernel testing acceleration using the Micron automata processor. In Proceedings of Workshop on Computer Architecture for Machine Learning.Google Scholar
- Pascal Caron and Djelloul Ziadi. 2000. Characterization of Glushkov automata. Theoretical Computer Science 233, 1--2 (2000), 75--90. Google ScholarDigital Library
- Niccolo Cascarano, Pierluigi Rolando, Fulvio Risso, and Riccardo Sisto. 2010. iNFAnt: NFA pattern matching on GPGPU devices. ACM SIGCOMM Computer Communication Review 40, 5 (2010), 20--26. Google ScholarDigital Library
- Vinh Dang. 2017a. A Deterministic Finite Automata GPU-Based Engine. Retrieved May 2017a from https://github.com/vqd8a/DFAGE.Google Scholar
- Vinh Dang. 2017b. iNFAnt2. Retrieved May 2017b from https://github.com/vqd8a/iNFAnt2.Google Scholar
- Paul Dlugosch, Dave Brown, Paul Glendenning, Michael Leventhal, and Harold Noyes. 2014. An efficient and scalable semiconductor architecture for parallel automata processing. In Proceedings of IEEE Transactions on Parallel and Distributed Systems.Google ScholarCross Ref
- Yuanwei Fang, Tung T. Hoang, Michela Becci, and Andrew A. Chien. 2015. HARE: Hardware accelerator for regular expressions. In Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’15). Google ScholarDigital Library
- Vaibhav Gogte, Aasheesh Kolli, Michael J. Cafarella, Loris D’Antoni, and Thomas F. Wenisch. 2016. HARE: Hardware accelerator for regular expressions. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). Google ScholarDigital Library
- Tran Trung Hieu and Ngoc Thinh Tran. 2013. A memory efficient FPGA-based pattern matching engine for stateful NIDS. In Proceedings of the 5th Annual IEEE Conference Ubiquitous and Future Networks (ICUFN’13).Google Scholar
- Xilinx Inc. 2018. SDAccel Development Environment. Retrieved January 2018 from https://www.xilinx.com/products/design-tools/softwarezone/sdaccel.html.Google Scholar
- Intel. 2018. High-Performance Regular Expression Matching Library. Retrieved January 2018 from https://github.com/intel/hyperscan.Google Scholar
- Rasha Karakchi, Lothrop O. Richards, and Jason D. Bakos. 2017. A dynamically reconfigurable automata processor overlay. In Proceedings of International Conference on ReConFigurable Computing and FPGAs (ReConFig’17).Google Scholar
- Micron. 2015. ANML Documentation. Retrieved August 2015 from http://www.micronautomata.com/documentation/anml_documentation/c_intro.html.Google Scholar
- Roger Moussalli, Mariam Salloum, Robert Halstead, Walid Najjar, and Vassilis J. Tsotras. 2014. A study on parallelizing XML path filtering using accelerators. ACM Transactions on Embedded Computing Systems (TECS) 13, 4 (2014), 93. Google ScholarDigital Library
- Nimbix. 2017. Xilinx FPGAs on the Nimbix Cloud. Retrieved October 2017 from https://www.nimbix.net/xilinx/.Google Scholar
- Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu chun Feng, and Michela Becchi. 2017. Demystifying automata processing: GPUs, FPGAs or Micron’s AP? In Proceedings of the International Conference on Supercomputing. ACM, 1. Google ScholarDigital Library
- Elaheh Sadredini, Deyuan Guo, Chunkun Bo, Reza Rahimi, and Kevin Skadron. 2018. A scalable solution for rule-based part-of-speech tagging on novel hardware accelerators. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery 8 Data Mining (KDD’18). Google ScholarDigital Library
- Reetinder Sidhu and Viktor Prosanna. 2001. Fast regular expression matching using FPGAs. In Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’01). Google ScholarDigital Library
- Jens Teubner, Louis Woods, and Chongling Nie. 2012. Skeleton automata for FPGAs: Reconfiguring without reconstructing. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 229--240. Google ScholarDigital Library
- Jack Wadden, Kevin Angstadt, and Kevin Skadron. 2018a. Characterizing and mitigating output reporting bottlenecks in spatial automata processing architectures. In Proceedings of IEEE International Symposium on High Performance Computer Architecture (HPCA’18).Google ScholarCross Ref
- Jack Wadden, Tommy Tracy II, Elaheh Sadredini, Lingxi Wu, Chunkun Bo, Jesse Du, Yizhou Wei, Matthew Wallace, Jeffrey Udall, Mircea Stan, and Kevin Skadron. 2018b. AutomataZoo: A modern automata processing benchmark suite. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC’18).Google ScholarCross Ref
- Jack Wadden and Kevin Skadron. 2016. VASim: An Open Virtual Automata Simulator for Automata Processing Application and Architecture Research. Technical Report CS2016--03, University of Virginia.Google Scholar
- Jack Wadden, Dang Vinh, Nathan Brunelle, Tommy Tracy II, Deyuan Guo, Elaheh Sadredini, Ke Wang, Chunkun Bo, Robins Gabriel, Mircea Stan, and Kevin Skadron. 2017. ANMLzoo: A benchmark suite for exploring bottlenecks in automata processing engines and architectures. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC’17).Google Scholar
- Ke Wang, Kevin Angstadt, Chunkun Bo, Nathan Brunelle, Elaheh Sadredini, Tommy Tracy, Jack Wadden, Mircea Stan, and Kevin Skadron. 2016a. An overview of Micron’s automata processor. In Proceedings of Workshop on Hardware/Software Codesign and System Synthesis (CODES+ ISSS’16). Google ScholarDigital Library
- Ke Wang, Yanjun Qi, Jefferey J. Fox, Mircea R. Stan, and Kevin Skadron. 2015. Association rule mining with the Micron automata processor. In Proceedings of IEEE International Conference on Parallel and Distributed Processing Symposium (IPDPS’15). Google ScholarDigital Library
- Ke Wang, Elaheh Sadredini, and Kevin Skadron. 2016b. Sequential pattern mining with the Micron automata processor. In Proceedings of the ACM International Conference on Computing Frontiers. ACM, 135--144. Google ScholarDigital Library
- Ted Xie, Vinh Dang, Chunkun Bo, Jack Wadden, Kevin Skadron, and Mircea Stan. 2018. REAPR: Reconfigurable Engine for Automata Processing. Retrieved August 2018 from https://github.com/ted-xie/REAPR.Google Scholar
- Ted Xie, Dang Vinh, Jack Wadden, Mircea Stan, and Kevin Skadron. 2017. REAPR: Reconfigurable engine for automata processing. In Proceedings of IEEE International Conference on Field Programmable Logic and Applications (FPL’17).Google ScholarCross Ref
- Yi-Hua Yang and Viktor Prasanna. 2012. High-performance and compact architecture for regular expression matching on FPGA. IEEE Transactions on Computers 61, 7 (2012), 1013--1025. Google ScholarDigital Library
- Xiaodong Yu, Kaixi Hou, Hao Wang, and Wu chun Feng. 2017. Robotomata: A framework for approximate pattern matching of big data on an automata processor. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data’17). IEEE, 283--292.Google ScholarCross Ref
- Keira Zhou, Jeffrey J. Fox, Ke Wang, Donald E. Brown, and Kevin Skadron. 2015. Brill tagging on the Micron automata processor. In Proceedings of IEEE International Conference on Semantic Computing (ICSC’15).Google ScholarCross Ref
Index Terms
- Automata Processing in Reconfigurable Architectures: In-the-Cloud Deployment, Cross-Platform Evaluation, and Fast Symbol-Only Reconfiguration
Recommendations
Demystifying automata processing: GPUs, FPGAs or Micron's AP?
ICS '17: Proceedings of the International Conference on SupercomputingMany established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. As a consequence, in recent years there has been a substantial amount of work on high-...
Partitioning signal processing applications to different granularity reconfigurable logic
SSIP'05: Proceedings of the 5th WSEAS international conference on Signal, speech and image processingIn this paper, we propose a methodology for partitioning DSP applications between the fine and coarse-grain reconfigurable hardware for improving performance. The fine-grain logic is implemented by an embedded FPGA unit, while for the coarse-grain ...
A method for partitioning applications in hybrid reconfigurable architectures
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are speeded-up on the coarse-grain reconfigurable hardware for meeting the ...
Comments