ABSTRACT
We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.
- 2008. Classmexer. https://www.javamex.com/classmexer/Google Scholar
- 2015. SGI UV 300 UV300EX Data Sheet, http://www.newroute.com/ upload/updocumentos/a06ee4637786915bc954e850a6b5580f.pdf.Google Scholar
- 2017. NUMA patch for Flink, https://issues.apache.org/jira/browse/ FLINK-3163.Google Scholar
- 2018. Apache Flink. https://flink.apache.org/Google Scholar
- 2018. Apache Storm. http://storm.apache.org/Google Scholar
- 2018. HP ProLiant DL980 G7 server with HP PREMA Architecture Technical Overview. https://community.hpe.com/hpeb/attachments/ hpeb/itrc-264/106801/1/363896.pdfGoogle Scholar
- 2018. Intel Memory Latency Checker, https://software.intel.com/ articles/intelr-memory-latency-checker.Google Scholar
- 2018. Intel VTune Amplifier. https://software.intel.com/en-us/ intel-vtune-amplifier-xe.Google Scholar
- 2018. OpenHFT. https://github.com/OpenHFT/Java-Thread-AffinityGoogle Scholar
- Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong H. Hwang, Wolfgang Lindner, Anurag S. Maskey, Alexander Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stan Zdonik. 2005. The Design of the Borealis Stream Processing Engine. In CIDR.Google Scholar
- Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. 2009. DBMSs on a Modern Processor: Where Does Time Go?. In VLDB. Google ScholarDigital Library
- M. Akdere, U. Çetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. Google ScholarDigital Library
- Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive Online Scheduling in Storm. In DEBS. Google ScholarDigital Library
- Raja Appuswamy, Christos Gkantsidis, Dushyanth Narayanan, Orion Hodson, and Antony Rowstron. 2013. Scale-up vs scale-out for hadoop: Time to rethink?. In SoCC. Google ScholarDigital Library
- C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. 2013. Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In ICDE. Google ScholarDigital Library
- Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. 1999. Database Architecture Optimized for the new. Bottleneck: Memory Access. In VLDB.Google ScholarDigital Library
- Surendra Byna, Xian He Sun, William Gropp, and Rajeev Thakur. 2004. Predicting memory-access cost based on data-access patterns. In ICCC. Google ScholarDigital Library
- Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2016. Optimal Operator Placement for Distributed Stream Processing Applications. In DEBS. Google ScholarDigital Library
- Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2017. Optimal Operator Replication and Placement for Distributed Stream Processing Systems. In SIGMETRICS Perform. Eval. Rev. Google ScholarDigital Library
- V. Cardellini, M. Nardelli, and D. Luzi. 2016. Elastic Stateful Stream Processing in Storm. In HPCS.Google Scholar
- Sirish Chandrasekaran and et al. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR.Google Scholar
- Xuntao Cheng, Bingsheng He, Xiaoli Du, and Chiew Tong Lau. 2017. A Study of Main-Memory Hash Joins on Many-core Processor: A Case with Intel Knights Landing Architecture. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Google ScholarDigital Library
- Alvin Cheung, Owen Arden, Samuel Madden, and Andrew C. Myers. 2013. Speeding up database applications with Pyxis. In SIGMOD. Google ScholarDigital Library
- Tathagata Das, Yuan Zhong, Ion Stoica, and Scott Shenker. 2014. Adaptive Stream Processing usingDynamic Batch Sizing. SOCC (2014). Google ScholarDigital Library
- Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter R. Pietzuch. 2013. Integrating scale out and fault tolerance in stream processing using operator state management. In SIGMOD. Google ScholarDigital Library
- B. Gedik, S. Schneider, M. Hirzel, and K. L. Wu. 2014. Elastic Scaling for Data Stream Processing. TPDS (2014). Google ScholarDigital Library
- Jana Giceva, Gustavo Alonso, Timothy Roscoe, and Tim Harris. 2014. Deployment of Query Plans on Multicores. In VLDB. Google ScholarDigital Library
- Stavros Harizopoulos and Anastassia Ailamaki. 2006. Improving Instruction Cache Performance in OLTP. TODS (2006). Google ScholarDigital Library
- Bingsheng He, Qiong Luo, and B. Choi. 2005. Cache-conscious automata for XML filtering. In ICDE. Google ScholarDigital Library
- Martin Hirzel, Robert Soulé, Scott Schneider, Bura Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations. ACM Comput. Surv. (2014).Google Scholar
- C. Iancu, S. Hofmeyr, F. Blagojevic, and Y. Zheng. 2010. In IPDPS.Google Scholar
- Navendu Jain, Lisa Amini, Henrique Andrade, Richard King, Yoonho Park, Philippe Selo, and Chitra Venkatramani. 2006. Design, Implementation, and Evaluation of the Linear Road Bnchmark on the Stream Processing Core. In SIGMOD. Google ScholarDigital Library
- Saurabh Jha, Bingsheng He, Mian Lu, Xuntao Cheng, and Huynh Phung Huynh. 2015. Improving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach. Proc. VLDB Endow. (2015). Google ScholarDigital Library
- Khandekar, Rohit and Hildrum, Kirsten and Parekh, Sujay and Rajan, Deepak andWolf, Joel andWu, Kun-Lung and Andrade, Henrique and Gedik, Bufra. 2009. COLA: Optimizing Stream Processing Applications via Graph Partitioning. In Middleware. Google ScholarDigital Library
- Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa, and Peter R. Pietzuch. 2016. SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures. In SIGMOD. Google ScholarDigital Library
- Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthikeyan Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In SIGMOD. Google ScholarDigital Library
- Geetika T. Lakshmanan, Ying Li, and Robert E. Strom. 2008. Placement Strategies for Internet-Scale Data Stream Systems. IEEE Internet Computing (2008). Google ScholarDigital Library
- Jaekyu Lee, Hyesoon Kim, and Richard Vuduc. 2012. When Prefetching Works, When It Doesn&Rsquo;T, and Why. ACM Trans. Archit. Code Optim. (2012). Google ScholarDigital Library
- Viktor Leis, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. In SIGMOD. Google ScholarDigital Library
- Yinan Li, Ippokratis Pandis, René Müller, Vijayshankar Raman, and Guy M. Lohman. 2013. NUMA-aware algorithms: the case of data shuffling. In CIDR.Google Scholar
- Devroye Luc and Zamora Cura Carlos. 1999. On the Complexity of Branch-and Bound Search for Random Trees. Random Struct. Algorithms (1999). Google ScholarDigital Library
- Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin. 2017. StreamBox: Modern Stream Processing on a Multicore Machine. In ATC. Google ScholarDigital Library
- David R. Morrison, Sheldon H. Jacobson, Jason J. Sauppe, and Edward C. Sewell. 2016. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization (2016). Google ScholarDigital Library
- Boyang Peng, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. 2015. R-Storm: Resource-Aware Scheduling in Storm. In Middleware. Google ScholarDigital Library
- Achille Peternier, Daniele Bonetta,Walter Binder, and Cesare Pautasso. 2011. Overseer: low-level hardware monitoring and management for Java. In PPPJ. Google ScholarDigital Library
- Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia Ailamaki. 2012. OLTP on Hardware Islands. PVLDB (2012).Google Scholar
- Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2015. Scaling up concurrent main-memory column-store scans: towards adaptive NUMA-aware data and task placement. In VLDB.Google Scholar
- Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2016. Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory columnstores. Proc of the VLDB Endow. (2016). Google ScholarDigital Library
- Scott Schneider, JoelWolf, Kirsten Hildrum, Rohit Khandekar, and Kun- Lung Wu. 2016. Dynamic Load Balancing for Ordered Data-Parallel Regions in Distributed Streaming Systems. (2016).Google Scholar
- Kian-Lee Tan, Qingchao Cai, Beng Chin Ooi, Weng-Fai Wong, Chang Yao, and Hao Zhang. 2015. In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives. SIGMOD Record (2015).Google Scholar
- Stratis D. Viglas and Jeffrey F. Naughton. 2002. Rate-based Query Optimization for Streaming Information Sources. In SIGMOD. Google ScholarDigital Library
- J. Xu, Z. Chen, J. Tang, and S. Su. 2014. T-Storm: Traffic-Aware Online Scheduling in Storm. In ICDCS. Google ScholarDigital Library
- H. Zhang, G. Chen, B. C. Ooi, K. Tan, and M. Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. TKDE (2015).Google Scholar
- Shuhao Zhang, Bingsheng He, Daniel Dahlmeier, Amelie Chi Zhou, and Thomas Heinze. 2017. Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors. In ICDE.Google Scholar
- Jingren Zhou and Kenneth A. Ross. 2004. Buffering Databse Operations for Enhanced Instruction Cache Performance. In SIGMOD. Google ScholarDigital Library
Index Terms
- BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
Recommendations
Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataWith modern computer architecture evolving, two problems conspire against the state-of-the-art approaches in parallel query execution: (i) to take advantage of many-cores, all query work must be distributed evenly among (soon) hundreds of threads in ...
Mitigating the NUMA effect on task-based runtime systems
AbstractProcessors with multiple sockets or chiplets are becoming more conventional. These kinds of processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a NUMA approach, where each processor accesses ...
Evaluation of Rodinia Codes on Intel Xeon Phi
ISMS '13: Proceedings of the 2013 4th International Conference on Intelligent Systems, Modelling and SimulationHigh performance computing (HPC) is a niche area where various parallel benchmarks are constantly used to explore and evaluate the performance of Heterogeneous computing systems on the horizon. The Rodinia benchmark suite, a collection of parallel ...
Comments