research-article

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

Authors:
Shuhao Zhang

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Jiong He

Advanced Digital Sciences Center, Singapore, Singapore

Advanced Digital Sciences Center, Singapore, Singapore
View Profile

,
Amelie Chi Zhou

Shenzhen University, Shenzhen, China

Shenzhen University, Shenzhen, China
View Profile

,
Bingsheng He

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataJune 2019Pages 705–722https://doi.org/10.1145/3299869.3300067

Published:25 June 2019Publication History

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Pages 705–722

ABSTRACT

We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.

References

2008. Classmexer. https://www.javamex.com/classmexer/Google Scholar
2015. SGI UV 300 UV300EX Data Sheet, http://www.newroute.com/ upload/updocumentos/a06ee4637786915bc954e850a6b5580f.pdf.Google Scholar
2017. NUMA patch for Flink, https://issues.apache.org/jira/browse/ FLINK-3163.Google Scholar
2018. Apache Flink. https://flink.apache.org/Google Scholar
2018. Apache Storm. http://storm.apache.org/Google Scholar
2018. HP ProLiant DL980 G7 server with HP PREMA Architecture Technical Overview. https://community.hpe.com/hpeb/attachments/ hpeb/itrc-264/106801/1/363896.pdfGoogle Scholar
2018. Intel Memory Latency Checker, https://software.intel.com/ articles/intelr-memory-latency-checker.Google Scholar
2018. Intel VTune Amplifier. https://software.intel.com/en-us/ intel-vtune-amplifier-xe.Google Scholar
2018. OpenHFT. https://github.com/OpenHFT/Java-Thread-AffinityGoogle Scholar
Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong H. Hwang, Wolfgang Lindner, Anurag S. Maskey, Alexander Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stan Zdonik. 2005. The Design of the Borealis Stream Processing Engine. In CIDR.Google Scholar
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. 2009. DBMSs on a Modern Processor: Where Does Time Go?. In VLDB. Google ScholarDigital Library
M. Akdere, U. Çetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. Google ScholarDigital Library
Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive Online Scheduling in Storm. In DEBS. Google ScholarDigital Library
Raja Appuswamy, Christos Gkantsidis, Dushyanth Narayanan, Orion Hodson, and Antony Rowstron. 2013. Scale-up vs scale-out for hadoop: Time to rethink?. In SoCC. Google ScholarDigital Library
C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. 2013. Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In ICDE. Google ScholarDigital Library
Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. 1999. Database Architecture Optimized for the new. Bottleneck: Memory Access. In VLDB.Google ScholarDigital Library
Surendra Byna, Xian He Sun, William Gropp, and Rajeev Thakur. 2004. Predicting memory-access cost based on data-access patterns. In ICCC. Google ScholarDigital Library
Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2016. Optimal Operator Placement for Distributed Stream Processing Applications. In DEBS. Google ScholarDigital Library
Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2017. Optimal Operator Replication and Placement for Distributed Stream Processing Systems. In SIGMETRICS Perform. Eval. Rev. Google ScholarDigital Library
V. Cardellini, M. Nardelli, and D. Luzi. 2016. Elastic Stateful Stream Processing in Storm. In HPCS.Google Scholar
Sirish Chandrasekaran and et al. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR.Google Scholar
Xuntao Cheng, Bingsheng He, Xiaoli Du, and Chiew Tong Lau. 2017. A Study of Main-Memory Hash Joins on Many-core Processor: A Case with Intel Knights Landing Architecture. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Google ScholarDigital Library
Alvin Cheung, Owen Arden, Samuel Madden, and Andrew C. Myers. 2013. Speeding up database applications with Pyxis. In SIGMOD. Google ScholarDigital Library
Tathagata Das, Yuan Zhong, Ion Stoica, and Scott Shenker. 2014. Adaptive Stream Processing usingDynamic Batch Sizing. SOCC (2014). Google ScholarDigital Library
Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter R. Pietzuch. 2013. Integrating scale out and fault tolerance in stream processing using operator state management. In SIGMOD. Google ScholarDigital Library
B. Gedik, S. Schneider, M. Hirzel, and K. L. Wu. 2014. Elastic Scaling for Data Stream Processing. TPDS (2014). Google ScholarDigital Library
Jana Giceva, Gustavo Alonso, Timothy Roscoe, and Tim Harris. 2014. Deployment of Query Plans on Multicores. In VLDB. Google ScholarDigital Library
Stavros Harizopoulos and Anastassia Ailamaki. 2006. Improving Instruction Cache Performance in OLTP. TODS (2006). Google ScholarDigital Library
Bingsheng He, Qiong Luo, and B. Choi. 2005. Cache-conscious automata for XML filtering. In ICDE. Google ScholarDigital Library
Martin Hirzel, Robert Soulé, Scott Schneider, Bura Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations. ACM Comput. Surv. (2014).Google Scholar
C. Iancu, S. Hofmeyr, F. Blagojevic, and Y. Zheng. 2010. In IPDPS.Google Scholar
Navendu Jain, Lisa Amini, Henrique Andrade, Richard King, Yoonho Park, Philippe Selo, and Chitra Venkatramani. 2006. Design, Implementation, and Evaluation of the Linear Road Bnchmark on the Stream Processing Core. In SIGMOD. Google ScholarDigital Library
Saurabh Jha, Bingsheng He, Mian Lu, Xuntao Cheng, and Huynh Phung Huynh. 2015. Improving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach. Proc. VLDB Endow. (2015). Google ScholarDigital Library
Khandekar, Rohit and Hildrum, Kirsten and Parekh, Sujay and Rajan, Deepak andWolf, Joel andWu, Kun-Lung and Andrade, Henrique and Gedik, Bufra. 2009. COLA: Optimizing Stream Processing Applications via Graph Partitioning. In Middleware. Google ScholarDigital Library
Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa, and Peter R. Pietzuch. 2016. SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures. In SIGMOD. Google ScholarDigital Library
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthikeyan Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In SIGMOD. Google ScholarDigital Library
Geetika T. Lakshmanan, Ying Li, and Robert E. Strom. 2008. Placement Strategies for Internet-Scale Data Stream Systems. IEEE Internet Computing (2008). Google ScholarDigital Library
Jaekyu Lee, Hyesoon Kim, and Richard Vuduc. 2012. When Prefetching Works, When It Doesn&Rsquo;T, and Why. ACM Trans. Archit. Code Optim. (2012). Google ScholarDigital Library
Viktor Leis, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. In SIGMOD. Google ScholarDigital Library
Yinan Li, Ippokratis Pandis, René Müller, Vijayshankar Raman, and Guy M. Lohman. 2013. NUMA-aware algorithms: the case of data shuffling. In CIDR.Google Scholar
Devroye Luc and Zamora Cura Carlos. 1999. On the Complexity of Branch-and Bound Search for Random Trees. Random Struct. Algorithms (1999). Google ScholarDigital Library
Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin. 2017. StreamBox: Modern Stream Processing on a Multicore Machine. In ATC. Google ScholarDigital Library
David R. Morrison, Sheldon H. Jacobson, Jason J. Sauppe, and Edward C. Sewell. 2016. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization (2016). Google ScholarDigital Library
Boyang Peng, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. 2015. R-Storm: Resource-Aware Scheduling in Storm. In Middleware. Google ScholarDigital Library
Achille Peternier, Daniele Bonetta,Walter Binder, and Cesare Pautasso. 2011. Overseer: low-level hardware monitoring and management for Java. In PPPJ. Google ScholarDigital Library
Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia Ailamaki. 2012. OLTP on Hardware Islands. PVLDB (2012).Google Scholar
Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2015. Scaling up concurrent main-memory column-store scans: towards adaptive NUMA-aware data and task placement. In VLDB.Google Scholar
Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2016. Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory columnstores. Proc of the VLDB Endow. (2016). Google ScholarDigital Library
Scott Schneider, JoelWolf, Kirsten Hildrum, Rohit Khandekar, and Kun- Lung Wu. 2016. Dynamic Load Balancing for Ordered Data-Parallel Regions in Distributed Streaming Systems. (2016).Google Scholar
Kian-Lee Tan, Qingchao Cai, Beng Chin Ooi, Weng-Fai Wong, Chang Yao, and Hao Zhang. 2015. In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives. SIGMOD Record (2015).Google Scholar
Stratis D. Viglas and Jeffrey F. Naughton. 2002. Rate-based Query Optimization for Streaming Information Sources. In SIGMOD. Google ScholarDigital Library
J. Xu, Z. Chen, J. Tang, and S. Su. 2014. T-Storm: Traffic-Aware Online Scheduling in Storm. In ICDCS. Google ScholarDigital Library
H. Zhang, G. Chen, B. C. Ooi, K. Tan, and M. Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. TKDE (2015).Google Scholar
Shuhao Zhang, Bingsheng He, Daniel Dahlmeier, Amelie Chi Zhou, and Thomas Heinze. 2017. Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors. In ICDE.Google Scholar
Jingren Zhou and Kenneth A. Ross. 2004. Buffering Databse Operations for Enhanced Instruction Cache Performance. In SIGMOD. Google ScholarDigital Library

Index Terms

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Stream management

Recommendations

Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

With modern computer architecture evolving, two problems conspire against the state-of-the-art approaches in parallel query execution: (i) to take advantage of many-cores, all query work must be distributed evenly among (soon) hundreds of threads in ...
Read More
Mitigating the NUMA effect on task-based runtime systems
Abstract
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a NUMA approach, where each processor accesses ...
Read More
Evaluation of Rodinia Codes on Intel Xeon Phi
ISMS '13: Proceedings of the 2013 4th International Conference on Intelligent Systems, Modelling and Simulation

High performance computing (HPC) is a niche area where various parallel benchmarks are constantly used to explore and evaluate the performance of Heterogeneous computing systems on the horizon. The Rodinia benchmark suite, a collection of parallel ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
General Chairs:
Peter Boncz
CWI & Vrije Universiteit Amsterdam, The Netherlands
,
Stefan Manegold
CWI & Universiteit Leiden, The Netherlands
,
Program Chairs:
Anastasia Ailamaki
EPFL, Switzerland
,
Amol Deshpande
University of Maryland, USA
,
Tim Kraska
MIT, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 June 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
numa-awareness
operator replication and placement
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 30
  Total Citations
  View Citations
- 599
  Total Downloads
- Downloads (Last 12 months)65
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age

Mitigating the NUMA effect on task-based runtime systems

Evaluation of Rodinia Codes on Intel Xeon Phi