ABSTRACT
We present the scalable, elastic operator scheduler in IBM Streams 4.2. Streams is a distributed stream processing system used in production at many companies in a wide range of industries. The programming language for Streams, SPL, presents operators, tuples and streams as the primary abstractions. A fundamental SPL optimization is operator fusion, where multiple operators execute in the same process. Streams 4.2 introduces automatic submission-time fusion to simplify application development and deployment. However, potentially thousands of operators could then execute in the same process, with no user guidance for thread placement. We needed a way to automatically figure out how many threads to use, with arbitrarily sized applications on a wide variety of hardware, and without any input from programmers. Our solution has two components. The first is a scalable operator scheduler that minimizes synchronization, locks and global data, while allowing threads to execute any operator and dynamically come and go. The second is an elastic algorithm to dynamically adjust the number of threads to optimize performance, using the principles of trusted measurements to establish trends. We demonstrate our scheduler's ability to scale to over a hundred threads, and our elasticity algorithm's ability to adapt to different workloads on an Intel Xeon system with 176 logical cores, and an IBM Power8 system with 184 logical cores.
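The elasticity idea in the abstract (adjust the thread count by following a throughput trend, and only act on measurements that can be trusted) can be illustrated with a small hill-climbing controller. The sketch below is not the algorithm used in Streams 4.2. The class name `ElasticController`, its parameters, and the choice to treat the mean of a fixed number of samples as a "trusted" measurement are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sketch, NOT the Streams 4.2 elasticity algorithm.
// It changes the thread count one step at a time and keeps a change
// only if the averaged throughput improves by more than a tolerance.
class ElasticController {
public:
    ElasticController(std::size_t minThreads, std::size_t maxThreads,
                      std::size_t samplesPerDecision, double tolerance)
        : min_(minThreads), max_(maxThreads), needed_(samplesPerDecision),
          tolerance_(tolerance), threads_(minThreads), last_(minThreads) {
        assert(minThreads <= maxThreads);
        assert(samplesPerDecision > 0);
    }

    std::size_t threads() const { return threads_; }

    // Feed one throughput sample (for example, tuples per second) taken at
    // the current thread count. Returns the thread count to use next.
    std::size_t observe(double throughput) {
        sum_ += throughput;
        if (++count_ < needed_) return threads_;  // too few samples to trust

        const double mean = sum_ / static_cast<double>(count_);
        sum_ = 0.0;
        count_ = 0;

        if (!hasBaseline_) {
            hasBaseline_ = true;
            explore(mean);                        // first trusted measurement
        } else if (settled_) {
            // Hold steady until the workload visibly shifts.
            if (std::abs(mean - baseline_) > tolerance_ * baseline_) {
                settled_ = false;
                explore(mean);
            }
        } else if (mean > baseline_ * (1.0 + tolerance_)) {
            explore(mean);                        // trend is upward: keep going
        } else {
            threads_ = last_;                     // no gain: undo the last step
            direction_ = -direction_;             // try the other way next time
            settled_ = true;
        }
        return threads_;
    }

private:
    // Record the current level as the baseline, then take one step,
    // clamped to [min_, max_].
    void explore(double mean) {
        baseline_ = mean;
        last_ = threads_;
        if (direction_ > 0) {
            threads_ = std::min(threads_ + 1, max_);
        } else {
            threads_ = std::max(threads_, min_ + 1) - 1;
        }
    }

    std::size_t min_, max_, needed_;
    double tolerance_;
    std::size_t threads_, last_;
    std::size_t count_ = 0;
    double sum_ = 0.0;
    double baseline_ = 0.0;
    int direction_ = +1;
    bool hasBaseline_ = false;
    bool settled_ = false;
};
```

Fed a synthetic workload whose throughput rises linearly up to four threads and falls beyond that, a controller built as `ElasticController(1, 16, 3, 0.05)` climbs to five threads, observes the drop, returns to four, and holds there. A production scheduler would also need to cope with noisy measurements and with the cost of starting and stopping threads, which this sketch ignores.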
Low-synchronization, mostly lock-free, elastic scheduling for streaming runtimes. PLDI '17.