ABSTRACT
Distributed stream processing engines, such as Storm and Samza, have been developed to process large-scale stream data. These engines scale out horizontally with a shared-nothing architecture, but they do not provide a high-level query language such as SQL. Supporting a query language for flexible analysis has therefore become an important issue. In this paper, we provide efficient continuous relational query processing on a distributed stream processing engine. We propose a methodology for transforming queries into a form executable in the engine, together with an optimization technique for query processing. Our experimental results show that our methodology processes queries over data streams efficiently.
Index Terms
- Efficient query processing on distributed stream processing engine