Abstract
Analytics over large-scale data streams are a key research subject for future decision-making applications. The huge volumes of data make partitioning imperative for supporting novel applications efficiently. Such applications should build on intelligent, efficient methods for querying multiple data partitions. A processor is placed in front of each partition and is dedicated to managing and executing queries over that piece of data. Continuous queries over these data sources require intelligent mechanisms that deliver the final outcome (the query response) in the minimum time with the maximum performance. This paper proposes a mechanism that governs the behavior of an entity responsible for handling the incoming queries. The mechanism adopts a time-optimized scheme that selects the appropriate processor(s) for each incoming query using the Odds algorithm. We aim to derive the optimal assignment of queries to processors in the minimum time while maximizing performance. We provide mathematical formulations of the problem and present simulation results and a comparative analysis. Through a large number of experiments, we reveal the advantages of the model and give numerical results comparing it with a deterministic model as well as with other efforts in the domain.
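To make the selection step mentioned above more concrete, the following is a minimal sketch of the generic Odds algorithm for optimal stopping, not the paper's actual implementation. It assumes the candidate processors for a query are examined one at a time, that each examination k yields an independent "success" indicator with a known probability p_k strictly below 1, and that the goal is to stop on the last success. The function names, the probabilities, and the meaning of "success" (for example, "this processor is the best seen so far") are illustrative assumptions and do not come from the abstract.

```python
from typing import Optional, Sequence, Tuple


def odds_threshold(p: Sequence[float]) -> Tuple[int, float]:
    """Compute the stopping threshold of the Odds algorithm.

    p[k] is the (assumed known) success probability of observation k.
    Returns (s, win_prob): the 0-based index s from which the first
    success should be accepted, and the probability that this rule
    stops exactly on the last success of the sequence.
    """
    if any(not (0.0 <= pk < 1.0) for pk in p):
        raise ValueError("this sketch assumes 0 <= p_k < 1 for every k")

    odds_sum = 0.0  # running sum of odds r_k = p_k / q_k, from the end
    q_prod = 1.0    # running product of failure probabilities q_k
    # Sum the odds backwards until the sum reaches or exceeds one.
    for k in range(len(p) - 1, -1, -1):
        q = 1.0 - p[k]
        odds_sum += p[k] / q
        q_prod *= q
        if odds_sum >= 1.0:
            return k, q_prod * odds_sum
    # The odds never reach one: accept the first success from the start.
    return 0, q_prod * odds_sum


def first_accepted(successes: Sequence[bool], s: int) -> Optional[int]:
    """Return the index of the first success at or after s, or None."""
    for k in range(s, len(successes)):
        if successes[k]:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical example: five candidate processors, each with a 0.3
    # chance of being a "success" when it is examined.
    s, win_prob = odds_threshold([0.3] * 5)
    print(s, round(win_prob, 3))
    # One hypothetical run of observed indicators.
    print(first_accepted([True, False, False, True, False], s))
```

The threshold is computed once, before any processor is examined, which is what makes the rule cheap at query time: the entity skips every observation before index s and accepts the first success it sees afterwards. How the paper derives the success probabilities for processors is not stated in the abstract, so they are treated here as given inputs.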
Cite this article
Kolomvatsos, K. An intelligent scheme for assigning queries. Appl Intell 48, 2730–2745 (2018). https://doi.org/10.1007/s10489-017-1099-5
DOI: https://doi.org/10.1007/s10489-017-1099-5