An intelligent scheme for assigning queries

Kolomvatsos, Kostas

doi:10.1007/s10489-017-1099-5

Published: 26 December 2017

Volume 48, pages 2730–2745, (2018)

Applied Intelligence

Kostas Kolomvatsos^1,2

Abstract

Analytics over large-scale data streams are a key research subject for future decision-making applications. The huge volume of data makes partitioning imperative if novel applications are to be supported efficiently. Such applications should rely on intelligent, efficient methods for querying multiple data partitions. A processor is placed in front of each partition and is dedicated to managing and executing queries for that piece of data. Continuous queries over these data sources require intelligent mechanisms that deliver the final outcome (the query response) in the minimum time and with the maximum performance. This paper proposes a mechanism that governs the behavior of the entity responsible for handling incoming queries. Our mechanism adopts a time-optimized scheme that uses the Odds algorithm to select the appropriate processor(s) for each incoming query. We aim to reach the optimal assignment of queries to processors in the minimum time while maximizing performance. We provide mathematical formulations of the problem and present simulation results and a comparative analysis. Through a large number of experiments, we reveal the advantages of the model and give numerical results comparing it with a deterministic model as well as with other efforts in the domain.
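
The abstract names the Odds algorithm but does not spell out how the paper applies it, so the following is only a minimal sketch of the generic odds rule ("sum the odds to one and stop"), not the paper's model. It assumes independent success indicators with known probabilities. The function names `odds_threshold` and `pick_processor`, the per-processor scores, the "record" definition of success with probability 1/(k+1), and the fallback to the last processor are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def odds_threshold(probs: List[float]) -> Tuple[int, float]:
    """Generic odds rule for independent success indicators.

    probs[k] is the assumed probability that observation k (0-based) is a
    success. Returns (s, win): ignore observations before index s, stop at
    the first success at index s or later; win is the probability that this
    stop lands on the last success of the whole sequence.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    odds_sum = 0.0   # sum of odds r_k = p_k / (1 - p_k), accumulated from the end
    fail_prod = 1.0  # product of q_k = 1 - p_k, accumulated from the end
    for k in range(len(probs) - 1, -1, -1):
        p = probs[k]
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        if p == 1.0:
            # Infinite odds: the threshold is here, and we win exactly when
            # no later observation is a success.
            return k, fail_prod
        odds_sum += p / (1.0 - p)
        fail_prod *= 1.0 - p
        if odds_sum >= 1.0:
            return k, fail_prod * odds_sum
    # The odds never summed to one: consider every observation.
    return 0, fail_prod * odds_sum


def pick_processor(scores: List[float]) -> int:
    """Hypothetical use: examine processors one by one and stop early.

    A "success" at step k means processor k scores higher than all earlier
    ones. For exchangeable scores without ties this happens with probability
    1 / (k + 1), independently across steps (an assumption of this sketch).
    """
    n = len(scores)
    s, _ = odds_threshold([1.0 / (k + 1) for k in range(n)])
    best = float("-inf")
    for k, score in enumerate(scores):
        is_record = score > best
        best = max(best, score)
        if k >= s and is_record:
            return k
    return n - 1  # assumed fallback: no record after the threshold


if __name__ == "__main__":
    s, win = odds_threshold([1.0, 1 / 2, 1 / 3, 1 / 4])
    print(s, round(win, 4))
    print(pick_processor([0.3, 0.9, 0.5, 0.7]))
```

With four candidates the threshold index is 1, so the first processor is only observed and the rule stops at the first later processor that beats everything seen so far. The appeal of this rule for time-constrained assignment is that the threshold is computed once, in a single backward pass over the probabilities, before any processor is examined.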

Notes

https://www.ibm.com/support/knowledgecenter/en/SSDP9S_11.1.0/com.ibm.swg.im.iis.fed.classic.overview.doc/topics/iiyfcstoqp.html

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Ilisia, 15784, Greece
Kostas Kolomvatsos
Department of Computer Science, University of Thessaly, Lamia, 35100, Greece
Kostas Kolomvatsos

Corresponding author

Correspondence to Kostas Kolomvatsos.

About this article

Cite this article

Kolomvatsos, K. An intelligent scheme for assigning queries. Appl Intell 48, 2730–2745 (2018). https://doi.org/10.1007/s10489-017-1099-5

Issue Date: September 2018
DOI: https://doi.org/10.1007/s10489-017-1099-5
