Published in: Distributed and Parallel Databases 4/2021

05-01-2021

Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata

Authors: S. Tamil Selvan, P. Balamurugan, M. Vijayakumar


Abstract

The large volumes of data generated in recent years, together with the rise of big data analytics on social media, necessitate accurate user query processing with minimal time complexity. Several research works have been conducted in this area. To address the accuracy and time complexity involved in query processing, this work introduces the Wald Adaptive Prefetched Boosting Classification based Czekanowski Similarity MapReduce (WAPBC–CSMR) technique. The WAPBC–CSMR technique uses a big dataset to process a large number of user queries. First, a technique called Wald Adaptive Prefetched Boosting is employed to classify the big dataset into different classes. To reduce the time involved in classification, a classifier called Gaussian distributive Rocchio is used, which classifies the data in minimum time. A Likelihood Ratio Test is then applied to the classified results to integrate the weak learner results into strong classification results. The classified and refined data are then stored in the prefetcher cache. When the prefetch manager receives multi-dimensional user queries, the queries are split into multiple keywords and fed into the map phase, where the mapping function uses the Czekanowski Similarity Index to identify repeated jobs with maximum query processing accuracy. The relevant data are then retrieved from the prefetcher cache, and repeated user query tasks are removed in the reduce phase via a statistical function, which keeps processing time to a minimum. WAPBC–CSMR is evaluated on a big dataset using metrics such as query processing accuracy, error rate, and processing time for varying numbers of user queries. The results show that the WAPBC–CSMR technique improves query processing accuracy and reduces both processing time and error rate compared with conventional methods.
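
As a rough illustration of the map and reduce steps described in the abstract, the sketch below splits queries into keywords, scores pairs of queries with the Czekanowski (Sørensen-Dice) similarity, and drops queries that repeat an earlier one. It is only a minimal single-process sketch under stated assumptions: the abstract does not give the exact formula, tokenization, or statistical function the authors use, so the multiset form of the index, the whitespace tokenizer, the `threshold` parameter, and the function names `czekanowski`, `map_phase`, and `reduce_phase` are all hypothetical choices made for illustration.

```python
from collections import Counter


def czekanowski(a, b):
    """Czekanowski (Sørensen-Dice) similarity between two keyword multisets.

    Computed as 2 * sum(min counts) / (total count of a + total count of b).
    This is the standard quantitative form; the paper's exact variant is assumed.
    """
    ca, cb = Counter(a), Counter(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0  # two empty queries are treated as identical
    shared = sum((ca & cb).values())  # Counter intersection keeps min counts
    return 2.0 * shared / total


def map_phase(queries):
    """Split each query into lowercase keywords (hypothetical tokenization)."""
    return [(qid, q.lower().split()) for qid, q in enumerate(queries)]


def reduce_phase(mapped, threshold=1.0):
    """Keep one representative per group of sufficiently similar queries.

    Returns the list of unique (id, keywords) pairs and a dict mapping every
    query id to the id of the representative whose result it can reuse.
    """
    unique = []
    assignment = {}
    for qid, kws in mapped:
        for rep_id, rep_kws in unique:
            if czekanowski(kws, rep_kws) >= threshold:
                assignment[qid] = rep_id  # repeated job: reuse earlier result
                break
        else:
            unique.append((qid, kws))
            assignment[qid] = qid
    return unique, assignment
```

For example, `reduce_phase(map_phase(["big data query", "query big data", "spatial index"]))` treats the first two queries as the same job because they share every keyword, so only two jobs remain to be answered from the cache. A lower `threshold` would merge near-duplicates as well, at the risk of merging queries that differ in meaning.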


Metadata
Title
Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata
Authors
S. Tamil Selvan
P. Balamurugan
M. Vijayakumar
Publication date
05-01-2021
Publisher
Springer US
Published in
Distributed and Parallel Databases / Issue 4/2021
Print ISSN: 0926-8782
Electronic ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-020-07319-6
