2021 | OriginalPaper | Chapter

Big Data Analytics and Preprocessing

Abstract

Big data is a trending term in industry and academia that refers to the huge flood of collected data, which is very complex in nature. As a term, big data is used to describe many concepts related to data, from both technological and cultural perspectives. In the big data community, big data analytics is used to discover hidden patterns and values that give an accurate representation of the data. Big data preprocessing is considered an important step in the analysis process. It is key to the success of the analysis in terms of analysis time, percentage of resources utilized, storage, the efficiency of the analyzed data, and the information gained as output. Preprocessing data involves dealing with concepts such as concept drift and data streams, which are considered significant challenges.


Metadata
Title
Big Data Analytics and Preprocessing
Authors
Noha Shehab
Mahmoud Badawy
Hesham Arafat
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-59338-4_2
