Published in: Cluster Computing 1/2018

22.06.2017

A Gaussian process based big data processing framework in cluster computing environment

Authors: Gunasekaran Manogaran, Daphne Lopez


Abstract

Machine learning algorithms play a vital role in predicting disease outbreaks driven by climate change. Dengue outbreaks are caused by improper maintenance of water storage, lack of urbanization, deforestation, and lack of vaccination and awareness. Moreover, the number of dengue cases varies with the climate season, so a prediction model is needed that relates dengue outbreaks to climate change. This paper applies a Gaussian process regression (GPR) model that uses the seasonal averages of several climate parameters: maximum temperature, minimum temperature, precipitation, wind, relative humidity and solar radiation. The number of dengue cases and the climate data for each block of Tamil Nadu, India are collected from the Integrated Disease Surveillance Project and Global Weather Data for SWAT Inc, respectively. Local Moran's I spatial autocorrelation is used to identify hotspot regions, and the dengue outbreak and its hotspot regions are visualized geographically with ArcGIS 10.1. The day-wise big climate data is collected and stored in a Hadoop cluster computing environment. The MapReduce framework reduces the day-wise climate data to seasonal climate averages for winter, summer, and monsoon. The seasonal climate data and the number of dengue cases (health data) are then integrated by geo-location (latitude and longitude). GPR is used to build the dengue prediction model from the integrated climate and health data. The proposed Gaussian process based prediction model is compared with other machine learning approaches, namely multiple regression, support vector machine and random forests. Experimental results demonstrate the effectiveness of the Gaussian process based prediction framework.
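
The aggregation and integration steps described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' Hadoop job: the record layout, the function names (`map_record`, `seasonal_averages`, `integrate`), the sample values, and above all the month-to-season mapping are assumptions made for illustration, since the abstract does not give the season boundaries. The sketch only mimics the map, shuffle, and reduce phases in plain Python.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical month-to-season mapping; the abstract names the
# three seasons but does not state which months belong to each.
SEASON_BY_MONTH = {
    11: "winter", 12: "winter", 1: "winter", 2: "winter",
    3: "summer", 4: "summer", 5: "summer", 6: "summer",
    7: "monsoon", 8: "monsoon", 9: "monsoon", 10: "monsoon",
}


def map_record(record):
    """Map step: emit ((lat, lon, season, parameter), value) per daily reading."""
    month = int(record["date"].split("-")[1])  # date given as YYYY-MM-DD
    season = SEASON_BY_MONTH[month]
    for name, value in record["values"].items():
        yield (record["lat"], record["lon"], season, name), value


def seasonal_averages(records):
    """Shuffle values by key, then reduce each group to its mean."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def integrate(averages, dengue_cases):
    """Join seasonal climate averages with case counts on (lat, lon)."""
    rows = defaultdict(dict)
    for (lat, lon, season, name), mean in averages.items():
        if (lat, lon) in dengue_cases:
            rows[(lat, lon)][f"{season}_{name}"] = mean
    for location in rows:
        rows[location]["cases"] = dengue_cases[location]
    return dict(rows)


# Made-up sample data for one location (not taken from the paper).
daily = [
    {"date": "2014-01-10", "lat": 12.9, "lon": 79.1,
     "values": {"tmax": 30.0, "precip": 0.0}},
    {"date": "2014-01-11", "lat": 12.9, "lon": 79.1,
     "values": {"tmax": 32.0, "precip": 2.0}},
    {"date": "2014-07-01", "lat": 12.9, "lon": 79.1,
     "values": {"tmax": 36.0, "precip": 10.0}},
]
cases = {(12.9, 79.1): 5}

averages = seasonal_averages(daily)
integrated = integrate(averages, cases)
```

Each row of `integrated` holds one location's seasonal climate features together with its case count, which is the shape of input a regression model such as GPR would consume. In a real Hadoop deployment the grouping would be done by the framework's shuffle phase rather than by a dictionary.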


Metadata
Title
A Gaussian process based big data processing framework in cluster computing environment
Authors
Gunasekaran Manogaran
Daphne Lopez
Publication date
22.06.2017
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2018
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0982-5
