Published in: International Journal of Machine Learning and Cybernetics 6/2020

18-11-2019 | Original Article

Data mining and machine learning approaches for prediction modelling of schistosomiasis disease vectors

Epidemic disease prediction modelling

Abstract

This research presents viable solutions for prediction modelling of schistosomiasis disease based on vector density. The novel training models proposed in this work address several aspects of interest in the artificial intelligence applications domain. Topics discussed include data imputation, semi-supervised labelling and synthetic instance simulation when using sparse training data. Innovative semi-supervised ensemble learning paradigms are proposed, focusing on labelling threshold selection and the stringency of classification confidence levels. A regression-correlation combination (RCC) data imputation method is also introduced for handling partially complete training data. The results show that the proposed RCC improves imputation precision over benchmark value replacement on 70% of test cases. The proposed incremental transductive models, such as ITSVM, outperform a standard SVM on 21% of test cases depending on threshold constraints, and can be applied to other environment-based epidemic disease domains. The proposed incremental transductive ensemble approach enables the combination of complementary algorithms to label unlabelled vector density instances. The liberal training approach (LTA) and the strict training approach provided varied results, with LTA outperforming a Stacking ensemble on 29.1% of test cases. The proposed synthetic minority over-sampling technique (SMOTE) equilibrium approach yielded subtle increases in classification performance, which can be investigated further to assess how classification performance and efficiency relate to synthetic instance generation.
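The role of the labelling threshold in incremental semi-supervised training can be illustrated with a minimal sketch. This is not the authors' ITSVM or ensemble model: a nearest-centroid classifier stands in for the SVM, the confidence score (a softmax over negative distances) is an assumption, and every function name and data value below is hypothetical. It only shows how a strict threshold leaves ambiguous instances unlabelled while a liberal one absorbs them into the training set.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_with_confidence(cents, p):
    """Return (label, confidence); confidence is an assumed softmax over negative distances."""
    weights = {y: math.exp(-euclidean(c, p)) for y, c in cents.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total

def incremental_self_training(labelled, labels, unlabelled, threshold, max_rounds=10):
    """Repeatedly label pool instances whose confidence meets the threshold, then retrain."""
    labelled, labels, pool = list(labelled), list(labels), list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled, labels)
        accepted, remaining = [], []
        for p in pool:
            y, conf = predict_with_confidence(cents, p)
            if conf >= threshold:
                accepted.append((p, y))
            else:
                remaining.append(p)
        if not accepted:
            break  # nothing confident enough: stop growing the training set
        for p, y in accepted:
            labelled.append(p)
            labels.append(y)
        pool = remaining
    return labelled, labels, pool

# Hypothetical two-feature instances with "low"/"high" vector density labels.
seed_points = [(0.0, 0.0), (4.0, 4.0)]
seed_labels = ["low", "high"]
pool_points = [(0.5, 0.5), (3.5, 3.5), (2.2, 2.2)]

_, strict_labels, strict_pool = incremental_self_training(
    seed_points, seed_labels, pool_points, threshold=0.9)
_, liberal_labels, liberal_pool = incremental_self_training(
    seed_points, seed_labels, pool_points, threshold=0.6)
```

In this toy setting the strict threshold leaves the ambiguous instance near the class boundary unlabelled, whereas the liberal threshold labels it. That is the trade-off between label coverage and label reliability that threshold selection controls.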

Footnotes
1
The data used in this research was provided by the European Space Agency and partners at the Academy of Opto-Electronics, Chinese Academy of Sciences, China.
 
Metadata
Title
Data mining and machine learning approaches for prediction modelling of schistosomiasis disease vectors
Epidemic disease prediction modelling
Publication date
18-11-2019
Published in
International Journal of Machine Learning and Cybernetics / Issue 6/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-01029-x
