Published in: Soft Computing 6/2020

9 July 2019 | Methodologies and Application

Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction

Authors: Antonio Di Noia, Alessio Martino, Paolo Montanari, Antonello Rizzi


Abstract

Workers' healthcare has gained considerable attention recently, as many countries are increasingly concerned about welfare. This paper addresses the problem of predicting occupational disease risks by means of computational intelligence and pattern recognition techniques. Specifically, three machine learning approaches are compared. The first is based on the k-means algorithm, which determines a set of meaningful labelled clusters that serve as the final model. The other two are fully supervised techniques, namely Support Vector Machines and K-Nearest Neighbours. Real data describing both the worker and the workplace, mixing numerical and categorical attributes, have been used for testing. The three approaches are automatically tuned by means of genetic algorithms, which simultaneously search for the optimal hyperparameters of the classification systems and the optimal weights of an ad-hoc dissimilarity measure in order to maximize classification performance. Computational results show that the three approaches are rather comparable in terms of performance, but the clustering-based approach allows a deeper knowledge discovery phase, helpful for further risk assessment and forecasting.
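To make the idea of a weighted dissimilarity over mixed attributes more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the ad-hoc measure, so the absolute-difference and match/mismatch terms, the function names, the toy records and the weight values are all assumptions made for illustration. In the paper's setting, a genetic algorithm would search over the weights and over hyperparameters such as k; that search is not shown here.

```python
from collections import Counter


def dissimilarity(a, b, weights, numeric_idx):
    """Weighted dissimilarity between two mixed-type records (assumed form).

    Numeric attributes (assumed pre-scaled to [0, 1]) contribute their
    absolute difference; categorical attributes contribute 0 on a match
    and 1 on a mismatch. Each contribution is scaled by its weight.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        d = abs(x - y) if i in numeric_idx else float(x != y)
        total += weights[i] * d
    return total


def knn_predict(train, labels, query, weights, numeric_idx, k=3):
    """Majority vote among the k training records closest to the query."""
    ranked = sorted(
        range(len(train)),
        key=lambda j: dissimilarity(train[j], query, weights, numeric_idx),
    )
    votes = Counter(labels[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (scaled age, job category, economic sector).
train = [
    (0.2, "clerk", "services"),
    (0.3, "clerk", "services"),
    (0.8, "assembler", "manufacturing"),
    (0.7, "assembler", "manufacturing"),
]
labels = ["healthy", "healthy", "sick", "sick"]

# Hypothetical weights; in the paper these are what the genetic
# algorithm tunes together with the classifier hyperparameters.
weights = [0.5, 1.0, 0.2]
numeric_idx = {0}  # only the first attribute is numeric

query = (0.75, "assembler", "services")
prediction = knn_predict(train, labels, query, weights, numeric_idx, k=3)
print(prediction)
```

With these illustrative weights, the job category dominates the comparison, so the query is assigned to the class of the two assemblers. Changing the weights changes which attributes drive the neighbourhood, which is why tuning them jointly with k can matter.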

Footnotes
1
Any pre-screening system able to minimize false positives and/or false negatives might have a major impact on (public) costs.
 
2
An Italian version of the classification system ISCO (International Standard Classification of Occupations).
 
3
An Italian version of the classification system NACE (Nomenclature des Activités Économiques dans la Communauté Européenne).
 
4
Recall that the clustering approach also works in a One-Against-All fashion, thus the most popular cluster amongst the ones labelled as ‘sick’ (i.e. having carpal tunnel syndrome) and the most popular cluster amongst the ones labelled as ‘healthy’ (i.e. not having carpal tunnel syndrome) have been selected.
 
5
Given the classification speed, any constraint on prediction running times can be regarded as automatically satisfied anyway.
 
Metadata
Title
Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction
Authors
Antonio Di Noia
Alessio Martino
Paolo Montanari
Antonello Rizzi
Publication date
9 July 2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04200-2
