Skip to main content
Top
Published in: Soft Computing 6/2020

09-07-2019 | Methodologies and Application

Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction

Authors: Antonio Di Noia, Alessio Martino, Paolo Montanari, Antonello Rizzi

Published in: Soft Computing | Issue 6/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Workers healthcare gained a lot of attention recently as many countries are increasingly concerning about welfare. This paper faces the problem of predicting occupational disease risks by means of computational intelligence and pattern recognition techniques. Specifically, three different machine learning approaches are compared: the first one is based on the k-means algorithm, in charge to determine a set of meaningful labelled clusters as the final model. The latter two are based on fully supervised techniques, namely Support Vector Machines and K-Nearest Neighbours. Real data regarding both the worker and the workplace by mixing numerical and categorical attributes have been used for testing. The three approaches are automatically tuned by means of genetic algorithms in order to simultaneously find the optimal hyperparameters for the classification systems and the optimal ad-hoc dissimilarity measure weights in order to maximize the classification performances. Computational results show that the three approaches are rather comparable in terms of performances, but a clustering-based approach allows a deeper knowledge discovery phase, helpful for further risk assessment and forecasting.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Any pre-screening system able to minimize false positives and/or false negatives might have a major impact on (public) costs.
 
2
An Italian version of the classification system ISCO (International Standard Classification of Occupations).
 
3
An Italian version of the classification system NACE (Nomenclature des Activités Économiques dans la Communauté Européenne).
 
4
Recall that the clustering approach also works in a One-Against-All fashion, thus the most popular cluster amongst the ones labelled as ‘sick’ (i.e. having carpal tunnel syndrome) and the most popular cluster amongst the ones labelled as ‘healthy’ (i.e. not having carpal tunnel syndrome) have been selected.
 
5
Due to the classification speed, any constraints on prediction running times can be seen as automatically satisfied anyways.
 
Literature
go back to reference Abualigah LM, Khader AT, Al-Betar MA (2016) Unsupervised feature selection technique based on genetic algorithm for improving the text clustering. In: 2016 7th International conference on computer science and information technology (CSIT), pp 1–6, https://doi.org/10.1109/CSIT.2016.7549453 Abualigah LM, Khader AT, Al-Betar MA (2016) Unsupervised feature selection technique based on genetic algorithm for improving the text clustering. In: 2016 7th International conference on computer science and information technology (CSIT), pp 1–6, https://​doi.​org/​10.​1109/​CSIT.​2016.​7549453
go back to reference Alelyani S, Tang J, Liu H (2013) Feature selection for clustering: a review. Data Clust Algorithms Appl 29:110–121 Alelyani S, Tang J, Liu H (2013) Feature selection for clustering: a review. Data Clust Algorithms Appl 29:110–121
go back to reference Bandyopadhyay S, Murthy CA, Pal SK (1995) Pattern classification with genetic algorithms. Pattern Recognit Lett 16(8):801–808CrossRef Bandyopadhyay S, Murthy CA, Pal SK (1995) Pattern classification with genetic algorithms. Pattern Recognit Lett 16(8):801–808CrossRef
go back to reference Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Proceedings of the 24th international conference on neural information processing systems. Curran Associates Inc., USA, PP 2546–2554 Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Proceedings of the 24th international conference on neural information processing systems. Curran Associates Inc., USA, PP 2546–2554
go back to reference Boser BE, Guyon I, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual workshop on computational learning theory, ACM, pp 144–152 Boser BE, Guyon I, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual workshop on computational learning theory, ACM, pp 144–152
go back to reference Del Vescovo G, Livi L, Frattale Mascioli FM, Rizzi A (2014) On the problem of modeling structured data with the minsod representative. Int J Comput Theory Eng 6(1):9CrossRef Del Vescovo G, Livi L, Frattale Mascioli FM, Rizzi A (2014) On the problem of modeling structured data with the minsod representative. Int J Comput Theory Eng 6(1):9CrossRef
go back to reference Di Noia A, Montanari P, Rizzi A (2014) Occupational diseases risk prediction by cluster analysis and genetic optimization. In: Proceedings of the international conference on evolutionary computation theory and applications: ECTA, (IJCCI 2014), INSTICC, vol 1. SciTePress, pp 68–75, https://doi.org/10.5220/0005077800680075 Di Noia A, Montanari P, Rizzi A (2014) Occupational diseases risk prediction by cluster analysis and genetic optimization. In: Proceedings of the international conference on evolutionary computation theory and applications: ECTA, (IJCCI 2014), INSTICC, vol 1. SciTePress, pp 68–75, https://​doi.​org/​10.​5220/​0005077800680075​
go back to reference Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning, 1st edn. Addison-Wesley, BostonMATH Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning, 1st edn. Addison-Wesley, BostonMATH
go back to reference Lessmann S, Stahlbock R, Crone SF (2005) Optimizing hyperparameters of support vector machines by genetic algorithms. In: IC-AI, pp 74–82 Lessmann S, Stahlbock R, Crone SF (2005) Optimizing hyperparameters of support vector machines by genetic algorithms. In: IC-AI, pp 74–82
go back to reference Liu H, Tang Z, Yang Y, Weng D, Sun G, Duan Z, Chen J (2009) Identification and classification of high risk groups for coal workers’ pneumoconiosis using an artificial neural network based on occupational histories: a retrospective cohort study. BMC Public Health 9(1):366. https://doi.org/10.1186/1471-2458-9-366 CrossRef Liu H, Tang Z, Yang Y, Weng D, Sun G, Duan Z, Chen J (2009) Identification and classification of high risk groups for coal workers’ pneumoconiosis using an artificial neural network based on occupational histories: a retrospective cohort study. BMC Public Health 9(1):366. https://​doi.​org/​10.​1186/​1471-2458-9-366 CrossRef
go back to reference Livi L, Del Vescovo G, Rizzi A (2012) Graph recognition by seriation and frequent substructures mining. In: Proceedings of the 1st international conference on pattern recognition applications and methods: ICPRAM,, INSTICC, vol 1, SciTePress, pp 186–191, https://doi.org/10.5220/0003733201860191 Livi L, Del Vescovo G, Rizzi A (2012) Graph recognition by seriation and frequent substructures mining. In: Proceedings of the 1st international conference on pattern recognition applications and methods: ICPRAM,, INSTICC, vol 1, SciTePress, pp 186–191, https://​doi.​org/​10.​5220/​0003733201860191​
go back to reference Livi L, Del Vescovo G, Rizzi A, Frattale Mascioli FM (2014) Building pattern recognition applications with the spare library. arXiv preprint arXiv:14105263 Livi L, Del Vescovo G, Rizzi A, Frattale Mascioli FM (2014) Building pattern recognition applications with the spare library. arXiv preprint arXiv:​14105263
go back to reference MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability: statistics, vol 1. University of California Press, Berkeley, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability: statistics, vol 1. University of California Press, Berkeley, pp 281–297
go back to reference Martiniano A, Ferreira RP, Sassi RJ, Affonso C (2012) Application of a neuro fuzzy network in prediction of absenteeism at work. In: 2012 7th Iberian conference on information systems and technologies (CISTI), pp 1–4 Martiniano A, Ferreira RP, Sassi RJ, Affonso C (2012) Application of a neuro fuzzy network in prediction of absenteeism at work. In: 2012 7th Iberian conference on information systems and technologies (CISTI), pp 1–4
go back to reference Martino A, Rizzi A, Frattale Mascioli FM (2017b) Efficient approaches for solving the large-scale k-medoids problem. In: Proceedings of the 9th international joint conference on computational intelligence: IJCCI,, INSTICC, vol 1. SciTePress, pp 338–347, https://doi.org/10.5220/0006515003380347 Martino A, Rizzi A, Frattale Mascioli FM (2017b) Efficient approaches for solving the large-scale k-medoids problem. In: Proceedings of the 9th international joint conference on computational intelligence: IJCCI,, INSTICC, vol 1. SciTePress, pp 338–347, https://​doi.​org/​10.​5220/​0006515003380347​
go back to reference Martino A, Rizzi A, Frattale Mascioli FM (2019) Efficient approaches for solving the large-scale k-medoids problem: towards structured data. In: Sabourin C, Merelo J, Madani K, Warwick K (eds) Computational intelligence: 9th international joint conference, IJCCI 2017 Funchal-Madeira, Portugal, November 1–3, 2017 Revised Selected Papers. Springer International Publishing, Cham, pp 199–219. https://doi.org/10.1007/978-3-030-16469-0_11 CrossRef Martino A, Rizzi A, Frattale Mascioli FM (2019) Efficient approaches for solving the large-scale k-medoids problem: towards structured data. In: Sabourin C, Merelo J, Madani K, Warwick K (eds) Computational intelligence: 9th international joint conference, IJCCI 2017 Funchal-Madeira, Portugal, November 1–3, 2017 Revised Selected Papers. Springer International Publishing, Cham, pp 199–219. https://​doi.​org/​10.​1007/​978-3-030-16469-0_​11 CrossRef
go back to reference Orive D, Sorrosal G, Borges C, Martín C, Alonso-Vicario A (2014) Evolutionary algorithms for hyperparameter tuning on neural networks models. In: Proceedings of the 26th european modeling & simulation symposium. Burdeos, France, pp 402–409 Orive D, Sorrosal G, Borges C, Martín C, Alonso-Vicario A (2014) Evolutionary algorithms for hyperparameter tuning on neural networks models. In: Proceedings of the 26th european modeling & simulation symposium. Burdeos, France, pp 402–409
go back to reference Pei M, Goodman ED, Punch WF, Ding Y (1995) Genetic algorithms for classification and feature extraction. In: Classification Society Conference, pp 1–28 Pei M, Goodman ED, Punch WF, Ding Y (1995) Genetic algorithms for classification and feature extraction. In: Classification Society Conference, pp 1–28
go back to reference Powers DMW (2011) Evaluation: from precision, recall and f-measure to roc., informedness, markedness & correlation. J Mach Learn Technol 2(1):37–63MathSciNet Powers DMW (2011) Evaluation: from precision, recall and f-measure to roc., informedness, markedness & correlation. J Mach Learn Technol 2(1):37–63MathSciNet
go back to reference Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245CrossRef Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245CrossRef
go back to reference Tsai JT, Chou JH, Liu TK (2006) Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. IEEE Trans Neural Netw 17(1):69–80CrossRef Tsai JT, Chou JH, Liu TK (2006) Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. IEEE Trans Neural Netw 17(1):69–80CrossRef
go back to reference Vapnik V (1998) Statistical Learning Theory. Wiley, New YorkMATH Vapnik V (1998) Statistical Learning Theory. Wiley, New YorkMATH
go back to reference Yuan C, Li G, Peihong Z, Li C (2010) Artificial neural network modeling of prevalence of pneumoconiosis among workers in metallurgical industry—a case study. In: 2010 International conference on intelligent system design and engineering application (ISDEA), vol 1, pp 388–393, https://doi.org/10.1109/ISDEA.2010.111 Yuan C, Li G, Peihong Z, Li C (2010) Artificial neural network modeling of prevalence of pneumoconiosis among workers in metallurgical industry—a case study. In: 2010 International conference on intelligent system design and engineering application (ISDEA), vol 1, pp 388–393, https://​doi.​org/​10.​1109/​ISDEA.​2010.​111
Metadata
Title
Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction
Authors
Antonio Di Noia
Alessio Martino
Paolo Montanari
Antonello Rizzi
Publication date
09-07-2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04200-2

Other articles of this Issue 6/2020

Soft Computing 6/2020 Go to the issue

Premium Partner