2020 | OriginalPaper | Book chapter

Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes

Authors: Arthur D. Reys, Danilo Silva, Daniel Severo, Saulo Pedro, Marcia M. de Sousa e Sá, Guilherme A. C. Salgado

Published in: Intelligent Systems

Publisher: Springer International Publishing

Abstract

ICD coding from electronic clinical records is a manual, time-consuming and expensive process. Code assignment is nevertheless important for billing and database organization. While many works have studied automated ICD coding from free text using machine learning techniques, most use English-language records, especially from the public MIMIC-III dataset. This work presents results for a dataset of Brazilian Portuguese clinical notes. We develop and optimize a Logistic Regression model, a Convolutional Neural Network (CNN), a Gated Recurrent Unit Neural Network and a CNN with Attention (CNN-Att) for predicting diagnosis ICD codes. We also report our results on the MIMIC-III dataset, which outperform previous work among models of the same families, as well as the state of the art. Compared to MIMIC-III, the Brazilian Portuguese dataset contains far fewer words per document when only discharge summaries are used. We experiment with concatenating additional documents available in this dataset, which yields a large boost in performance. The CNN-Att model achieves the best results on both datasets, with a micro-averaged F1 score of 0.537 on MIMIC-III and 0.485 on our dataset with additional documents.
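
The micro-averaged F1 score quoted in the abstract pools true positives, false positives and false negatives over all documents and all codes before computing a single F1 value. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `micro_f1` and the toy ICD-10 code sets are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def micro_f1(true_codes: List[Set[str]], pred_codes: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions (one code set per document)."""
    tp = fp = fn = 0
    for gold, pred in zip(true_codes, pred_codes):
        tp += len(gold & pred)   # codes predicted and actually assigned
        fp += len(pred - gold)   # codes predicted but not assigned
        fn += len(gold - pred)   # codes assigned but missed
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when nothing is assigned or predicted
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): three documents, each with a set of ICD-10 codes.
gold = [{"I10", "E11"}, {"J18"}, {"I10"}]
pred = [{"I10"}, {"J18", "E11"}, {"I10"}]

score = micro_f1(gold, pred)
print(round(score, 3))  # 0.75
```

Because the counts are pooled before averaging, frequent codes weigh more heavily in this metric than rare ones, which matters when the label distribution is highly skewed.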

Metadata
Title
Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes
Authors
Arthur D. Reys
Danilo Silva
Daniel Severo
Saulo Pedro
Marcia M. de Sousa e Sá
Guilherme A. C. Salgado
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-61377-8_39
