Abstract
Objective
To evaluate precision and recall for the automatic extraction of information from free-text pathology reports, and to assess the impact that implementing pattern-based methods would have on cancer registration completeness.
Method
More than 300,000 electronic pathology reports were scanned for Gleason score, Clark level and Breslow depth using a set of Perl routines that were progressively refined by trial and error. An additional test set of 915 reports potentially containing a Gleason score was used for evaluation.
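As an illustration of the kind of pattern matching described in the Method, the following is a minimal sketch in Python rather than Perl. It is not the authors' code: the three regular expressions, the function name `extract`, and the output keys are all assumptions made for this example, and real reports would need many more phrasing variants than these patterns cover.

```python
import re

# Hypothetical patterns for illustration only; the study's Perl routines
# are not reproduced here and were refined over many iterations.
GLEASON = re.compile(r"gleason\D{0,20}?([1-5])\s*\+\s*([1-5])", re.IGNORECASE)
CLARK = re.compile(
    r"clark(?:'s)?\s+level\s*:?\s*(IV|V|III|II|I|[1-5])\b", re.IGNORECASE
)
BRESLOW = re.compile(r"breslow\D{0,30}?(\d+(?:\.\d+)?)\s*mm", re.IGNORECASE)

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def extract(report: str) -> dict:
    """Return whichever of the three items can be found in a report."""
    found = {}

    m = GLEASON.search(report)
    if m:
        # Gleason score is the sum of the primary and secondary grades.
        found["gleason_score"] = int(m.group(1)) + int(m.group(2))

    m = CLARK.search(report)
    if m:
        level = m.group(1).upper()
        # Accept either Roman or Arabic numerals for the level.
        found["clark_level"] = ROMAN[level] if level in ROMAN else int(level)

    m = BRESLOW.search(report)
    if m:
        # Breslow depth is reported in millimetres.
        found["breslow_mm"] = float(m.group(1))

    return found


print(extract("Prostate adenocarcinoma, Gleason score 3+4=7."))
print(extract("Malignant melanoma. Clark level IV, Breslow thickness 1.2 mm."))
```

Each pattern anchors on the item's name and allows a short run of non-digit characters before the value, which tolerates variants such as "Gleason score", "Gleason grade" or "Breslow depth:" without listing them one by one.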
Results
Recall and precision values of over 98% and 99%, respectively, were readily achieved. A potential increase in cancer staging completeness of up to 32% was demonstrated.
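For reference, precision and recall follow the standard definitions: precision is the share of extracted values that are correct, and recall is the share of values present in the reports that were extracted. The short Python sketch below shows the arithmetic. The function name and the counts in the usage line are invented for illustration and are not figures from this study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Compute (precision, recall) from true/false positive and false negative counts."""
    # Precision: correct extractions out of all extractions made.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correct extractions out of all values that should have been found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only (not taken from the paper).
print(precision_recall(tp=90, fp=10, fn=30))
```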
Conclusions
In cancer registration, simple pattern matching applied to free-text documents can be effectively used to improve completeness and accuracy of pathology information.
Acknowledgments
The Northern Ireland Cancer Registry was funded by the Department of Health, Social Services and Public Safety Northern Ireland (DHSSPSNI), at the time this study was completed. It is now funded by the Public Health Agency. We also wish to thank Alejandra González Beltrán for her stimulating comments on this paper.
Cite this article
Napolitano, G., Fox, C., Middleton, R. et al. Pattern-based information extraction from pathology reports for cancer registration. Cancer Causes Control 21, 1887–1894 (2010). https://doi.org/10.1007/s10552-010-9616-4
DOI: https://doi.org/10.1007/s10552-010-9616-4