Biomedical Ontology Quality Assurance Using a Big Data Approach

Published: 24 May 2016

Abstract

This article presents recent progress in using a scalable cloud computing environment, Hadoop and MapReduce, to perform ontology quality assurance (OQA), and points to areas of future opportunity. The standard sequential approach to implementing OQA methods can take weeks, if not months, to analyze large biomedical ontological systems exhaustively. With OQA methods newly implemented as massively parallel algorithms in the MapReduce framework, speed-ups of several orders of magnitude can be achieved (e.g., from three months to three hours). Such a dramatic reduction in time makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions for evolutional analysis. As an exemplar, progress is reported in using MapReduce to perform evolutional analysis and visualization on the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), a prominent clinical terminology system. Future opportunities in three areas are described. The first is to extend the scope of the MapReduce-based approach to existing OQA methods, especially for automated exhaustive structural analysis. The second is to apply our proposed MapReduce Pipeline for Lattice-based Evaluation (MaPLE) approach, demonstrated as an exemplar method for SNOMED CT, to other biomedical ontologies. The third is to develop interfaces for reviewing results obtained by OQA methods and for visualizing ontological alignment and evolution; such interfaces can also use cloud computing technology to pre-compute computationally intensive jobs systematically, improving performance during user interactions with the visualization interface. Advances in these directions are expected to better support the ontological engineering lifecycle.
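As a rough illustration of how a lattice-style structural check can be phrased as map and reduce steps, the sketch below runs both phases in a single Python process on a toy hierarchy. It is not the MaPLE pipeline itself: the hierarchy `PARENTS`, the function names, and the choice of check (flagging concept pairs that have more than one minimal common ancestor) are all assumptions made for illustration, and a real Hadoop job would distribute the map and reduce calls across cluster nodes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy is-a hierarchy (child -> set of direct parents).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}


def ancestors(concept, parents):
    """Return all strict ancestors of a concept (transitive closure of is-a)."""
    seen = set()
    stack = list(parents[concept])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def map_phase(pair, parents):
    """Map: for one concept pair, emit (pair, common ancestor) records."""
    a, b = pair
    up_a = ancestors(a, parents) | {a}
    up_b = ancestors(b, parents) | {b}
    for common in up_a & up_b:
        yield pair, common


def reduce_phase(pair, commons, parents):
    """Reduce: keep only the minimal common ancestors of the pair."""
    commons = set(commons)
    minimal = {
        c for c in commons
        # c is minimal if no other common ancestor lies below it.
        if not any(c in ancestors(o, parents) for o in commons if o != c)
    }
    return pair, minimal


def find_non_lattice_pairs(parents):
    """Simulate map, shuffle, and reduce sequentially in one process."""
    grouped = defaultdict(list)
    for pair in combinations(sorted(parents), 2):
        for key, value in map_phase(pair, parents):
            grouped[key].append(value)  # stands in for the shuffle step
    flagged = {}
    for key, values in grouped.items():
        _, minimal = reduce_phase(key, values, parents)
        if len(minimal) > 1:  # more than one minimal common ancestor
            flagged[key] = minimal
    return flagged


if __name__ == "__main__":
    print(find_non_lattice_pairs(PARENTS))
```

Because each pair is processed independently in the map step and each key independently in the reduce step, the work parallelizes naturally, which is the property that makes exhaustive pairwise analysis of a large hierarchy tractable on a cluster.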

      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 4
        Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
        July 2016
        417 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/2936311

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 May 2016
        • Accepted: 1 April 2015
        • Revised: 1 March 2015
        • Received: 1 October 2014

        Qualifiers

        • research-article
        • Research
        • Refereed
