Abstract
This article presents recent progress in using a scalable cloud computing environment, Hadoop and MapReduce, to perform ontology quality assurance (OQA), and points to areas of future opportunity. With the standard sequential approach to implementing OQA methods, exhaustive analyses of large biomedical ontological systems can take weeks, if not months. With OQA methods newly implemented as massively parallel algorithms in the MapReduce framework, speed-ups of several orders of magnitude can be achieved (e.g., from three months to three hours). Such dramatically reduced running times make it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions for evolutional analysis. As an exemplar, progress is reported in using MapReduce to perform evolutional analysis and visualization on the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), a prominent clinical terminology system. Future opportunities in three areas are described. The first is to extend the scope of the MapReduce-based approach to existing OQA methods, especially for automated exhaustive structural analysis. The second is to apply our proposed MapReduce Pipeline for Lattice-based Evaluation (MaPLE) approach, demonstrated as an exemplar method for SNOMED CT, to other biomedical ontologies. The third is to develop interfaces for reviewing results obtained by OQA methods and for visualizing ontological alignment and evolution; these interfaces can also take advantage of cloud computing technology to pre-compute computationally intensive jobs systematically, increasing performance during user interactions with the visualization interface. Advances in these directions are expected to better support the ontological engineering lifecycle.
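The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what a lattice-based structural check can look like when phrased in map/shuffle/reduce form. It is not the MaPLE implementation and does not use Hadoop; it simulates the three phases in plain Python. It assumes one common formulation of the check: a pair of concepts is flagged when it has more than one minimal common ancestor. The toy hierarchy `PARENTS` and the function names `map_pair`, `reduce_pair`, and `non_lattice_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy hierarchy: concept -> set of direct parents (is-a links).
# Every parent must itself appear as a key.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}


def ancestors(concept, parents):
    """Return the concept together with all of its transitive ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_pair(pair, up):
    """Map step: emit (pair, common_ancestor) for every shared ancestor."""
    a, b = pair
    for common in up[a] & up[b]:
        yield pair, common


def reduce_pair(pair, commons, up):
    """Reduce step: keep only minimal common ancestors; flag the pair if several exist."""
    commons = set(commons)
    minimal = {
        c for c in commons
        if not any(d != c and c in up[d] for d in commons)
    }
    if len(minimal) > 1:
        yield pair, sorted(minimal)


def non_lattice_pairs(parents):
    """Simulate map, shuffle, and reduce over all concept pairs in one process."""
    up = {c: ancestors(c, parents) for c in parents}
    grouped = defaultdict(list)  # shuffle: group mapped values by key
    for pair in combinations(sorted(parents), 2):
        for key, value in map_pair(pair, up):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for pair, minimal in reduce_pair(key, values, up):
            results[pair] = minimal
    return results


if __name__ == "__main__":
    for pair, minimal in non_lattice_pairs(PARENTS).items():
        print(pair, "has several minimal common ancestors:", minimal)
```

In this toy hierarchy, only the pair `C` and `D` is flagged, because both `A` and `B` are minimal common ancestors. The number of pairs grows quadratically with the number of concepts, and each pair can be mapped and reduced independently, which is why this kind of exhaustive check is a natural fit for a parallel framework.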
Index Terms
- Biomedical Ontology Quality Assurance Using a Big Data Approach