Abstract
Big biomedical data has grown exponentially over the last decades, and a similar growth rate is expected in the coming years. Likewise, semantic web technologies have advanced in recent years, and a great variety of tools, e.g., ontologies and query languages, have been developed by different scientific communities and practitioners. Although a rich variety of tools and big data collections is available, many challenges must be addressed before insights that support decision-making can be discovered. For instance, interoperability conflicts can exist among data collections, data may be incomplete, and entities may be dispersed across different datasets. These issues hinder knowledge exploration and discovery; data integration is therefore required to unveil meaningful outcomes. In this chapter, we address these challenges and devise a knowledge-driven framework that relies on semantic web technologies to enable knowledge exploration and discovery. The framework receives big data sources and integrates them into a knowledge graph. Semantic data integration methods are used to identify equivalent entities, i.e., entities that correspond to the same real-world elements. Fusion policies enable the merging of equivalent entities inside the knowledge graph, as well as with entities in other knowledge graphs, e.g., DBpedia and Bio2RDF. Knowledge discovery allows for the exploration of knowledge graphs in order to uncover novel patterns and relations. As proof of concept, we report on the results of applying the knowledge-driven framework in the EU-funded project iASiS (http://project-iasis.eu/) to transform big data into actionable knowledge, thus paving the way for personalised medicine.
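The abstract names two integration steps: identifying equivalent entities and merging them under a fusion policy. The sketch below illustrates that idea in Python under clearly stated assumptions. Entities are represented as hypothetical sets of (predicate, object) pairs, equivalence is decided by a plain Jaccard similarity with an arbitrary threshold of 0.5, and fusion uses a simple union policy. None of these choices are the chapter's own similarity measures or fusion policies, and all IRIs (`ex:...`) and helper names are invented for illustration.

```python
from itertools import combinations

# Hypothetical representation: each subject IRI maps to the set of its
# (predicate, object) pairs, i.e., all triples that share that subject.
Graph = dict[str, set[tuple[str, str]]]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_equivalent(kg: Graph, threshold: float = 0.5) -> list[tuple[str, str]]:
    """Return subject pairs whose property-value sets reach the (assumed) threshold."""
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(kg), 2)
        if jaccard(kg[s1], kg[s2]) >= threshold
    ]


def fuse_union(kg: Graph, pairs: list[tuple[str, str]]) -> Graph:
    """Union fusion policy: merge equivalent subjects (transitively, via
    union-find) and keep every (predicate, object) pair of the merged group.
    The lexicographically smallest IRI of each group becomes its identifier."""
    parent = {s: s for s in kg}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            keep, drop = min(ra, rb), max(ra, rb)
            parent[drop] = keep

    fused: Graph = {}
    for s, props in kg.items():
        fused.setdefault(find(s), set()).update(props)
    return fused


# Invented example data: two records describing the same drug in two sources.
kg: Graph = {
    "ex:drug/D1": {("rdf:type", "ex:Drug"), ("ex:name", "aspirin"), ("ex:target", "ex:PTGS1")},
    "ex:chem/C9": {("rdf:type", "ex:Drug"), ("ex:name", "aspirin"), ("ex:target", "ex:PTGS2")},
    "ex:drug/D2": {("rdf:type", "ex:Drug"), ("ex:name", "ibuprofen")},
}

pairs = find_equivalent(kg, threshold=0.5)
fused = fuse_union(kg, pairs)
print(pairs)          # [('ex:chem/C9', 'ex:drug/D1')]
print(sorted(fused))  # ['ex:chem/C9', 'ex:drug/D2']
```

Union-find groups equivalences transitively: if A matches B and B matches C, all three collapse into one entity even when A and C fall below the threshold on their own. A union policy keeps conflicting values (here both `ex:PTGS1` and `ex:PTGS2`); other policies might instead prefer values from a more trusted source.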
Notes
- 2. Web services that enable the execution of SPARQL queries following the SPARQL protocol.
- 20. The ten knowledge graphs have 133,873,127 RDF triples.
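Note 2 describes SPARQL endpoints as web services that execute SPARQL queries following the SPARQL protocol. As a minimal sketch using only the Python standard library, the snippet builds (without sending) a "query via GET" request in the style of the SPARQL 1.1 Protocol, with the query URL-encoded in the `query` parameter. The endpoint URL `https://example.org/sparql` and the helper name `sparql_get_request` are hypothetical; the triple-count query only mirrors the kind of figure reported in note 20.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint URL; replace with a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Counts all triples in the endpoint's default graph.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol 'query via GET' request:
    the query string travels URL-encoded in the `query` parameter, and the
    Accept header asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


req = sparql_get_request(ENDPOINT, COUNT_QUERY)
print(req.get_method())                                 # GET
print(parse_qs(urlparse(req.full_url).query)["query"])  # ['SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }']
# Sending it would require urllib.request.urlopen(req); not executed here.
```

Keeping request construction separate from sending makes the encoding easy to inspect; a federated query engine would issue many such requests against different endpoints.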
Acknowledgements
This work has been partially funded by the European Union’s Horizon 2020 research and innovation programme project iASiS under grant agreement No. 727658. Kemele Endris has been sponsored by the EU Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua). Farah Karim has been supported by a scholarship of the German Academic Exchange Service (DAAD).
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Vidal, ME., Endris, K.M., Jozashoori, S., Karim, F., Palma, G. (2019). Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine. In: Alor-Hernández, G., Sánchez-Cervantes, J., Rodríguez-González, A., Valencia-García, R. (eds) Current Trends in Semantic Web Technologies: Theory and Practice. Studies in Computational Intelligence, vol 815. Springer, Cham. https://doi.org/10.1007/978-3-030-06149-4_2
DOI: https://doi.org/10.1007/978-3-030-06149-4_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06148-7
Online ISBN: 978-3-030-06149-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)