
2021 | OriginalPaper | Book chapter

Chimera: A Bridge Between Big Data Analytics and Semantic Technologies

Authors: Matteo Belcao, Emanuele Falzone, Enea Bionda, Emanuele Della Valle

Published in: The Semantic Web – ISWC 2021

Publisher: Springer International Publishing


Abstract

Over the last decades, analytics powered by Knowledge Graphs (KGs) have been used to extract advanced insights from data. Several companies have integrated legacy relational databases with semantic technologies using Ontology-Based Data Access (OBDA). In practice, this approach lets analysts write SPARQL queries over both KGs and SQL relational data sources while hiding most of the implementation details. However, the volume of data is continuously increasing, and a growing number of companies are adopting distributed storage platforms and distributed computing engines, which leaves a gap between big data and semantic technologies. Ontop, one of the reference OBDA systems, is limited to legacy relational databases, and compatibility with the big data analytics engine Apache Spark is still missing. This paper introduces Chimera, an open-source software suite that aims to fill this gap. Chimera enables a new type of round-tripping data science pipeline: data scientists can query data stored in a data lake using SPARQL through Ontop and SparkSQL, and then save the semantic results of the analysis back to the data lake. Such pipelines semantically enrich data from Spark before saving it back.
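
The OBDA idea summarized in the abstract can be illustrated with a toy sketch. It is not Chimera's or Ontop's implementation or API: the mapping format, the `ex:` identifiers, the table names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the relational or SparkSQL source. The point is only that each RDF predicate is defined by a SQL query, so a triple pattern can be answered by running SQL without materializing the graph.

```python
import sqlite3

# Hypothetical mapping: each RDF predicate is defined by a SQL query that
# yields (subject, object) pairs. All names and identifiers are invented.
MAPPINGS = {
    "ex:hasName": (
        "SELECT 'ex:sensor/' || id AS s, name AS o "
        "FROM sensors ORDER BY id"
    ),
    "ex:hasReading": (
        "SELECT 'ex:sensor/' || sensor_id AS s, value AS o "
        "FROM readings ORDER BY sensor_id"
    ),
}


def build_demo_db():
    """Create a small in-memory database standing in for the data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
    conn.executemany("INSERT INTO sensors VALUES (?, ?)", [(1, "s1"), (2, "s2")])
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 20.5), (2, 18.0)])
    return conn


def answer_triple_pattern(conn, predicate):
    """Answer the pattern (?s, predicate, ?o) by running the mapped SQL query."""
    sql = MAPPINGS[predicate]
    return [(s, predicate, o) for s, o in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    for triple in answer_triple_pattern(conn, "ex:hasName"):
        print(triple)
```

A real OBDA system additionally rewrites full SPARQL queries using the ontology and optimizes the resulting SQL; this sketch covers only the mapping step for a single triple pattern.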

Footnotes
5
Chimera supports several Spark versions, from 2.4.0 to 3.1.1. Users can change the version by selecting the appropriate image tag.
 
Metadata
Title
Chimera: A Bridge Between Big Data Analytics and Semantic Technologies
Authors
Matteo Belcao
Emanuele Falzone
Enea Bionda
Emanuele Della Valle
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-88361-4_27
