
2017 | OriginalPaper | Book chapter

Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL

Authors: Jakub Klímek, Petr Škoda

Published in: On the Move to Meaningful Internet Systems. OTM 2017 Conferences

Publisher: Springer International Publishing


Abstract

There is a multitude of tools for preparing Linked Data from data sources such as CSV and XML files. These tools usually perform as expected when processing examples or smaller real-world data. However, the majority of them become hard to use when faced with a larger dataset, such as a CSV file hundreds of megabytes in size. Tools which load the entire resulting RDF dataset into memory usually have memory requirements that commodity hardware cannot satisfy. This is the case for RDF-based ETL tools. Their limits can be avoided by running them on powerful and expensive hardware, which is, however, not an option for the majority of data publishers. Tools which process the data in a streamed way tend to have limited transformation options. This is the case for text-based transformations, such as XSLT, and for per-item SPARQL transformations, such as the streamed version of TARQL. In this paper, we show how the power and transformation options of RDF-based ETL tools can be combined with the ability to transform large datasets on common consumer hardware for so-called chunkable data, that is, data which can be split in a certain way. We demonstrate our approach in our RDF-based ETL tool, LinkedPipes ETL. We include experiments on selected real-world datasets and a comparison of the performance and memory consumption of available tools.
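
The idea of chunkable data can be illustrated with a minimal Python sketch. It is not the LinkedPipes ETL implementation, which the abstract does not describe in detail. It assumes that each CSV row can be transformed independently of the others, so only one chunk of rows and its triples are in memory at a time. The namespace `http://example.org/`, the `id` column, and the helper names `chunks`, `transform_chunk`, and `csv_to_ntriples` are all hypothetical and chosen for illustration only.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO

BASE = "http://example.org/"  # hypothetical namespace, not from the paper


def chunks(rows: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def escape(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def transform_chunk(chunk: List[Dict[str, str]]) -> List[str]:
    """Turn one chunk of CSV rows into N-Triples lines.

    Assumes every row has an 'id' column whose value, like the column
    names, is safe to embed in an IRI (an assumption of this sketch).
    """
    triples = []
    for row in chunk:
        subject = f"<{BASE}item/{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            triples.append(f'{subject} <{BASE}prop/{column}> "{escape(value)}" .')
    return triples


def csv_to_ntriples(source: TextIO, sink: TextIO, chunk_size: int = 1000) -> int:
    """Stream CSV from `source` to N-Triples in `sink`, chunk by chunk.

    Only one chunk and its triples are held in memory at any time.
    Returns the number of triples written.
    """
    count = 0
    for chunk in chunks(csv.DictReader(source), chunk_size):
        for line in transform_chunk(chunk):
            sink.write(line + "\n")
            count += 1
    return count


if __name__ == "__main__":
    src = io.StringIO("id,name,city\n1,Alice,Prague\n2,Bob,Brno\n3,Eve,Plzen\n")
    out = io.StringIO()
    csv_to_ntriples(src, out, chunk_size=2)
    print(out.getvalue(), end="")
```

In this sketch the peak memory use depends on `chunk_size` rather than on the size of the input file, which is the property that makes chunkable data tractable on consumer hardware. A transformation that needs to see several rows at once would need them grouped into the same chunk.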


Metadata
Title
Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL
Authors
Jakub Klímek
Petr Škoda
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69459-7_10