2021 | OriginalPaper | Book chapter

Applying Grammar-Based Compression to RDF

Written by: Michael Röder, Philip Frerk, Felix Conrads, Axel-Cyrille Ngonga Ngomo

Published in: The Semantic Web

Publisher: Springer International Publishing

Abstract

Data compression for RDF knowledge graphs is used in an increasing number of settings. In parallel, several grammar-based graph compression algorithms have been developed to reduce the size of graphs. We port gRePair, a state-of-the-art grammar-based graph compression algorithm, to RDF and name the result RDFRePair. We compare the compression ratio of this technique with that of the state-of-the-art RDF compression approaches HDT, HDT++ and OFR as well as a \(k^2\)-tree-based RDF compression. We run an extensive evaluation on 40 datasets. Our results suggest that RDFRePair achieves significantly better compression ratios and runtimes than gRePair. However, it is outperformed by \(k^2\) trees, which achieve the overall best compression ratio on real-world datasets. This better performance comes at the cost of time, as \(k^2\) trees are clearly outperformed by OFR w.r.t. compression and decompression time. A pairwise Wilcoxon Signed Rank Test suggests that while OFR is significantly more time-efficient than HDT and \(k^2\) trees, there is no significant difference between the compression ratios achieved by \(k^2\) trees and OFR. In addition, we point out future directions for research. All code and datasets are available at https://github.com/dice-group/GraphCompression and https://hobbitdata.informatik.uni-leipzig.de/rdfrepair/evaluation_datasets/, respectively.

Footnotes
2
In the current implementation, we use 32-bit integers. They can be extended to 64 bits for very large graphs.
 
3
All experiments were executed on a 64-bit Ubuntu 16.04 machine with an Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30 GHz, 64 CPUs and 128 GB RAM. Only the experiments for WatDiv were executed on a 64-bit Debian machine with 128 CPUs and 1 TB RAM.
 
4
The datasets can be found at https://w3id.org/dice-research/data/rdfrepair/evaluation_datasets/. For scholarly data (DF0–DF9), we use the rich datasets (see http://www.scholarlydata.org/dumps/).
 
6
HDT++ is available at https://github.com/antonioillera/iHDTpp-src. OFR is not publicly available; however, its authors kindly provided us with the binaries.
 
7
For a fair comparison, we turned this feature of gRePair off in our evaluation; otherwise, it could not be used with the HDT dictionary.
 
References
2.
Álvarez-García, S., Brisaboa, N.R., Fernández, J.D., Martínez-Prieto, M.A.: Compressed k\(^2\)-Triples for Full-In-Memory RDF Engines. In: AMCIS 2011 Proceedings. IEEE (2011)
3.
Beckett, D., Berners-Lee, T., Prud’hommeaux, E., Carothers, G.: RDF 1.1 Turtle. W3C Recommendation, W3C (February 2014)
4.
Berners-Lee, T.: Primer: Getting into RDF & Semantic Web Using N3. Technical Report, W3C (October 2010)
5.
Brisaboa, N.R., Ladra, S., Navarro, G.: k\(^2\)-trees for compact web graph representation. In: International Symposium on String Processing and Information Retrieval (2009)
6.
Duan, S., Kementsietsidis, A., Srinivas, K., Udrea, O.: Apples and oranges: a comparison of RDF benchmarks and real RDF datasets. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD 2011), pp. 145–156. ACM (2011)
7.
Fernández, J.D., Gutierrez, C., Martínez-Prieto, M.A.: RDF compression: basic approaches. In: Proceedings of the 19th International Conference on World Wide Web (2010)
8.
Fernández, J.D., Gutierrez, C., Martínez-Prieto, M.A., Polleres, A., Arias, M.: Binary RDF Representation for Publication and Exchange (HDT). Web Semant. Sci. Serv. Agents World Wide Web 19, 22–41 (2013)
9.
Gayathri, V., Kumar, P.S.: Horn-rule based compression technique for RDF data. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing (SAC 2015), pp. 396–401. Association for Computing Machinery, New York (2015)
10.
Hernández-Illera, A., Martínez-Prieto, M.A., Fernández, J.D.: Serializing RDF in compressed space. In: 2015 Data Compression Conference, pp. 363–372. IEEE (2015)
11.
Hitzler, P., Krötzsch, M., Rudolph, S., Sure, Y.: Semantic Web: Grundlagen. Springer (2007). https://doi.org/10.1007/978-3-319-93417-4
13.
Maneth, S., Peternek, F.: Grammar-based graph compression. Inf. Syst. 76, 19–45 (2018)
14.
Martínez-Prieto, M.A., Fernández, J.D., Cánovas, R.: Compression of RDF dictionaries. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing (SAC 2012), pp. 340–347. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2245276.2245343
16.
Salomon, D.: Data Compression: The Complete Reference. Springer, New York (2004). https://doi.org/10.1007/b97635
17.
Swacha, J., Grabowski, S.: OFR: an efficient representation of RDF datasets. In: Languages, Applications and Technologies, pp. 224–235. Springer International Publishing (2015)
18.
Wang, K., Fu, H., Peng, S., Gong, Y., Gu, J.: A RDF data compress model based on octree structure. In: 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 990–994 (2017)
Metadata
Title
Applying Grammar-Based Compression to RDF
Written by
Michael Röder
Philip Frerk
Felix Conrads
Axel-Cyrille Ngonga Ngomo
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-77385-4_6