Skip to main content
Erschienen in: The VLDB Journal 2/2019

03.12.2018 | Regular Paper

Building self-clustering RDF databases using Tunable-LSH

verfasst von: Güneş Aluç, M. Tamer Özsu, Khuzaima Daudjee

Erschienen in: The VLDB Journal | Ausgabe 2/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The Resource Description Framework (RDF) is a W3C standard for representing graph-structured data, and SPARQL is the standard query language for RDF. Recent advances in information extraction, linked data management and the Semantic Web have led to a rapid increase in both the volume and the variety of RDF data that are publicly available. As businesses start to capitalize on RDF data, RDF data management systems are being exposed to workloads that are far more diverse and dynamic than what they were designed to handle. Consequently, there is a growing need for developing workload-adaptive and self-tuning RDF data management systems. To realize this objective, we introduce a fast and efficient method for dynamically clustering records in an RDF data management system. Specifically, we assume nothing about the workload upfront, but as SPARQL queries are executed, we keep track of records that are co-accessed by the queries in the workload and physically cluster them. To decide dynamically and in constant-time where a record needs to be placed in the storage system, we develop a new locality-sensitive hashing (LSH) scheme, Tunable-LSH. Using Tunable-LSH, records that are co-accessed across similar sets of queries can be hashed to the same or nearby physical pages in the storage system. What sets Tunable-LSH apart from existing LSH schemes is that it can auto-tune to achieve the aforementioned clustering objective with high accuracy even when the workloads change. Experimental evaluation of Tunable-LSH in an RDF data management system as well as in a standalone hashtable shows end-to-end performance gains over existing solutions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The Hamming distance between two record utilization vectors is equal to their edit distance [56], as well as the Manhattan distance [54] between these two vectors in \(l_{1}\) norm.
 
2
This uniformity condition simplifies the sensitivity analysis of Tunable-LSH, but it is not a requirement from an algorithmic point of view. Relaxing this condition is left as future work.
 
3
Groups are separated by vertical dashed lines.
 
4
In practice, this translation is not required because the system maintains positional vectors instead.
 
Literatur
1.
Zurück zum Zitat Abadi, D.J., Marcus, A., Madden, S.R., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18, 385–406 (2009)CrossRef Abadi, D.J., Marcus, A., Madden, S.R., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18, 385–406 (2009)CrossRef
2.
Zurück zum Zitat Aggarwal, C.C.: A survey of stream clustering algorithms. In: Aggarwal, C.C., Reddy, C.K. (eds.) Data Clustering: Algorithms and Applications, pp. 231–258. CRC Press, Boca Raton, Florida (2013)CrossRef Aggarwal, C.C.: A survey of stream clustering algorithms. In: Aggarwal, C.C., Reddy, C.K. (eds.) Data Clustering: Algorithms and Applications, pp. 231–258. CRC Press, Boca Raton, Florida (2013)CrossRef
3.
Zurück zum Zitat Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated selection of materialized views and indexes in SQL databases. In: Proceedings of the 26th International Conference on Very Large DataBases, pp. 496–505 (2000) Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated selection of materialized views and indexes in SQL databases. In: Proceedings of the 26th International Conference on Very Large DataBases, pp. 496–505 (2000)
4.
Zurück zum Zitat Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: Proceedings of the 25th International Conference on Very Large DataBases, pp. 266–277 (1999) Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: Proceedings of the 25th International Conference on Very Large DataBases, pp. 266–277 (1999)
5.
Zurück zum Zitat Al-Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016)CrossRef Al-Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016)CrossRef
6.
Zurück zum Zitat Al-Harbi, R., Ebrahim, Y., Kalnis, P.: Phd-store: an adaptive SPARQL engine with dynamic partitioning for distributed RDF repositories. CoRR, arXiv:1405.4979 (2014) Al-Harbi, R., Ebrahim, Y., Kalnis, P.: Phd-store: an adaptive SPARQL engine with dynamic partitioning for distributed RDF repositories. CoRR, arXiv:​1405.​4979 (2014)
8.
Zurück zum Zitat Aluç, G., DeHaan, D., Bowman, I.T.: Parametric plan caching using density-based clustering. In: Proceedings of the 28th International Conference on Data Engineering, pp. 402–413 (2012) Aluç, G., DeHaan, D., Bowman, I.T.: Parametric plan caching using density-based clustering. In: Proceedings of the 28th International Conference on Data Engineering, pp. 402–413 (2012)
9.
Zurück zum Zitat Aluç, G., Hartig, O., Özsu, M. T., Daudjee, K.: Diversified stress testing of rdf data management systems. In: Proceedings of the 13th International Semantic Web Conference, pp. 197–212 (2014) Aluç, G., Hartig, O., Özsu, M. T., Daudjee, K.: Diversified stress testing of rdf data management systems. In: Proceedings of the 13th International Semantic Web Conference, pp. 197–212 (2014)
10.
Zurück zum Zitat Aluç, G., Özsu, M.T., Daudjee, K.: Workload matters: why RDF databases need a new design. Proc. VLDB Endow. 7(10), 837–840 (2014)CrossRef Aluç, G., Özsu, M.T., Daudjee, K.: Workload matters: why RDF databases need a new design. Proc. VLDB Endow. 7(10), 837–840 (2014)CrossRef
11.
Zurück zum Zitat Aluç, G., Özsu, M.T., Daudjee, K., Hartig, O.: Chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10. University of Waterloo (2013) Aluç, G., Özsu, M.T., Daudjee, K., Hartig, O.: Chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10. University of Waterloo (2013)
12.
Zurück zum Zitat Aluç, G., Özsu, M. T., Daudjee, K., Hartig, O.: Executing queries over schemaless RDF databases. In: Proceedings of the 31st International Conference on Data Engineering, pp. 807–818 (2015) Aluç, G., Özsu, M. T., Daudjee, K., Hartig, O.: Executing queries over schemaless RDF databases. In: Proceedings of the 31st International Conference on Data Engineering, pp. 807–818 (2015)
13.
Zurück zum Zitat Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of the 47th Annual Symposium on Foundations of Computer Science, pp. 459–468 (2006) Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of the 47th Annual Symposium on Foundations of Computer Science, pp. 459–468 (2006)
14.
Zurück zum Zitat Arias, M., Fernández, J.D., Martínez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. CoRR, arXiv:1103.5043 (2011) Arias, M., Fernández, J.D., Martínez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. CoRR, arXiv:​1103.​5043 (2011)
15.
Zurück zum Zitat Athitsos, V., Potamias, M., Papapetrou, P., Kollios, G.: Nearest neighbor retrieval using distance-based hashing. In: Proceedings of the 24th International Conference on Data Engineering, pp. 327–336 (2008) Athitsos, V., Potamias, M., Papapetrou, P., Kollios, G.: Nearest neighbor retrieval using distance-based hashing. In: Proceedings of the 24th International Conference on Data Engineering, pp. 327–336 (2008)
16.
Zurück zum Zitat Bast, H., Buchhold, B.: Qlever: A query engine for efficient sparql+text search. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 647–656 (2017) Bast, H., Buchhold, B.: Qlever: A query engine for efficient sparql+text search. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 647–656 (2017)
17.
Zurück zum Zitat Bello, R.G., Dias, K., Downing, A., Feenan, J.J., Finnerty, Jr. J.L., Norcott, W.D., Sun, H., Witkowski, A., Ziauddin, M.: Materialized views in oracle. In: Proceedings of the 24th International Conference on Very Large Data Bases, pp. 659–664 (1998) Bello, R.G., Dias, K., Downing, A., Feenan, J.J., Finnerty, Jr. J.L., Norcott, W.D., Sun, H., Witkowski, A., Ziauddin, M.: Materialized views in oracle. In: Proceedings of the 24th International Conference on Very Large Data Bases, pp. 659–664 (1998)
18.
Zurück zum Zitat Berendt, B., Dragan, L., Hollink, L., Luczak-Rösch, M., Demidova, E., Dietze, S., Szymanski, J., Breslin, J.G., editors. In: Joint Proceeding of the the 5th International Workshop on Using the Web in the Age of Data and the 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data, Volume 1362 of CEUR Workshop Proceedings. CEUR-WS.org (2015) Berendt, B., Dragan, L., Hollink, L., Luczak-Rösch, M., Demidova, E., Dietze, S., Szymanski, J., Breslin, J.G., editors. In: Joint Proceeding of the the 5th International Workshop on Using the Web in the Age of Data and the 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data, Volume 1362 of CEUR Workshop Proceedings. CEUR-WS.org (2015)
20.
Zurück zum Zitat Bislimovska, B., Aluç, G., Özsu, M.T., Fraternali, P.: Graph search of software models using multidimensional scaling. In: Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference, pp. 163–170 (2015) Bislimovska, B., Aluç, G., Özsu, M.T., Fraternali, P.: Graph search of software models using multidimensional scaling. In: Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference, pp. 163–170 (2015)
21.
Zurück zum Zitat Bornea, M.A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., Bhattacharjee, B.: Building an efficient RDF store over a relational database. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 121–132 (2013) Bornea, M.A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., Bhattacharjee, B.: Building an efficient RDF store over a relational database. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 121–132 (2013)
22.
Zurück zum Zitat Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences, pp. 21–29 (1997) Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences, pp. 21–29 (1997)
23.
Zurück zum Zitat Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)MathSciNetCrossRefMATH Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)MathSciNetCrossRefMATH
24.
Zurück zum Zitat Bruno, N., Chaudhuri, S.: To tune or not to tune? a lightweight physical design alerter. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 499–510 (2006) Bruno, N., Chaudhuri, S.: To tune or not to tune? a lightweight physical design alerter. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 499–510 (2006)
25.
Zurück zum Zitat Carroll, J.J., Dickinson, I., Dollin, C., Reynolds, D., Seaborne, A., Wilkinson, K.: Jena: implementing the semantic web recommendations. In: Proceedings of the 13th International World Wide Web Conference—Alternate Track Papers and Posters, pp. 74–83 (2004) Carroll, J.J., Dickinson, I., Dollin, C., Reynolds, D., Seaborne, A., Wilkinson, K.: Jena: implementing the semantic web recommendations. In: Proceedings of the 13th International World Wide Web Conference—Alternate Track Papers and Posters, pp. 74–83 (2004)
26.
Zurück zum Zitat Ceri, S., Navathe, S.B., Wiederhold, G.: Distribution design of logical database schemas. IEEE Trans. Softw. Eng. 9(4), 487–504 (1983)CrossRefMATH Ceri, S., Navathe, S.B., Wiederhold, G.: Distribution design of logical database schemas. IEEE Trans. Softw. Eng. 9(4), 487–504 (1983)CrossRefMATH
27.
Zurück zum Zitat Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th Annual ACM Symposium on the Theory of Computing, pp. 380–388 (2002) Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th Annual ACM Symposium on the Theory of Computing, pp. 380–388 (2002)
28.
Zurück zum Zitat Chaudhuri, S., Narasayya, V.: Self-tuning database systems: a decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 3–14 (2007) Chaudhuri, S., Narasayya, V.: Self-tuning database systems: a decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 3–14 (2007)
29.
Zurück zum Zitat Datar, M., Immorlica, N., Indyk, P., Mirrokni. V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253–262 (2004) Datar, M., Immorlica, N., Indyk, P., Mirrokni. V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253–262 (2004)
30.
Zurück zum Zitat Erling, O.: Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35(1), 3–8 (2012) Erling, O.: Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35(1), 3–8 (2012)
31.
Zurück zum Zitat Ferhatosmanoglu, H., Tuncel, E., Agrawal, D., El Abbadi, A.: Approximate nearest neighbor searching in multimedia databases. In: Proceedings 17th International Conference on Data Engineering , pp. 503–511 (2001) Ferhatosmanoglu, H., Tuncel, E., Agrawal, D., El Abbadi, A.: Approximate nearest neighbor searching in multimedia databases. In: Proceedings 17th International Conference on Data Engineering , pp. 503–511 (2001)
32.
Zurück zum Zitat Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, 2nd edn. Addison-Wesley Longman Publishing Co. Inc, Boston (1990)MATH Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, 2nd edn. Addison-Wesley Longman Publishing Co. Inc, Boston (1990)MATH
33.
Zurück zum Zitat French, K.R., Schwert, G.W., Stambaugh, R.F.: Expected stock returns and volatility. J. Finan. Econ. 19, 3–30 (1987)CrossRef French, K.R., Schwert, G.W., Stambaugh, R.F.: Expected stock returns and volatility. J. Finan. Econ. 19, 3–30 (1987)CrossRef
34.
Zurück zum Zitat Galarraga, L., Hose, K., Schenkel, R.: Partout: a distributed engine for efficient RDF processing. In: Proceedings of the 23rd International World Wide Web Conference, Companion Volume, pp. 267–268 (2014) Galarraga, L., Hose, K., Schenkel, R.: Partout: a distributed engine for efficient RDF processing. In: Proceedings of the 23rd International World Wide Web Conference, Companion Volume, pp. 267–268 (2014)
35.
Zurück zum Zitat Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of 25th International Conference on Very Large Data Bases, pp. 518–529 (1999) Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of 25th International Conference on Very Large Data Bases, pp. 518–529 (1999)
36.
Zurück zum Zitat Goasdoué, F., Karanasos, K., Leblay, J., Manolescu, I.: View selection in semantic web databases. Proc. VLDB Endow. 5(2), 97–108 (2011)CrossRef Goasdoué, F., Karanasos, K., Leblay, J., Manolescu, I.: View selection in semantic web databases. Proc. VLDB Endow. 5(2), 97–108 (2011)CrossRef
37.
Zurück zum Zitat Graefe, G., Idreos, S., Kuno, H.A., Manegold, S.: Benchmarking adaptive indexing. In: Proceedings of the Performance Evaluation, Measurement and Characterization of Complex Systems—2nd TPC Technology Conference TPCTC, pp. 169–184 (2010) Graefe, G., Idreos, S., Kuno, H.A., Manegold, S.: Benchmarking adaptive indexing. In: Proceedings of the Performance Evaluation, Measurement and Characterization of Complex Systems—2nd TPC Technology Conference TPCTC, pp. 169–184 (2010)
38.
Zurück zum Zitat Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: Triad: a distributed shared-nothing RDF engine based on asynchronous message passing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 289–300 (2014) Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: Triad: a distributed shared-nothing RDF engine based on asynchronous message passing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 289–300 (2014)
39.
Zurück zum Zitat Halim, F., Idreos, S., Karras, P., Yap, R.H.C.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. Proc. VLDB Endow. 5(6), 502–513 (2012)CrossRef Halim, F., Idreos, S., Karras, P., Yap, R.H.C.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. Proc. VLDB Endow. 5(6), 502–513 (2012)CrossRef
40.
Zurück zum Zitat Hamming, R.W. (ed.): Coding and Information Theory. Prentice-Hall, Englewood Cliffs (1986)MATH Hamming, R.W. (ed.): Coding and Information Theory. Prentice-Hall, Englewood Cliffs (1986)MATH
41.
Zurück zum Zitat Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N.: Evaluating SPARQL queries on massive RDF datasets. Proc. VLDB Endow. 8(12), 1848–1851 (2015)CrossRef Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N.: Evaluating SPARQL queries on massive RDF datasets. Proc. VLDB Endow. 8(12), 1848–1851 (2015)CrossRef
42.
Zurück zum Zitat Harris, S., Seaborne, A., Prud’hommeaux. E.: SPARQL 1.1 query language. W3C Recommendation (2013) Harris, S., Seaborne, A., Prud’hommeaux. E.: SPARQL 1.1 query language. W3C Recommendation (2013)
43.
Zurück zum Zitat Harth, A., Umbrich, J., Hogan, A., Decker, S.: Yars2: A federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007) Harth, A., Umbrich, J., Hogan, A., Decker, S.: Yars2: A federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007)
44.
Zurück zum Zitat He, L., Shao, B., Li, Y., Xia, H., Xiao, Y., Chen, E., Chen, L.: Stylus: a strongly-typed store for serving massive RDF data. Proc. VLDB Endow. 11(2), 203–216 (2017)CrossRef He, L., Shao, B., Li, Y., Xia, H., Xiao, Y., Chen, E., Chen, L.: Stylus: a strongly-typed store for serving massive RDF data. Proc. VLDB Endow. 11(2), 203–216 (2017)CrossRef
45.
Zurück zum Zitat Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Proceedings of Workshops of the 29th IEEE International Conference on Data Engineering, pp. 1–6 (2013) Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Proceedings of Workshops of the 29th IEEE International Conference on Data Engineering, pp. 1–6 (2013)
46.
Zurück zum Zitat Houle, M.E., Sakuma, J.: Fast approximate similarity search in extremely high-dimensional data sets. In: Proceedings of the 21st International Conference on Data Engineering, pp. 619–630 (2005) Houle, M.E., Sakuma, J.: Fast approximate similarity search in extremely high-dimensional data sets. In: Proceedings of the 21st International Conference on Data Engineering, pp. 619–630 (2005)
47.
Zurück zum Zitat Husain, M.F., McGlothlin, J.P., Masud, M.M., Khan, L.R., Thuraisingham, B.M.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. Data Eng. 23(9), 1312–1327 (2011)CrossRef Husain, M.F., McGlothlin, J.P., Masud, M.M., Khan, L.R., Thuraisingham, B.M.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. Data Eng. 23(9), 1312–1327 (2011)CrossRef
48.
Zurück zum Zitat Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, pp. 68–78 (2007) Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, pp. 68–78 (2007)
49.
Zurück zum Zitat Idreos, S., Manegold, S., Kuno, H.A., Graefe, G.: Merging what’s cracked, cracking what’s merged: adaptive indexing in main-memory column-stores. PVLDB 4(9), 585–597 (2011) Idreos, S., Manegold, S., Kuno, H.A., Graefe, G.: Merging what’s cracked, cracking what’s merged: adaptive indexing in main-memory column-stores. PVLDB 4(9), 585–597 (2011)
50.
Zurück zum Zitat Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998) Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998)
51.
Zurück zum Zitat Jaccard, P.: The distribution of flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)CrossRef Jaccard, P.: The distribution of flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)CrossRef
52.
Zurück zum Zitat Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31, 264–323 (1999)CrossRef Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31, 264–323 (1999)CrossRef
53.
Zurück zum Zitat Kirchberg, M., Ko, R.K.L., Lee, B.-S.: From linked data to relevant data—time is the essence. CoRR, arXiv:1103.5046 (2011) Kirchberg, M., Ko, R.K.L., Lee, B.-S.: From linked data to relevant data—time is the essence. CoRR, arXiv:​1103.​5046 (2011)
54.
Zurück zum Zitat Krause, E.F. (ed.): Taxicab Geometry: An Adventure in Non-Euclidean Geometry. Dover, New York (1986) Krause, E.F. (ed.): Taxicab Geometry: An Adventure in Non-Euclidean Geometry. Dover, New York (1986)
55.
Zurück zum Zitat Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1–27 (1964)MathSciNetCrossRefMATH Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1–27 (1964)MathSciNetCrossRefMATH
56.
Zurück zum Zitat Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl. 10(8), 707–710 (1966)MathSciNet Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl. 10(8), 707–710 (1966)MathSciNet
57.
Zurück zum Zitat Lightstone, S., Teorey, T.J., Nadeau, T.P.: Physical Database Design: the Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More. Morgan Kaufmann, Burlington (2007) Lightstone, S., Teorey, T.J., Nadeau, T.P.: Physical Database Design: the Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More. Morgan Kaufmann, Burlington (2007)
58.
Zurück zum Zitat McGlothlin, J.P., Khan, L.R.: Materializing and persisting inferred and uncertain knowledge in RDF datasets. In: Proceedings of the 24th International Conference on Artificial Intelligence (2010) McGlothlin, J.P., Khan, L.R.: Materializing and persisting inferred and uncertain knowledge in RDF datasets. In: Proceedings of the 24th International Conference on Artificial Intelligence (2010)
59.
Zurück zum Zitat Morrison, A., Ross, G., Chalmers, M.: Fast multidimensional scaling through sampling, springs and interpolation. Inf. Vis. 2(1), 68–77 (2003)CrossRef Morrison, A., Ross, G., Chalmers, M.: Fast multidimensional scaling through sampling, springs and interpolation. Inf. Vis. 2(1), 68–77 (2003)CrossRef
60.
Zurück zum Zitat Morsey, M., Lehmann, J., Auer, S., Ngomo, A.-C.N.: DBpedia SPARQL benchmark—performance assessment with real queries on real data. In: Proceedings of the 10th International Semantic Web Conference, pp. 454–469 (2011) Morsey, M., Lehmann, J., Auer, S., Ngomo, A.-C.N.: DBpedia SPARQL benchmark—performance assessment with real queries on real data. In: Proceedings of the 10th International Semantic Web Conference, pp. 454–469 (2011)
61.
Zurück zum Zitat Morton, G.M.: A computer oriented geodetic data base; and a new technique in file sequencing. Technical report. IBM Ltd., Ottawa, Canada (1966) Morton, G.M.: A computer oriented geodetic data base; and a new technique in file sequencing. Technical report. IBM Ltd., Ottawa, Canada (1966)
62.
Zurück zum Zitat Nah, F.F.-H.: A study on tolerable waiting time: how long are Web users willing to wait? Behav. IT 23(3), 153–163 (2004) Nah, F.F.-H.: A study on tolerable waiting time: how long are Web users willing to wait? Behav. IT 23(3), 153–163 (2004)
63.
Zurück zum Zitat Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)CrossRef Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)CrossRef
64.
Zurück zum Zitat Neumann, T., Weikum, G.: x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 3(1), 256–263 (2010)CrossRef Neumann, T., Weikum, G.: x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 3(1), 256–263 (2010)CrossRef
65.
Zurück zum Zitat Papailiou, N., Konstantinou, I., Tsoumakos, D., Koziris, N: H2RDF: adaptive query processing on RDF data in the cloud. In: Proceedings of the 21st International World Wide Web Conference Companion Volume, pp. 397–400 (2012) Papailiou, N., Konstantinou, I., Tsoumakos, D., Koziris, N: H2RDF: adaptive query processing on RDF data in the cloud. In: Proceedings of the 21st International World Wide Web Conference Companion Volume, pp. 397–400 (2012)
66.
Zurück zum Zitat Papailiou, N., Tsoumakos, D., Karras, P., Koziris, N.: Graph-aware, workload-adaptive SPARQL query caching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1777–1792 (2015) Papailiou, N., Tsoumakos, D., Karras, P., Koziris, N.: Graph-aware, workload-adaptive SPARQL query caching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1777–1792 (2015)
67.
Zurück zum Zitat Reed, W.: The normal-Laplace distribution and its relatives. In: Proceedings of the Advances in Distribution Theory, Order Statistics, and Inference, pp. 61–74 (2006) Reed, W.: The normal-Laplace distribution and its relatives. In: Proceedings of the Advances in Distribution Theory, Order Statistics, and Inference, pp. 61–74 (2006)
68.
Zurück zum Zitat Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International World Wide Web Conference, pp 851–860 (2010) Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International World Wide Web Conference, pp 851–860 (2010)
69.
Zurück zum Zitat Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N., Manegold, S.: Column-store support for RDF data management: not all swans are white. Proc. VLDB Endow. 1(2), 1553–1563 (2008)CrossRef Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N., Manegold, S.: Column-store support for RDF data management: not all swans are white. Proc. VLDB Endow. 1(2), 1553–1563 (2008)CrossRef
73.
Zurück zum Zitat Tao, Y., Yi, K., Sheng, K., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. 35(3), 20 (2010)CrossRef Tao, Y., Yi, K., Sheng, K., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. 35(3), 20 (2010)CrossRef
74.
Zurück zum Zitat Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)CrossRef Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)CrossRef
75.
Zurück zum Zitat Wilkinson, K.: Jena property table implementation. Technical Report HPL-2006-140, HP-Labs (2006) Wilkinson, K.: Jena property table implementation. Technical Report HPL-2006-140, HP-Labs (2006)
76.
Zurück zum Zitat Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)CrossRef Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)CrossRef
77.
Zurück zum Zitat Zeng, L., Zou, L.: Redesign of the gStore system. Front. Comput. Sci. 12(4), 623–641 (2018)CrossRef Zeng, L., Zou, L.: Redesign of the gStore system. Front. Comput. Sci. 12(4), 623–641 (2018)CrossRef
78.
Zurück zum Zitat Zilio, D.C., Rao, J., Lightstone, S., Lohman, G. M., Storm, A.J., Garcia-Arellano, C., Fadden, S.: DB2 design advisor: integrated automatic physical database design. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 1087–1097 (2004) Zilio, D.C., Rao, J., Lightstone, S., Lohman, G. M., Storm, A.J., Garcia-Arellano, C., Fadden, S.: DB2 design advisor: integrated automatic physical database design. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 1087–1097 (2004)
79.
Zurück zum Zitat Zou, L., Mo, J., Zhao, D., Chen, L., Özsu, M.T.: gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endow. 4(1), 482–493 (2011)CrossRef Zou, L., Mo, J., Zhao, D., Chen, L., Özsu, M.T.: gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endow. 4(1), 482–493 (2011)CrossRef
Metadaten
Titel
Building self-clustering RDF databases using Tunable-LSH
verfasst von
Güneş Aluç
M. Tamer Özsu
Khuzaima Daudjee
Publikationsdatum
03.12.2018
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2/2019
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-018-0530-9

Weitere Artikel der Ausgabe 2/2019

The VLDB Journal 2/2019 Zur Ausgabe