Skip to main content

2015 | OriginalPaper | Buchkapitel

Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions

verfasst von : Vasilis Spyropoulos, Yannis Kotidis

Erschienen in: Database and Expert Systems Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low cardinality data domain results in replication of data descriptors, leading to increased storage overhead. For example, the use of RFID or similar sensing devices in supply-chains results in massive tracking datasets that need effective spatial or spatio-temporal indexes on them. As the volume of data grows proportionally larger than the number of spatial locations or time epochs, it is unavoidable that many of the resulting lists share large subsets of common items. In this paper we present techniques that exploit this characteristic of modern big-data applications in order to losslessly compress the resulting inverted indexes by discovering large common item sets and adapting the index so as to store just one copy of them. We apply our method in the supply chain domain using modern big-data tools and show that our techniques in many cases achieve compression ratios that exceed 50 %.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bleco, D., Kotidis, Y.: RFID Data Aggregation. In: Trigoni, N., Markham, A., Nawaz, S. (eds.) GSN 2009. LNCS, vol. 5659, pp. 87–101. Springer, Heidelberg (2009) CrossRef Bleco, D., Kotidis, Y.: RFID Data Aggregation. In: Trigoni, N., Markham, A., Nawaz, S. (eds.) GSN 2009. LNCS, vol. 5659, pp. 87–101. Springer, Heidelberg (2009) CrossRef
2.
Zurück zum Zitat Bleco, D., Kotidis, Y.: Business intelligence on complex graph data. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT-ICDT 2012, pp. 13–20. ACM, New York (2012) Bleco, D., Kotidis, Y.: Business intelligence on complex graph data. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT-ICDT 2012, pp. 13–20. ACM, New York (2012)
3.
Zurück zum Zitat Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB 1994, pp. 487–499. Morgan Kaufmann Publishers Inc., San Francisco (1994) Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB 1994, pp. 487–499. Morgan Kaufmann Publishers Inc., San Francisco (1994)
4.
Zurück zum Zitat Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)MathSciNetCrossRef Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)MathSciNetCrossRef
5.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming, vol. 1 (3rd Ed.): Fundamental Algorithms. Addison Wesley Longman Publishing Company Inc, Redwood City (1997) MATH Knuth, D.E.: The Art of Computer Programming, vol. 1 (3rd Ed.): Fundamental Algorithms. Addison Wesley Longman Publishing Company Inc, Redwood City (1997) MATH
6.
Zurück zum Zitat Rajaraman, A., Ullman, J.D.: Mining of massive datasets. Cambridge University Press, Cambridge (2012) Rajaraman, A., Ullman, J.D.: Mining of massive datasets. Cambridge University Press, Cambridge (2012)
7.
Zurück zum Zitat Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall Inc, Upper Saddle River (1982) Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall Inc, Upper Saddle River (1982)
8.
Zurück zum Zitat Willems, S.P.: Data set–real-world multiechelon supply chains used for inventory optimization. Manufact. Serv. Oper. Manage. 10(1), 19–23 (2008)CrossRef Willems, S.P.: Data set–real-world multiechelon supply chains used for inventory optimization. Manufact. Serv. Oper. Manage. 10(1), 19–23 (2008)CrossRef
9.
Zurück zum Zitat Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 1–12. ACM, New York (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 1–12. ACM, New York (2000)
10.
11.
Zurück zum Zitat Viger, P.F., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.W., Tseng, V.S.: SPMF: a java open-source pattern mining library. J. Mach. Learn. Res. 15, 3389–3393 (2014) Viger, P.F., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.W., Tseng, V.S.: SPMF: a java open-source pattern mining library. J. Mach. Learn. Res. 15, 3389–3393 (2014)
Metadaten
Titel
Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions
verfasst von
Vasilis Spyropoulos
Yannis Kotidis
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_30