Skip to main content
Top

2016 | OriginalPaper | Chapter

On Optimizing Partitioning Strategies for Faster Inverted Index Compression

Authors : Xingshen Song, Kun Jiang, Yu Jiang, Yuexiang Yang

Published in: Computational Science and Its Applications -- ICCSA 2016

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Inverted index is a key component for search engine to manage billions of documents and fast respond to users’ queries. While substantial effort has been made to compromise space occupancy and decoding speed, what has been overlooked is the encoding speed when constructing the index. VSEncoding is a powerful encoder that works by optimally partitioning a list of integers into blocks which are efficiently compressed by using simple encoders, however, these partitions are found by using a dynamic programming approach which is obviously inefficient. In this paper, we introduce compression speed as one criterion to evaluate compression techniques, and thoroughly analyze performances of different partitioning strategies. A linear-time optimization is also proposed, to enhance VSEncoding with faster compression speed and more flexibility to partition an index. Experiments show that our method offers a far more better compression speed, while retaining an excellent space occupancy and decompression speed.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Anh, V.N., Moffat, A.: Index compression using fixed binary codewords. In: Proceedings of the 15th Australasian Database Conference, vol. 27, pp. 61–67. Australian Computer Society, Inc. (2004) Anh, V.N., Moffat, A.: Index compression using fixed binary codewords. In: Proceedings of the 15th Australasian Database Conference, vol. 27, pp. 61–67. Australian Computer Society, Inc. (2004)
2.
go back to reference Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1), 151–166 (2005)CrossRef Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1), 151–166 (2005)CrossRef
3.
go back to reference Anh, V.N., Moffat, A.: Index compression using 64-bit words. Softw. Pract. Exp. 40(2), 131–147 (2010) Anh, V.N., Moffat, A.: Index compression using 64-bit words. Softw. Pract. Exp. 40(2), 131–147 (2010)
4.
go back to reference Catena, M., Macdonald, C., Ounis, I.: On inverted index compression for search engine efficiency. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 359–371. Springer, Heidelberg (2014)CrossRef Catena, M., Macdonald, C., Ounis, I.: On inverted index compression for search engine efficiency. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 359–371. Springer, Heidelberg (2014)CrossRef
5.
go back to reference Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)MATH
6.
go back to reference Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical report, DERI-Digital Enterprise Research Institute, December 2010 Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical report, DERI-Digital Enterprise Research Institute, December 2010
7.
go back to reference Ferragina, P., Nitto, I., Venturini, R.: On optimally partitioning a text to improve its compression. Algorithmica 61(1), 51–74 (2011)MathSciNetCrossRefMATH Ferragina, P., Nitto, I., Venturini, R.: On optimally partitioning a text to improve its compression. Algorithmica 61(1), 51–74 (2011)MathSciNetCrossRefMATH
8.
go back to reference Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of 14th International Conference on Data Engineering, pp. 370–379. IEEE (1998) Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of 14th International Conference on Data Engineering, pp. 370–379. IEEE (1998)
9.
go back to reference Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Softw. Pract. Exp. 45(1), 1–29 (2015)CrossRef Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Softw. Pract. Exp. 45(1), 1–29 (2015)CrossRef
10.
go back to reference Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge university press, Cambridge (2008)CrossRefMATH Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge university press, Cambridge (2008)CrossRefMATH
11.
go back to reference Ottaviano, G., Tonellotto, N., Venturini, R.: Optimal space-time tradeoffs for inverted indexes. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 47–56. ACM (2015) Ottaviano, G., Tonellotto, N., Venturini, R.: Optimal space-time tradeoffs for inverted indexes. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 47–56. ACM (2015)
12.
go back to reference Ottaviano, G., Venturini, R.: Partitioned elias-fano indexes. In: Proceedingsof the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 273–282. ACM (2014) Ottaviano, G., Venturini, R.: Partitioned elias-fano indexes. In: Proceedingsof the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 273–282. ACM (2014)
13.
go back to reference Silvestri, F., Venturini, R.: Vsencoding: efficient coding and fast decoding of integer lists via dynamic programming. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1219–1228. ACM (2010) Silvestri, F., Venturini, R.: Vsencoding: efficient coding and fast decoding of integer lists via dynamic programming. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1219–1228. ACM (2010)
14.
go back to reference Stepanov, A.A., Gangolli, A.R., Rose, D.E., Ernst, R.J., Oberoi, P.S.:SIMD-based decoding of posting lists. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 317–326. ACM (2011) Stepanov, A.A., Gangolli, A.R., Rose, D.E., Ernst, R.J., Oberoi, P.S.:SIMD-based decoding of posting lists. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 317–326. ACM (2011)
15.
go back to reference Trotman, A.: Compression, SIMD, and postings lists. In: Proceedings of the 2014 Australasian Document Computing Symposium, p. 50. ACM (2014) Trotman, A.: Compression, SIMD, and postings lists. In: Proceedings of the 2014 Australasian Document Computing Symposium, p. 50. ACM (2014)
16.
go back to reference Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, San Francisco (1999)MATH Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, San Francisco (1999)MATH
17.
go back to reference Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proceedings of the 18th International Conference on World Wide Web, pp. 401–410. ACM (2009) Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proceedings of the 18th International Conference on World Wide Web, pp. 401–410. ACM (2009)
18.
go back to reference Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. (CSUR) 38(2), 6 (2006)CrossRef Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. (CSUR) 38(2), 6 (2006)CrossRef
Metadata
Title
On Optimizing Partitioning Strategies for Faster Inverted Index Compression
Authors
Xingshen Song
Kun Jiang
Yu Jiang
Yuexiang Yang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-42089-9_18

Premium Partner