Skip to main content
Top

2019 | OriginalPaper | Chapter

Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance

Author : Alexander B. Veretennikov

Published in: Data Analytics and Management in Data Intensive Domains

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words that are at distances from the given word of less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of high-frequency occurring words. In this paper, we present a new search algorithm with even more performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches in comparison with two-component key indexes.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
5.
go back to reference Yan, H., Shi, S., Zhang, F., Suel, T., Wen, J.-R.: Efficient term proximity search with term-pair indexes. In: CIKM 2010 Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010, pp. 1229–1238 (2010). https://doi.org/10.1145/1871437.1871593 Yan, H., Shi, S., Zhang, F., Suel, T., Wen, J.-R.: Efficient term proximity search with term-pair indexes. In: CIKM 2010 Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010, pp. 1229–1238 (2010). https://​doi.​org/​10.​1145/​1871437.​1871593
8.
go back to reference Tomasic, A., Garcia-Molina, H., Shoens, K.: Incremental updates of inverted lists for text document retrieval. In: SIGMOD 1994 Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, 24–27 May 1994, pp. 289–300 (1994). https://doi.org/10.1145/191839.191896 Tomasic, A., Garcia-Molina, H., Shoens, K.: Incremental updates of inverted lists for text document retrieval. In: SIGMOD 1994 Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, 24–27 May 1994, pp. 289–300 (1994). https://​doi.​org/​10.​1145/​191839.​191896
11.
go back to reference Anh, V.N., de Kretser, O., Moffat, A.: Vector-space ranking with effective early termination. In: SIGIR 2001 Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, Louisiana, USA, 9–12 September 2001, pp. 35–42 (2001). https://doi.org/10.1145/383952.383957 Anh, V.N., de Kretser, O., Moffat, A.: Vector-space ranking with effective early termination. In: SIGIR 2001 Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, Louisiana, USA, 9–12 September 2001, pp. 35–42 (2001). https://​doi.​org/​10.​1145/​383952.​383957
12.
go back to reference Garcia, S., Williams, H.E., Cannane, A.: Access-ordered indexes. In: ACSC 2004 Proceedings of the 27th Australasian Conference on Computer Science, Dunedin, New Zealand, 18–22 January 2004, pp. 7–14 (2004) Garcia, S., Williams, H.E., Cannane, A.: Access-ordered indexes. In: ACSC 2004 Proceedings of the 27th Australasian Conference on Computer Science, Dunedin, New Zealand, 18–22 January 2004, pp. 7–14 (2004)
13.
go back to reference Bahle, D., Williams, H.E., Zobel, J.: Efficient phrase querying with an auxiliary index. In: SIGIR 2002 Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, 11–15 August 2002, pp. 215–221 (2002). https://doi.org/10.1145/564376.564415 Bahle, D., Williams, H.E., Zobel, J.: Efficient phrase querying with an auxiliary index. In: SIGIR 2002 Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, 11–15 August 2002, pp. 215–221 (2002). https://​doi.​org/​10.​1145/​564376.​564415
15.
go back to reference Veretennikov, A.B.: Proximity full-text search with a response time guarantee by means of additional indexes with multi-component keys. In: Selected Papers of the XX International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2018), Moscow, Russia, 9–12 October 2018, pp. 123–130 (2018). http://ceur-ws.org/Vol-2277 Veretennikov, A.B.: Proximity full-text search with a response time guarantee by means of additional indexes with multi-component keys. In: Selected Papers of the XX International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2018), Moscow, Russia, 9–12 October 2018, pp. 123–130 (2018). http://​ceur-ws.​org/​Vol-2277
16.
go back to reference Veretennikov, A.B.: O poiske fraz i naborov slov v polnotekstovom indekse (About phrases search in full-text index). Control Syst. Inf. Technol. 48(2.1), 125–130 (2012). (in Russian) Veretennikov, A.B.: O poiske fraz i naborov slov v polnotekstovom indekse (About phrases search in full-text index). Control Syst. Inf. Technol. 48(2.1), 125–130 (2012). (in Russian)
17.
go back to reference Veretennikov, A.B.: Effektivnyi polnotekstovyi poisk s uchetom blizosti slov pri pomoshchi trekhkomponentnykh klyuchei (Efficient full-text proximity search by means of three component keys). Control Syst. Inf. Technol. 69(3), 25–32 (2017). (in Russian) Veretennikov, A.B.: Effektivnyi polnotekstovyi poisk s uchetom blizosti slov pri pomoshchi trekhkomponentnykh klyuchei (Efficient full-text proximity search by means of three component keys). Control Syst. Inf. Technol. 69(3), 25–32 (2017). (in Russian)
18.
go back to reference Veretennikov, A.B.: Ispol’zovanie dopolnitel’nykh indeksov dlya bolee bystrogo polnotekstovogo poiska fraz, vklyuchayushchikh chasto vstrechayushchiesya slova (Using additional indexes for fast full-text searching phrases that contains frequently used words). Control Syst. Inf. Technol. 52(2), 61–66 (2013). (in Russian)MathSciNet Veretennikov, A.B.: Ispol’zovanie dopolnitel’nykh indeksov dlya bolee bystrogo polnotekstovogo poiska fraz, vklyuchayushchikh chasto vstrechayushchiesya slova (Using additional indexes for fast full-text searching phrases that contains frequently used words). Control Syst. Inf. Technol. 52(2), 61–66 (2013). (in Russian)MathSciNet
19.
go back to reference Veretennikov, A.B.: Effektivnyi polnotekstovyi poisk s ispol’zovaniem dopolnitel’nykh indeksov chasto vstrechayushchikhsya slov (Efficient full-text search by means of additional indexes of frequently used words). Control Syst. Inf. Technol. 66(4), 52–60 (2016). (in Russian) Veretennikov, A.B.: Effektivnyi polnotekstovyi poisk s ispol’zovaniem dopolnitel’nykh indeksov chasto vstrechayushchikhsya slov (Efficient full-text search by means of additional indexes of frequently used words). Control Syst. Inf. Technol. 66(4), 52–60 (2016). (in Russian)
20.
go back to reference Veretennikov, A.B.: Sozdanie dopolnitel’nykh indeksov dlya bolee bystrogo polnotekstovogo poiska fraz, vklyuchayushchikh chasto vstrechayushchiesya slova (Creating additional indexes for fast full-text searching phrases that contains frequently used words). Control Syst. Inf. Technol. 63(1), 27–33 (2016). (in Russian) Veretennikov, A.B.: Sozdanie dopolnitel’nykh indeksov dlya bolee bystrogo polnotekstovogo poiska fraz, vklyuchayushchikh chasto vstrechayushchiesya slova (Creating additional indexes for fast full-text searching phrases that contains frequently used words). Control Syst. Inf. Technol. 63(1), 27–33 (2016). (in Russian)
22.
go back to reference Williams, J.W.J.: Algorithm 232 – Heapsort. Commun. ACM 7(6), 347–348 (1964) Williams, J.W.J.: Algorithm 232 – Heapsort. Commun. ACM 7(6), 347–348 (1964)
Metadata
Title
Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance
Author
Alexander B. Veretennikov
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-23584-0_7

Premium Partner