Published in: Artificial Intelligence Review 8/2021

20 February 2021

Pseudo-relevance feedback based query expansion using boosting algorithm

Authors: Imran Rasheed, Haider Banka, Hamaid Mahmood Khan

Abstract

Retrieving relevant documents from a large collection using only the original query is a formidable challenge. A common approach to improving retrieval is pseudo-relevance feedback, which expands the original query with useful keywords so that the most relevant documents for that query are returned. In this paper, five different hybrid techniques that utilize traditional query expansion methods were tested. A boosting query term method was then proposed to reweight and strengthen the original query. The query-wise analysis revealed that the proposed approach effectively identified the most relevant keywords, even for short queries. The effectiveness of all the proposed methods was evaluated on three different datasets: Roshni, Hamshahri1, and FIRE2011. Compared to the traditional query expansion methods, the proposed methods improved the mean average precision values on the Urdu, Persian, and English datasets by 14.02%, 9.93%, and 6.60%, respectively. The results were also validated using analysis of variance and post-hoc analysis.
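To make the general idea concrete, the following is a minimal sketch of pseudo-relevance feedback with a simple boost on the original query terms. It is not the authors' method: the smoothed TF-IDF scoring, the toy corpus, the function names, and the parameters `k` (feedback documents), `m` (expansion terms), and `boost` (weight given to original terms) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems also stem)."""
    return text.lower().split()

def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

def score(query_weights, doc, idf):
    """Weighted TF-IDF overlap between a weighted query and one document."""
    tf = Counter(doc)
    return sum(w * tf[t] * idf.get(t, 0.0) for t, w in query_weights.items())

def rank(query_weights, docs, idf):
    """Document indices sorted by descending score (ties keep corpus order)."""
    scores = [score(query_weights, d, idf) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def expand_query(query, docs, k=2, m=2, boost=2.0):
    """Treat the top-k documents as relevant, add the m best new terms,
    and give the original query terms a higher weight (hypothetical boost)."""
    idf = idf_table(docs)
    original = tokenize(query)
    initial = {t: 1.0 for t in original}
    feedback = rank(initial, docs, idf)[:k]

    # Score candidate terms by their TF-IDF mass in the feedback documents.
    candidates = Counter()
    for i in feedback:
        for term, freq in Counter(docs[i]).items():
            if term not in initial:
                candidates[term] += freq * idf[term]

    expanded = {t: boost for t in original}
    for term, _ in candidates.most_common(m):
        expanded[term] = 1.0
    return expanded

# Toy corpus, invented for this example only.
docs = [tokenize(d) for d in [
    "urdu news retrieval system",
    "urdu retrieval with query expansion",
    "cooking recipes for pasta",
    "query expansion improves recall",
]]
expanded = expand_query("urdu retrieval", docs)
reranked = rank(expanded, docs, idf_table(docs))
```

Keeping the original terms at a higher weight than the added ones is one simple way to limit query drift when the feedback documents turn out to be off-topic.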

Metadata
Title
Pseudo-relevance feedback based query expansion using boosting algorithm
Authors
Imran Rasheed
Haider Banka
Hamaid Mahmood Khan
Publication date
20 February 2021
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 8/2021
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-021-09972-4
