Skip to main content

2023 | OriginalPaper | Buchkapitel

Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

verfasst von : Yan Xiao, Yixing Fan, Ruqing Zhang, Jiafeng Guo

Erschienen in: Information Retrieval

Verlag: Springer Nature Switzerland

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Vocabulary mismatch is a central problem in information retrieval (IR), i.e., the relevant documents may not contain the same (symbolic) terms of the query. Recently, neural representations have shown great success in capturing semantic relatedness, leading to new possibilities to alleviate the vocabulary mismatch problem in IR. However, most existing efforts in this direction have been devoted to the re-ranking stage. That is to leverage neural representations to help re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g., BM25 over the inverted index). This naturally raises a question: if the relevant documents have not been found in the initial retrieval stage due to vocabulary mismatch, there would be no chance to re-rank them to the top positions later. Therefore, in this paper, we study the problem how to employ neural representations to improve the recall of relevant documents in the initial retrieval stage. Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both neural and symbolic indices, namely the parallel search scheme and the sequential search scheme. Our experiments show that both hybrid index and search schemes can improve the recall of the initial retrieval stage with small overhead.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH
2.
Zurück zum Zitat Boytsov, L., Novak, D., Malkov, Y., Nyberg, E.: Off the beaten path: let’s replace term-based retrieval with k-NN search. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1099–1108. ACM (2016) Boytsov, L., Novak, D., Malkov, Y., Nyberg, E.: Off the beaten path: let’s replace term-based retrieval with k-NN search. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1099–1108. ACM (2016)
3.
Zurück zum Zitat Brokos, G.-I., Malakasiotis, P., Androutsopoulos, I.: Using centroids of word embeddings and word mover’s distance for biomedical document retrieval in question answering. arXiv preprint arXiv:1608.03905 (2016) Brokos, G.-I., Malakasiotis, P., Androutsopoulos, I.: Using centroids of word embeddings and word mover’s distance for biomedical document retrieval in question answering. arXiv preprint arXiv:​1608.​03905 (2016)
4.
Zurück zum Zitat Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993) Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
5.
Zurück zum Zitat Callan, J.P., Croft, W.B., Broglio, J.: TREC and TIPSTER experiments with inquery. Inf. Process. Manag. 31(3), 327–343 (1995)CrossRef Callan, J.P., Croft, W.B., Broglio, J.: TREC and TIPSTER experiments with inquery. Inf. Process. Manag. 31(3), 327–343 (1995)CrossRef
6.
Zurück zum Zitat Chen, J., Fang, H.-R., Saad, Y.: Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10(Sep), 1989–2012 (2009)MATH Chen, J., Fang, H.-R., Saad, Y.: Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10(Sep), 1989–2012 (2009)MATH
8.
Zurück zum Zitat Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on P-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262. ACM (2004) Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on P-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262. ACM (2004)
9.
Zurück zum Zitat Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web, pages 577–586. ACM (2011) Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web, pages 577–586. ACM (2011)
10.
Zurück zum Zitat Freund, Y., Iyer, R., Schapire, R.E., Singer, Y.: An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4(Nov), 933–969 (2003)MATH Freund, Y., Iyer, R., Schapire, R.E., Singer, Y.: An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4(Nov), 933–969 (2003)MATH
11.
Zurück zum Zitat Gong, Y., Lazebnik, S., Gordo, A., Perronnin, F.: Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)CrossRef Gong, Y., Lazebnik, S., Gordo, A., Perronnin, F.: Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)CrossRef
12.
Zurück zum Zitat Guo, J., Cai, Y., Fan, Y., Sun, F., Zhang, R., Cheng, X.: Semantic models for the first-stage retrieval: a comprehensive review. ACM Trans. Inf. Syst. 40(4), 1–42 (2022)CrossRef Guo, J., Cai, Y., Fan, Y., Sun, F., Zhang, R., Cheng, X.: Semantic models for the first-stage retrieval: a comprehensive review. ACM Trans. Inf. Syst. 40(4), 1–42 (2022)CrossRef
13.
Zurück zum Zitat Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 55–64. ACM (2016) Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 55–64. ACM (2016)
14.
Zurück zum Zitat Guo, J., Fan, Y., Ai, Q., Croft, W.B.: Semantic matching by non-linear word transportation for information retrieval. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 701–710. ACM (2016) Guo, J., Fan, Y., Ai, Q., Croft, W.B.: Semantic matching by non-linear word transportation for information retrieval. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 701–710. ACM (2016)
15.
Zurück zum Zitat Guo, J., et al.: A deep look into neural ranking models for information retrieval. Inf. Process. Manag. 57(6), 102067 (2020) Guo, J., et al.: A deep look into neural ranking models for information retrieval. Inf. Process. Manag. 57(6), 102067 (2020)
16.
Zurück zum Zitat Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1312 (2011) Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1312 (2011)
17.
Zurück zum Zitat Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, pp. 2333–2338. ACM (2013) Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, pp. 2333–2338. ACM (2013)
18.
Zurück zum Zitat Joachims, T.: Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133–142. ACM (2002) Joachims, T.: Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133–142. ACM (2002)
19.
Zurück zum Zitat Krovetz, R.: Viewing morphology as an inference process. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 191–202. ACM (1993) Krovetz, R.: Viewing morphology as an inference process. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 191–202. ACM (1993)
20.
Zurück zum Zitat Li, H., Liu, W., Ji, H.: Two-stage hashing for fast document retrieval. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), vol. 2, pp. 495–500 (2014) Li, H., Liu, W., Ji, H.: Two-stage hashing for fast document retrieval. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), vol. 2, pp. 495–500 (2014)
21.
Zurück zum Zitat Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data–experiments, analyses, and improvement (v1. 0). arXiv preprint arXiv:1610.02455 (2016) Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data–experiments, analyses, and improvement (v1. 0). arXiv preprint arXiv:​1610.​02455 (2016)
22.
Zurück zum Zitat Malkov, Y.A., Yashunin, D.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv preprint arXiv:1603.09320 (2016) Malkov, Y.A., Yashunin, D.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv preprint arXiv:​1603.​09320 (2016)
23.
Zurück zum Zitat Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
24.
Zurück zum Zitat Mitra, B., Nalisnick, E., Craswell, N., Caruana, R.: A dual embedding space model for document ranking. arXiv preprint arXiv:1602.01137 (2016) Mitra, B., Nalisnick, E., Craswell, N., Caruana, R.: A dual embedding space model for document ranking. arXiv preprint arXiv:​1602.​01137 (2016)
25.
27.
Zurück zum Zitat Tellez, E.S., Chavez, E., Navarro, G.: Succinct nearest neighbor search. Inf. Syst. 38(7), 1019–1030 (2013)CrossRef Tellez, E.S., Chavez, E., Navarro, G.: Succinct nearest neighbor search. Inf. Syst. 38(7), 1019–1030 (2013)CrossRef
28.
Zurück zum Zitat Xu, J., Wu, W., Li, H., Xu, G.: A kernel approach to addressing term mismatch. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 153–154. ACM (2011) Xu, J., Wu, W., Li, H., Xu, G.: A kernel approach to addressing term mismatch. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 153–154. ACM (2011)
Metadaten
Titel
Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations
verfasst von
Yan Xiao
Yixing Fan
Ruqing Zhang
Jiafeng Guo
Copyright-Jahr
2023
DOI
https://doi.org/10.1007/978-3-031-24755-2_7