Published in: Discover Computing 3-4/2019

24.10.2018 | Knowledge Graphs and Semantics in Text Analysis and Retrieval

Overcoming low-utility facets for complex answer retrieval

Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

Abstract

Many questions cannot be answered simply; their answers must include numerous nuanced details and context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. These questions can be constructed from a topic entity (e.g., ‘cheese’) and a facet (e.g., ‘health effects’). While topic matching has been thoroughly explored, we observe that some facets use general language that is unlikely to appear verbatim in answers, exhibiting low utility. In this work, we present an approach to CAR that identifies and addresses low-utility facets. First, we propose two estimators of facet utility: the hierarchical structure of CAR queries, and facet frequency information from training data. Then, to improve the retrieval performance on low-utility headings, we include entity similarity scores using embeddings trained from a CAR knowledge graph, which captures the context of facets. We show that our methods are effective by applying them to two leading neural ranking techniques and evaluating them on the TREC CAR dataset. We find that our approach performs significantly better than the unmodified neural ranker and other leading CAR techniques, yielding state-of-the-art results. We also provide a detailed analysis of our results and verify that low-utility facets are indeed difficult to match and that our approach improves the performance for these difficult queries.

Footnotes
1
Note that CAR queries are not necessarily complex. A question as simple as ‘Is cheese healthy?’ requires a complex answer: a detailed and nuanced description of positive and negative health effects of cheese consumption is required to satisfy the information need. In contrast, a question such as ‘How much Mozzarella cheese do I need to eat to satisfy my daily requirement of calcium?’ is a complex question with a simple factoid answer because it involves advanced reasoning that goes beyond what is typically captured by a knowledge graph.
 
2
E.g., templates, talk pages, portals, lists, references, and pages representing people, organizations, music, books, and others are discarded (Dietz et al. 2017).
 
3
We use the symbol » to separate heading components of a query.
 
4
For query Q and document D, the similarity matrix S of size \(|Q|\times |D|\) is computed by calculating the similarity (e.g., cosine) between the representations (e.g., word embeddings) of each query term and document term, i.e., \(S[i,j]=sim(Q_i,D_j)\). A similarity matrix allows query terms to be soft-matched to document terms.
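The construction described in this footnote can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the two-dimensional toy embeddings are assumptions made up for the example, and cosine is just one of the similarity functions the footnote allows.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(query_terms, doc_terms, embeddings):
    """Return S of size |Q| x |D| with S[i][j] = sim(Q_i, D_j)."""
    return [
        [cosine(embeddings[q], embeddings[d]) for d in doc_terms]
        for q in query_terms
    ]

# Toy embeddings, invented purely for illustration (hypothetical values).
TOY_EMBEDDINGS = {
    "cheese": [1.0, 0.0],
    "cheddar": [0.8, 0.6],
    "health": [0.0, 1.0],
}

S = similarity_matrix(
    ["cheese", "health"],             # query terms Q
    ["cheddar", "cheese", "health"],  # document terms D
    TOY_EMBEDDINGS,
)
```

In this toy setup the query term "cheese" matches the document term "cheese" exactly and still receives a high score against "cheddar", which is the soft-matching behaviour the footnote refers to.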
 
5
For reference, the 60th percentile is approximately the cutoff for headings that only appear a few times such as Red Hot Chili Peppers; the 90th percentile is approximately the cutoff of moderately frequent headings such as Finland; and the 99th percentile is approximately the cutoff of frequent headings such as Family and personal life.
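To make the percentile cutoffs concrete, the following sketch ranks headings by how often they occur in a training set. It assumes one plausible reading of "percentile" (the share of distinct headings that occur strictly less often); the exact definition used in the paper is not given in this footnote, and the function name and toy heading counts are invented for illustration.

```python
from bisect import bisect_left
from collections import Counter

def heading_frequency_percentiles(training_headings):
    """Map each heading to the percentage of distinct headings that occur strictly less often."""
    counts = Counter(training_headings)
    sorted_counts = sorted(counts.values())
    total = len(sorted_counts)
    return {
        heading: 100.0 * bisect_left(sorted_counts, freq) / total
        for heading, freq in counts.items()
    }

# Toy training headings with made-up counts (hypothetical data).
toy_headings = (
    ["History"] * 5 + ["Geography"] * 3 + ["Climate"] * 2 + ["Discography"]
)
percentiles = heading_frequency_percentiles(toy_headings)
```

A heading could then be treated as frequent when its percentile exceeds a chosen cutoff, such as the 60th, 90th, or 99th percentile mentioned above.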
 
6
We also cannot remove the evaluation topics when training the graph because that defeats the purpose; without target entities encoded in the embeddings, there is no way to find similar entities when ranking.
 
8
We acknowledge that some paragraphs included as negative training samples would be found relevant if inspected manually, owing to the limitations of the automatic relevance judgments. We consider this acceptable given the high occurrence of non-relevant documents in the manual relevance judgments and the comparatively poor performance of BM25 at CAR.
 
9
On average, there are 42 manual relevance judgments for each of the 702 queries that were manually assessed.
 
10
Unjudged evaluation is unavailable for the sequential dependency model and Siamese attention network.
 
11
We observed similar behavior for MAP, R-Prec, MRR, and nDCG, so we only report MAP here.
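For readers unfamiliar with the reported metric, here is a minimal sketch of the standard definition of average precision and its mean over queries (MAP). It is not the evaluation tooling used in the paper: the function names, the dictionary layout of runs and judgments, and the toy data are assumptions for illustration, and edge cases such as queries without any retrieved results are handled in the simplest possible way.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k at which a relevant item is retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that are never retrieved contribute zero.
    return precision_sum / len(relevant)

def mean_average_precision(runs, qrels):
    """Mean of average precision over the queries present in `runs`."""
    scores = [average_precision(runs[q], qrels.get(q, ())) for q in runs]
    return sum(scores) / len(scores)

# Toy rankings and relevance judgments (hypothetical identifiers).
toy_runs = {"q1": ["p1", "p2", "p3", "p4"], "q2": ["a", "b"]}
toy_qrels = {"q1": {"p1", "p3"}, "q2": {"b"}}
map_score = mean_average_precision(toy_runs, toy_qrels)
```

For query q1 the relevant paragraphs appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for q2 the single relevant paragraph appears at rank 2, giving 1/2.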
 
References
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In: The semantic web (pp. 722–735).
Bollacker, K. D., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data (pp. 1247–1250).
Bordes, A., Usunier, N., García-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems (pp. 2787–2795).
Dai, Z., Xiong, C., Callan, J. P., & Liu, Z. (2018). Convolutional neural networks for soft-matching n-grams in ad-hoc search. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 126–134).
Daiber, J., Jakob, M., Hokamp, C., & Mendes, P. N. (2013). Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems (pp. 121–124).
Dalton, J., Dietz, L., & Allan, J. (2014). Entity query feature expansion using knowledge base links. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, ACM (pp. 365–374).
Dietz, L., Verma, M., Radlinski, F., & Craswell, N. (2017). TREC complex answer retrieval overview. In: Proceedings of TREC.
Guo, J., Fan, Y., Ai, Q., & Croft, W. B. (2016). A deep relevance matching model for Ad-hoc retrieval. In: Proceedings of the 25th ACM international on conference on information and knowledge management (pp. 55–64).
Heilman, J. M., & West, A. G. (2015). Wikipedia and medicine: quantifying readership, editors, and the significance of natural language. Journal of Medical Internet Research, 17(3), e62.
Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., & Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM international conference on conference on information & knowledge management (pp. 2333–2338).
Hui, K., Yates, A., Berberich, K., & de Melo, G. (2017). PACRR: A position-aware neural IR model for relevance matching. In: Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 1049–1058).
Hui, K., Yates, A., Berberich, K., & de Melo, G. (2018). Co-PACRR: A context-aware neural IR model for ad-hoc retrieval. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 279–287).
Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (Volume 2: Short Papers) (vol 2, pp. 302–308).
Lin, X., & Lam, W. (2017). CUIS team for TREC 2017 CAR track. In: Proceedings of TREC.
MacAvaney, S., Hui, K., & Yates, A. (2017a). An approach for weakly-supervised deep information retrieval. In: SIGIR 2017 workshop on neural information retrieval.
MacAvaney, S., Yates, A., & Hui, K. (2017b). Contextualized PACRR for complex answer retrieval. In: Proceedings of TREC.
MacAvaney, S., Yates, A., Cohan, A., Soldaini, L., Hui, K., Goharian, N., & Frieder, O. (2018). Characterizing question facets for complex answer retrieval. In: Proceedings of the 41st international ACM SIGIR conference on research & development in information retrieval (pp. 1205–1208).
Maldonado, R., Taylor, S., & Harabagiu, S. M. (2017). UTD HLTRI at TREC 2017: Complex answer retrieval track. In: Proceedings of TREC.
Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 472–479).
Mitra, B., Diaz, F., & Craswell, N. (2017). Learning to match using local and distributed representations of text for web search. In: Proceedings of the 26th International Conference on World Wide Web (pp. 1291–1299).
Nanni, F., Mitra, B., Magnusson, M., & Dietz, L. (2017). Benchmark for complex answer retrieval. In: Proceedings of the ACM SIGIR international conference on theory of information retrieval (pp. 293–296).
Nickel, M., Rosasco, L., & Poggio, T. A. (2016). Holographic embeddings of knowledge graphs. In: Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 1955–1961).
Nogueira, R., & Cho, K. (2017). Task-oriented query reformulation with reinforcement learning. In: Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 574–583).
Nogueira, R., Cho, K., Patel, U., & Chabot, V. (2017). New York University submission to TREC-CAR 2017. In: Proceedings of TREC.
Pang, L., Lan, Y., Guo, J., Xu, J., & Cheng, X. (2016). A study of MatchPyramid models on ad-hoc retrieval. In: NeuIR at SIGIR.
Pang, L., Lan, Y., Guo, J., Xu, J., Xu, J., & Cheng, X. (2017). DeepRank: A new deep architecture for relevance ranking in information retrieval. In: Proceedings of the 2017 ACM on conference on information and knowledge management (pp. 257–266).
Sakai, T., & Kando, N. (2008). On information retrieval metrics designed for evaluation with incomplete relevance assessments. Information Retrieval, 11(5), 447–470.
Schuhmacher, M., Dietz, L., & Ponzetto, S. P. (2015). Ranking entities for web queries through text and knowledge. In: Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1461–1470).
Singh, A. (2012). Entity based Q&A retrieval. In: Proceedings of the 2012 Joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1266–1277).
Singh, S., Subramanya, A., Pereira, F., & McCallum, A. (2012). Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. University of Massachusetts, Amherst, Technical Report UM-CS-2012 15.
Wang, Z., Zhang, J., Feng, J., & Chen, Z. (2014). Knowledge graph embedding by translating on hyperplanes. In: Twenty-Eighth AAAI conference on artificial intelligence.
Xiong, C., & Callan, J. (2015). Query expansion with Freebase. In: Proceedings of the 2015 international conference on the theory of information retrieval, ACM (pp. 111–120).
Xiong, C., Callan, J. P., & Liu, T.-Y. (2017). Word-entity duet representations for document ranking. In: Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval.
Xiong, C., Dai, Z., Callan, J., Liu, Z., & Power, R. (2017). End-to-end neural ad-hoc ranking with kernel pooling. In: Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval, ACM (pp. 55–64).
Yih, W.-t., Chang, M.-W., He, X., & Gao, J. (2015). Semantic parsing via staged query graph generation: Question answering with knowledge base. In: Proceedings of the 53rd annual meeting of the association for computational linguistics (pp. 1321–1331).
Zamani, H., Mitra, B., Song, X., Craswell, N., & Tiwary, S. (2018). Neural ranking models with multiple document fields. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 700–708).
Metadata
Title
Overcoming low-utility facets for complex answer retrieval
Authors
Sean MacAvaney
Andrew Yates
Arman Cohan
Luca Soldaini
Kai Hui
Nazli Goharian
Ophir Frieder
Publication date
24.10.2018
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 3-4/2019
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-018-9343-0
