Discover Computing

24 October 2018 | Knowledge Graphs and Semantics in Text Analysis and Retrieval

Overcoming low-utility facets for complex answer retrieval

Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

Published in: Discover Computing | Issue 3-4/2019

Abstract

Many questions cannot be answered simply; their answers must include numerous nuanced details and context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. These questions can be constructed from a topic entity (e.g., ‘cheese’) and a facet (e.g., ‘health effects’). While topic matching has been thoroughly explored, we observe that some facets use general language that is unlikely to appear verbatim in answers, exhibiting low utility. In this work, we present an approach to CAR that identifies and addresses low-utility facets. First, we propose two estimators of facet utility: the hierarchical structure of CAR queries, and facet frequency information from training data. Then, to improve the retrieval performance on low-utility headings, we include entity similarity scores using embeddings trained from a CAR knowledge graph, which captures the context of facets. We show that our methods are effective by applying them to two leading neural ranking techniques, and evaluating them on the TREC CAR dataset. We find that our approach performs significantly better than the unmodified neural ranker and other leading CAR techniques, yielding state-of-the-art results. We also provide a detailed analysis of our results, and verify that low-utility facets are indeed difficult to match and that our approach improves the performance for these difficult queries.

Note that CAR queries are not necessarily complex. A question as simple as ‘Is cheese healthy?’ requires a complex answer: a detailed and nuanced description of positive and negative health effects of cheese consumption is required to satisfy the information need. In contrast, a question such as ‘How much Mozzarella cheese do I need to eat to satisfy my daily requirement of calcium?’ is a complex question with a simple factoid answer because it involves advanced reasoning that goes beyond what is typically captured by a knowledge graph.

E.g., templates, talk pages, portals, lists, references, and pages representing people, organizations, music, books, and others are discarded (Dietz et al. 2017).

We use the symbol » to separate heading components of a query.

For query Q and document D, the similarity matrix S of size \(|Q|\times |D|\) is computed by calculating the similarity (e.g., cosine) between the representations (e.g., word embeddings) of each query term and document term, i.e., \(S[i,j]=sim(Q_i,D_j)\). A similarity matrix allows query terms to be soft-matched to document terms.
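
As a minimal sketch of this definition, the following Python snippet builds a cosine similarity matrix for a two-term query and a three-term document. The function names (`cosine`, `similarity_matrix`) and the three-dimensional toy embeddings are hypothetical and chosen only for illustration; they are not taken from the paper, which would rely on trained word embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query_vecs, doc_vecs):
    """Return S as a list of rows, where S[i][j] = cosine(query_vecs[i], doc_vecs[j])."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]


# Hypothetical toy embeddings; real systems would load trained word vectors.
toy_embeddings = {
    "cheese": [1.0, 0.0, 0.0],
    "health": [0.0, 1.0, 0.0],
    "dairy": [0.9, 0.1, 0.0],
    "nutrition": [0.1, 0.9, 0.0],
    "risk": [0.0, 0.6, 0.8],
}

query_terms = ["cheese", "health"]
doc_terms = ["dairy", "nutrition", "risk"]

# S has |Q| rows and |D| columns.
S = similarity_matrix(
    [toy_embeddings[t] for t in query_terms],
    [toy_embeddings[t] for t in doc_terms],
)
```

In this toy example, ‘cheese’ never appears in the document, yet its row in S still assigns a high score to ‘dairy’, which is the soft-matching behavior the footnote describes.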

For reference, the 60th percentile is approximately the cutoff for headings that only appear a few times such as Red Hot Chili Peppers; the 90th percentile is approximately the cutoff of moderately frequent headings such as Finland; and the 99th percentile is approximately the cutoff of frequent headings such as Family and personal life.

We also cannot remove the evaluation topics when training the graph because that defeats the purpose; without target entities encoded in the embeddings, there is no way to find similar entities when ranking.

https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.

We acknowledge that some paragraphs included as negative training samples would be found relevant if inspected manually, due to the limitations of the automatic relevance judgments. We consider this acceptable given the high occurrence of non-relevant documents in the manual relevance judgments and the comparatively poor performance of BM25 at CAR.

On average, there are 42 manual relevance judgments per query for the 702 queries that were manually assessed.

Unjudged evaluation is unavailable for the sequential dependency model and Siamese attention network.

We observed similar behavior for MAP, R-Prec, MRR, and nDCG, so we only report MAP here.

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In: The semantic web (pp. 722–735).

Bollacker, K. D., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data (pp. 1247–1250).

Bordes, A., Usunier, N., García-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems (pp. 2787–2795).

Dai, Z., Xiong, C., Callan, J. P., & Liu, Z. (2018). Convolutional neural networks for soft-matching n-grams in ad-hoc search. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 126–134).

Daiber, J., Jakob, M., Hokamp, C., & Mendes, P. N. (2013). Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems (pp. 121–124).

Dalton, J., Dietz, L., & Allan, J. (2014). Entity query feature expansion using knowledge base links. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, ACM (pp. 365–374).

Dietz, L., & Gamari, B. (2017). TREC CAR: A data set for complex answer retrieval (version 1.5). http://trec-car.cs.unh.edu. Accessed 2 May 2018.

Dietz, L., Verma, M., Radlinski, F., & Craswell, N. (2017). TREC complex answer retrieval overview. In: Proceedings of TREC.

Guo, J., Fan, Y., Ai, Q., & Croft, W. B. (2016). A deep relevance matching model for Ad-hoc retrieval. In: Proceedings of the 25th ACM international on conference on information and knowledge management (pp. 55–64).

Heilman, J. M., & West, A. G. (2015). Wikipedia and medicine: quantifying readership, editors, and the significance of natural language. Journal of Medical Internet Research, 17(3), e62.

Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., & Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM international conference on conference on information & knowledge management (pp. 2333–2338).

Hui, K., Yates, A., Berberich, K., & de Melo, G. (2017). PACRR: A position-aware neural IR model for relevance matching. In: Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 1049–1058).

Hui, K., Yates, A., Berberich, K., & de Melo, G. (2018). Co-PACRR: A context-aware neural IR model for ad-hoc retrieval. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 279–287).

Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (Volume 2: Short Papers) (vol 2, pp. 302–308).

Lin, X., & Lam, W. (2017). CUIS team for TREC 2017 CAR track. In: Proceedings of TREC.

MacAvaney, S., Hui, K., & Yates, A. (2017a). An approach for weakly-supervised deep information retrieval. In: SIGIR 2017 workshop on neural information retrieval.

MacAvaney, S., Yates, A., & Hui, K. (2017b). Contextualized PACRR for complex answer retrieval. In: Proceedings of TREC.

MacAvaney, S., Yates, A., Cohan, A., Soldaini, L., Hui, K., Goharian, N., & Frieder, O. (2018). Characterizing question facets for complex answer retrieval. In: Proceedings of the 41st international ACM SIGIR conference on research & development in information retrieval (pp. 1205–1208).

Maldonado, R., Taylor, S., & Harabagiu, S. M. (2017). UTD HLTRI at TREC 2017: Complex answer retrieval track. In: Proceedings of TREC.

Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 472–479).

Mitra, B., Diaz, F., & Craswell, N. (2017). Learning to match using local and distributed representations of text for web search. In: Proceedings of the 26th International Conference on World Wide Web (pp. 1291–1299).

Nanni, F., Mitra, B., Magnusson, M., & Dietz, L. (2017). Benchmark for complex answer retrieval. In: Proceedings of the ACM SIGIR international conference on theory of information retrieval (pp. 293–296).

Nickel, M., Rosasco, L., & Poggio, T. A. (2016). Holographic embeddings of knowledge graphs. In: Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 1955–1961).

Nogueira, R., & Cho, K. (2017). Task-oriented query reformulation with reinforcement learning. In: Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 574–583).

Nogueira, R., Cho, K., Patel, U., & Chabot, V. (2017). New York University submission to TREC-CAR 2017. In: Proceedings of TREC.

Pang, L., Lan, Y., Guo, J., Xu, J., & Cheng, X. (2016). A study of MatchPyramid models on ad-hoc retrieval. In: NeuIR at SIGIR.

Pang, L., Lan, Y., Guo, J., Xu, J., Xu, J., & Cheng, X. (2017). DeepRank: A new deep architecture for relevance ranking in information retrieval. In: Proceedings of the 2017 ACM on conference on information and knowledge management (pp. 257–266).

Sakai, T., & Kando, N. (2008). On information retrieval metrics designed for evaluation with incomplete relevance assessments. Information Retrieval, 11(5), 447–470.

Schuhmacher, M., Dietz, L., & Ponzetto, S. P. (2015). Ranking entities for web queries through text and knowledge. In: Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1461–1470).

Singh, A. (2012). Entity based Q&A retrieval. In: Proceedings of the 2012 Joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1266–1277).

Singh, S., Subramanya, A., Pereira, F., & McCallum, A. (2012). Wikilinks: A large-scale cross-document coreference corpus labeled via links to wikipedia. University of Massachusetts, Amherst, Technical Report UM-CS-2012 15.

Wang, Z., Zhang, J., Feng, J., & Chen, Z. (2014). Knowledge graph embedding by translating on hyperplanes. In: Twenty-Eighth AAAI conference on artificial intelligence.

Xiong, C., & Callan, J. (2015). Query expansion with Freebase. In: Proceedings of the 2015 international conference on the theory of information retrieval, ACM (pp. 111–120).

Xiong, C., Callan, J. P., & Liu, T. -Y. (2017). Word-entity duet representations for document ranking. In: Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval.

Xiong, C., Dai, Z., Callan, J., Liu, Z., & Power, R. (2017). End-to-end neural ad-hoc ranking with kernel pooling. In: Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval, ACM (pp. 55–64).

Yih, W.-t., Chang, M.-W., He, X., & Gao, J. (2015). Semantic parsing via staged query graph generation: Question answering with knowledge base. In: Proceedings of the 53rd annual meeting of the association for computational linguistics (pp. 1321–1331).

Zamani, H., Mitra, B., Song, X., Craswell, N., & Tiwary, S. (2018). Neural ranking models with multiple document fields. In: Proceedings of the eleventh ACM international conference on web search and data mining (pp. 700–708).

Title: Overcoming low-utility facets for complex answer retrieval
Authors: Sean MacAvaney
Andrew Yates
Arman Cohan
Luca Soldaini
Kai Hui
Nazli Goharian
Ophir Frieder
Publication date: 24 October 2018
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 3-4/2019
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-018-9343-0
