2016 | Original Paper | Book Chapter

Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression

Authors: Pedro Chahuara, Thomas Lampert, Pierre Gançarski

Published in: Research and Advanced Technology for Digital Libraries

Publisher: Springer International Publishing

Abstract

Presented herein is a novel model for ranking similar questions within collaborative question-answer platforms. The approach integrates a regression stage that relates topics derived from questions to those derived from question-answer pairs. This helps to avoid problems caused by differences in the vocabulary used within questions and answers, and by the tendency for questions to be shorter than answers. The model is shown to outperform translation methods and topic modelling (without regression) on several real-world datasets.
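
The regression idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which regression model or similarity measure is used, so the softmax regression, the Hellinger distance, the function names, and the toy topic distributions below are all assumptions made for illustration. The sketch maps a topic distribution inferred from a question alone to a predicted question-answer topic distribution, then ranks archived question-answer pairs by their distance to that prediction.

```python
import math
import random


def softmax(z):
    """Turn a list of logits into a probability distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, b, x):
    """Map a question-only topic distribution x to a predicted
    question-answer topic distribution (assumed model: softmax regression)."""
    z = [sum(W[k][j] * x[j] for j in range(len(x))) + b[k] for k in range(len(b))]
    return softmax(z)


def fit(X, Y, epochs=2000, lr=0.5, seed=0):
    """Fit the regression by stochastic gradient descent on cross-entropy
    between predicted and observed question-answer topic distributions."""
    rng = random.Random(seed)
    n_in, n_out = len(X[0]), len(Y[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for x, y in zip(X, Y):
            p = predict(W, b, x)
            for k in range(n_out):
                g = p[k] - y[k]  # gradient of cross-entropy w.r.t. logit k
                b[k] -= lr * g
                for j in range(n_in):
                    W[k][j] -= lr * g * x[j]
    return W, b


def hellinger(p, q):
    """Hellinger distance between two discrete distributions
    (an assumed choice of similarity measure)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(c)) ** 2 for a, c in zip(p, q)))


def rank(query_topics, archive_topics, W, b):
    """Return archive indices ordered from most to least similar."""
    pred = predict(W, b, query_topics)
    return sorted(range(len(archive_topics)),
                  key=lambda i: hellinger(pred, archive_topics[i]))


# Invented toy data: three topics per model. X holds topic distributions
# inferred from questions only, Y those inferred from question-answer pairs.
X = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
Y = [[0.1, 0.7, 0.2], [0.2, 0.1, 0.7], [0.7, 0.2, 0.1]]

W, b = fit(X, Y)
print(rank([0.8, 0.1, 0.1], Y, W, b))
```

In practice the two sets of topic distributions would come from topic models trained on the archive, and the ranking would run over all archived pairs rather than three toy vectors.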

Footnotes
3
Mistakes in the questions are original to the data.
 
Metadata
Title
Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression
Authors
Pedro Chahuara
Thomas Lampert
Pierre Gançarski
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-43997-6_4