
2017 | OriginalPaper | Book chapter

An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering

Authors: Haocheng Wu, Zuohui Tian, Wei Wu, Enhong Chen

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

Community Question Answering (CQA) sites such as Yahoo! Answers provide rich knowledge for people to access. However, the quality of answers posted to CQA sites varies widely, from precise and useful to irrelevant and useless. Hence, automatic detection of low-quality answers helps site managers organize the accumulated knowledge efficiently and provide high-quality content to users. In this paper, we propose a novel unsupervised approach to detect low-quality answers at a CQA site. The key ideas in our model are: (1) most answers are normal; (2) a low-quality answer can be found by comparing it with its “peer” answers under the same question; (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm that assigns soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.
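The three key ideas can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page; it is a generic peer-based outlier score built on assumptions. The function name `peer_quality_scores`, the single numeric feature per answer (for example, answer length in tokens), and the median/MAD scoring rule are all hypothetical choices made for illustration only.

```python
from statistics import median


def peer_quality_scores(answers_by_question):
    """Return soft scores in (0, 1] for each answer, grouped by question.

    answers_by_question: dict mapping a question id to a list of numeric
    feature values, one per answer (a hypothetical single feature).
    Illustrative only; not the method described in the chapter.
    """
    scores = {}
    for qid, feats in answers_by_question.items():
        # Idea (1): most answers are normal, so the median of the peers
        # is taken as the "normal" reference point for this question.
        centre = median(feats)
        # Idea (3): the scale is computed per question (median absolute
        # deviation), so each question has its own quality criterion.
        mad = median(abs(f - centre) for f in feats)
        scale = mad if mad > 0 else 1.0
        # Idea (2): an answer far from its peers gets a low soft score.
        scores[qid] = [1.0 / (1.0 + abs(f - centre) / scale) for f in feats]
    return scores


if __name__ == "__main__":
    example = {
        "q1": [40, 42, 38, 41, 2],   # long answers are normal here
        "q2": [5, 6, 5, 7, 60],      # short answers are normal here
    }
    for qid, vals in peer_quality_scores(example).items():
        print(qid, [round(v, 3) for v in vals])
```

In this toy input, a value near 40 is typical for `q1` but would be far from the peers in `q2`, which is the point of scoring each answer relative to the other answers under the same question rather than against a global threshold.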


Footnotes
2
We set \(\epsilon =0.00001\) and \(N=200\) in our experiments.
 
4
The Q_u_other and A_u_other features in Table 2 are available only in the Yahoo dataset.
 
14
c: trade-off between training error and margin. j: cost factor by which training errors on positive examples outweigh errors on negative examples. b: whether to use a biased hyperplane.
 
15
To save space, we report results only on the Qatar dataset. Results on the Fatwa and Yahoo datasets show similar trends.
 
16
“Non-English” and “Other” answers are categorized as “Irrelevant” answers.
 
Metadata
Title
An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering
Authors
Haocheng Wu
Zuohui Tian
Wei Wu
Enhong Chen
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-55699-4_6