Published in: Soft Computing 5/2014

1 May 2014 | Methodologies and Application

A novel machine learning approach to rank web forum posts

Written by: Xiaohui Han, Jun Ma, Yun Wu, Chaoran Cui


Abstract

User-generated content in Web forums is rich but varies in quality, ranging from excellent, detailed opinions to simple repetition of earlier posts or even spam. This makes it difficult to find high-quality information during post browsing, retrieval and other Web forum applications. In this paper, we propose a novel machine learning approach named LGPRank, in which a genetic programming architecture is used to rank Web forum posts according to the quality of their content. To address the shortcomings of current studies, we take both the semantic-free and the semantic-specific information of a post into account. We propose a set of new features, named Latent Dirichlet Allocation (LDA) semantic features, which are computed in LDA topic space. The proposed features, together with content surface features and forum-specific features, are used in the learning process. Experiments are conducted on three Web forum datasets in comparison with methods used in prior ranking research. LGPRank outperforms all the other methods in terms of the P@N, NDCG@N and MAP measures. Furthermore, the experimental results indicate that the proposed LDA semantic features improve ranking performance.
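
The three measures named in the abstract can be illustrated with a short Python sketch. This is not the authors' evaluation code. It assumes one common formulation of each measure: a post counts as relevant when its label is greater than zero, DCG uses the gain 2^rel - 1 with a log2 discount, and average precision is normalized by the number of relevant posts in the ranked list. All function names are hypothetical.

```python
import math
from typing import Sequence


def precision_at_n(relevances: Sequence[int], n: int) -> float:
    """P@N: fraction of the top-n ranked posts with a label > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for r in relevances[:n] if r > 0) / n


def dcg_at_n(relevances: Sequence[int], n: int) -> float:
    """DCG@N with gain 2^rel - 1 and discount log2(rank + 1) (assumed variant)."""
    return sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(relevances[:n]))


def ndcg_at_n(relevances: Sequence[int], n: int) -> float:
    """NDCG@N: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


def average_precision(relevances: Sequence[int]) -> float:
    """AP: mean of P@k over the ranks k that hold a relevant post."""
    hits = 0
    total = 0.0
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: average of AP over several ranked lists (e.g. one per thread)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Labels of four posts in the order a ranker returned them (made-up data).
ranked_labels = [2, 0, 1, 0]
print(precision_at_n(ranked_labels, 2))
print(ndcg_at_n(ranked_labels, 4))
print(average_precision(ranked_labels))
```

Each function takes the relevance labels in ranked order, so any ranker, whether learned by genetic programming or not, can be scored by sorting the posts by predicted score and passing in their labels.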


Metadata
Title
A novel machine learning approach to rank web forum posts
Written by
Xiaohui Han
Jun Ma
Yun Wu
Chaoran Cui
Publication date
1 May 2014
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 5/2014
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-013-1113-8
