
2015 | OriginalPaper | Book chapter

The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection

Written by: Linhong Zhu, Sheng Gao, Sinno Jialin Pan, Haizhou Li, Dingxiong Deng, Cyrus Shahabi

Published in: Recommendation and Search in Social Networks

Publisher: Springer International Publishing

Abstract

Most previous work on opinion summarization focuses on summarizing the sentiment polarity distribution over different aspects of an entity (e.g., the battery life and screen of a mobile phone). However, users' needs may go beyond this kind of summarization. Besides such coarse-grained, aspect-level summaries, a reader may prefer detailed but concise text drawn from the opinion data. In this paper, we propose a new framework for opinion summarization. Our goal is to help users obtain helpful opinions from reviews by reading only a short summary consisting of a few informative sentences, where the quality of the summary is evaluated in terms of both aspect coverage and viewpoint preservation. More specifically, we formulate informative sentence selection in opinion summarization as a community leader detection problem, where a community is a cluster of sentences about the same aspect of an entity and its leaders are the most informative sentences for that aspect. We develop two effective algorithms to identify communities and leaders. Reviews of six products from Amazon.com are used to verify the effectiveness of our method for opinion summarization.
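The abstract does not describe the chapter's two algorithms, so the following is only a toy sketch of the general idea of treating aspect clusters as communities and picking one leader sentence per community. Everything in it is an assumption made for illustration and is not the authors' method: the word-overlap (Jaccard) similarity, the `threshold` parameter, the use of connected components as communities, and the choice of the highest-degree sentence as leader.

```python
from collections import defaultdict


def jaccard(a, b):
    """Word-overlap similarity between two sentences (illustrative choice)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def detect_leaders(sentences, threshold=0.4):
    """Return one 'leader' sentence per community of similar sentences.

    Assumption: communities are connected components of a similarity graph,
    and the leader is the sentence with the most neighbors in its community.
    """
    n = len(sentences)
    adj = defaultdict(set)
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)

    seen, leaders = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects one connected component (community).
        stack, community = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            community.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        # Highest degree wins; ties go to the earliest sentence.
        leader = max(community, key=lambda v: (len(adj[v]), -v))
        leaders.append(sentences[leader])
    return leaders


reviews = [
    "battery life is great",
    "battery life is really great",
    "battery life is short",
    "the screen looks sharp",
    "the screen looks sharp and bright",
]
print(detect_leaders(reviews))
```

With these five hypothetical review sentences the sketch forms one community about battery life and one about the screen, and returns one sentence from each, which mirrors the abstract's goal of covering every aspect with a few sentences.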

Footnotes
2
Note that \(|\mathcal {N}_{k}(s)|\) can be larger than \(k\) because ties can occur (i.e., several neighbors may have the same similarity to \(s\)).
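A minimal sketch of why a tie-inclusive neighborhood can exceed \(k\) elements. The function name `knn_with_ties` and the dictionary of similarity scores are hypothetical; the footnote does not say how the chapter computes similarities.

```python
def knn_with_ties(similarities, k):
    """Return every neighbor at least as similar as the k-th nearest one.

    `similarities` maps a neighbor id to its similarity to the query
    sentence s (hypothetical input format).
    """
    if k <= 0 or not similarities:
        return set()
    ranked = sorted(similarities.values(), reverse=True)
    # Similarity of the k-th nearest neighbor (or the last one if fewer than k).
    cutoff = ranked[min(k, len(ranked)) - 1]
    return {n for n, sim in similarities.items() if sim >= cutoff}


sims = {"s1": 0.9, "s2": 0.7, "s3": 0.7, "s4": 0.2}
print(sorted(knn_with_ties(sims, k=2)))  # ['s1', 's2', 's3']
```

Here `s2` and `s3` tie for second place, so asking for \(k = 2\) neighbors yields a set of size 3.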
 
4
A longer summary is more likely to provide better information but is less concise.
 
5
ROUGE-N is a popular metric from the ROUGE toolkit that measures the quality of a summary by comparing it to reference summaries using \(n\)-gram co-occurrence.
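To make the \(n\)-gram co-occurrence idea concrete, here is a simplified recall-style ROUGE-N computation. It is a sketch under stated assumptions (lowercasing and whitespace tokenization, no stemming or stopword handling), not the official ROUGE implementation and not necessarily the configuration used in the chapter; `rouge_n_recall` is a hypothetical helper name.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram counts at most as often as it
    appears in the candidate summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0


summary = "the battery life is great"
reference = ["the battery lasts long"]
print(rouge_n_recall(summary, reference, n=1))  # 0.5
```

In this example two of the four reference unigrams ("the", "battery") appear in the candidate, giving 0.5; with `n=2` only the bigram ("the", "battery") matches, out of three reference bigrams.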
 
Metadata
Title
The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection
Written by
Linhong Zhu
Sheng Gao
Sinno Jialin Pan
Haizhou Li
Dingxiong Deng
Cyrus Shahabi
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-14379-8_9