Published in: Data Mining and Knowledge Discovery 5/2017

02-02-2017

Micro-review synthesis for multi-entity summarization

Authors: Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas



Abstract

Location-based social networks (LBSNs), exemplified by Foursquare, are rapidly gaining popularity. One important feature of LBSNs is the micro-review. Upon check-in at a particular venue, a user may leave a short review (up to 200 characters long), also known as a tip. These tips are an important source of information for others who want to learn about various aspects of an entity (e.g., a restaurant), such as food, waiting time, or service. However, a user is often interested not in one particular entity, but rather in several entities collectively, for instance within a neighborhood or a category. In this paper, we address the problem of summarizing the tips of multiple entities in a collection, by way of synthesizing new micro-reviews that pertain to the collection, rather than to the individual entities per se. We formulate this problem in terms of first finding a representation of the collection, by identifying a number of “aspects” that link common threads across two or more entities within the collection. We express these aspects as dense subgraphs in a graph of sentences derived from the multi-entity corpora. This leads to a formulation of maximal multi-entity quasi-cliques, as well as a heuristic algorithm to find K such quasi-cliques maximizing the coverage over the multi-entity corpora. To synthesize a summary tip for each aspect, we select a small number of sentences from the corresponding quasi-clique, balancing conciseness and representativeness in terms of a facility location problem. Our approach performs well on collections of Foursquare entities based on localities and categories, producing more representative and diverse summaries than the baselines.
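
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the edge-density definition of a quasi-clique with threshold `gamma`, the word-overlap (Jaccard) similarity, the `length_weight` trade-off parameter, and all function names. Only the 200-character limit is taken from the text. The first function tests whether a set of sentence nodes is dense and spans at least two entities; the second greedily picks sentences under a facility-location-style cost, where selecting a sentence costs characters (conciseness) and each unselected sentence pays for its dissimilarity to the closest selected one (representativeness).

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def is_multi_entity_quasi_clique(nodes, edges, entity_of, gamma=0.6):
    """True if `nodes` has edge density >= gamma and spans >= 2 entities.

    Edge density is one common quasi-clique definition (assumed here).
    """
    nodes = list(nodes)
    if len(nodes) < 2 or len({entity_of[n] for n in nodes}) < 2:
        return False
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for u, v in combinations(nodes, 2)
                  if (u, v) in edges or (v, u) in edges)
    return present / possible >= gamma


def select_sentences(sentences, length_weight=0.005, max_chars=200):
    """Greedy heuristic for a facility-location-style objective.

    cost(S) = length_weight * (characters in S)          # conciseness
            + sum over all sentences of
              (1 - best similarity to a sentence in S)   # representativeness
    """
    def cost(selected):
        opening = length_weight * sum(len(sentences[i]) for i in selected)
        service = sum(
            1.0 - max((jaccard(s, sentences[i]) for i in selected), default=0.0)
            for s in sentences
        )
        return opening + service

    selected = []
    best = cost(selected)
    while True:
        # Characters already used, counting one joining space per sentence.
        used = sum(len(sentences[i]) for i in selected) + len(selected)
        candidates = [i for i in range(len(sentences))
                      if i not in selected
                      and used + len(sentences[i]) <= max_chars]
        if not candidates:
            break
        nxt = min(candidates, key=lambda i: cost(selected + [i]))
        new_cost = cost(selected + [nxt])
        if new_cost >= best:  # stop when adding a sentence no longer helps
            break
        selected.append(nxt)
        best = new_cost
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    tips = [
        "great coffee and friendly staff",
        "the coffee is great",
        "friendly staff",
        "long wait on weekends",
        "expect a long wait",
    ]
    print(" ".join(select_sentences(tips)))
```

Raising `length_weight` favors shorter summaries, while lowering it lets the greedy loop add sentences until the character budget is exhausted. The greedy loop is a simple heuristic and carries no optimality guarantee in this form.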


Footnotes

1. Here, we use “micro-review” and “tip” interchangeably as we are mostly using Foursquare in our running examples.

4. We discuss this in Sect. 5.

5. We are working with micro-reviews. A summary tip mimics a micro-review. By definition, micro-reviews are limited by character count. For instance, Foursquare defines the limit to be 200 characters. Therefore, we measure sentence length in terms of characters.

9. For SA-Converged, we include data points that completed within 7 days.

Metadata

Title: Micro-review synthesis for multi-entity summarization
Authors: Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas
Publication date: 02-02-2017
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery, Issue 5/2017
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-017-0491-4
