
2014 | Original Paper | Book Chapter

Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System

Authors: Ekaterina Pronoza, Elena Yagunova, Svetlana Volskaya

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing


Abstract

In this paper, a corpus-based method for information extraction and opinion mining is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.
Our method is based on thorough corpus analysis and automatic selection of machine learning models and feature sets. We also pay special attention to the verification of statistical significance.
According to the results of the research, Naive Bayes models perform well at classifying sentiment with respect to a restaurant aspect, while Logistic Regression is good at deciding on the relevance of a user’s review.
The proposed approach can be applied to similar domains, such as hotel reviews, where the data consist of colloquial unstructured texts (in contrast to domains such as technical products or books), and to other languages with rich morphology and free word order.


Footnotes
2
It should be noted that the classifier (“model + feature set” combination) with the highest rank does not necessarily demonstrate the highest average weighted F1 score. The classes 0, 1, 2 and 3 assigned to the classifiers in this paper are based on their ranks (according to the non-parametric Holm-Bonferroni test) and not on their F1 scores.
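The footnote does not describe how the Holm-Bonferroni procedure is applied to the classifiers, so the following is only a generic sketch of the step-down correction itself, not the authors' implementation. The function name `holm_bonferroni`, the significance level and the example p-values are illustrative assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction (generic sketch, not the paper's code).

    Returns a list of booleans aligned with p_values:
    True where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        # Thresholds grow step by step: alpha/m, alpha/(m-1), ..., alpha/1.
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            # The first retained hypothesis stops the procedure;
            # all hypotheses with larger p-values are retained as well.
            break
    return rejected


# Hypothetical p-values from four pairwise comparisons.
example = holm_bonferroni([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

With these made-up inputs, only the comparisons with p-values 0.005 and 0.01 are rejected: 0.03 exceeds its threshold of 0.05/2, which also stops the test for 0.04.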
 
3
The Baseline feature set is considered the simplest one and Extended_All the most complex one. The MNB and NB models are considered the simplest, Perceptron more complex, and LogReg and linear SVM the most complex (in fact, both are similar to Perceptron, but their training is more computationally expensive [5]). MNB and NB are considered equally simple, as are LogReg and linear SVM. A simple model with complex features is considered simpler than a complex model with simple (e.g., baseline) features.
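One way to read the ordering described in this footnote is as a lexicographic comparison in which the model tier is compared first and the feature set only breaks ties. The sketch below illustrates that reading; the numeric tiers, the dictionary names and the candidate list are assumptions made for illustration, and only the two feature sets named in the footnote are included.

```python
# Model tiers as described in the footnote: MNB and NB share the simplest
# tier, Perceptron is in the middle, LogReg and linear SVM share the top tier.
# The numeric values themselves are an assumption; only their order matters.
MODEL_TIER = {"MNB": 0, "NB": 0, "Perceptron": 1, "LogReg": 2, "SVM": 2}

# Only the two feature sets named in the footnote are listed here;
# any intermediate feature sets would need tiers in between.
FEATURE_TIER = {"Baseline": 0, "Extended_All": 1}


def simplicity_key(classifier):
    """Sort key for a (model, feature_set) pair: model tier first, then features."""
    model, features = classifier
    return (MODEL_TIER[model], FEATURE_TIER[features])


# Hypothetical candidates, ordered from simplest to most complex.
candidates = [
    ("LogReg", "Baseline"),
    ("NB", "Extended_All"),
    ("Perceptron", "Baseline"),
    ("MNB", "Baseline"),
]
ordered = sorted(candidates, key=simplicity_key)
```

Because the model tier dominates the key, `("NB", "Extended_All")` sorts before `("LogReg", "Baseline")`, matching the last sentence of the footnote.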
 
References
1. Aston, N., Liddle, J., Hu, W.: Twitter sentiment in data streams with perceptron. J. Comput. Commun. 2, 11–16 (2014)
2. Bakliwal, A., Patil, A., Arora, P., Varma, V.: Towards enhanced opinion classification using NLP techniques. In: Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP), IJCNLP, pp. 101–107 (2011)
3. Benamara, F., Cesarano, C., Picariello, A., Reforgiato, D., Subrahmanian, V.S.: Sentiment analysis: adjectives and adverbs are better than adjectives alone. In: Proceedings of the International Conference on Weblogs and Social Media (ICWSM) (2007)
4. Bermingham, A., Smeaton, A.: Classifying sentiment in microblogs: is brevity an advantage? In: Proceedings of the International Conference on Information and Knowledge Management (CIKM) (2010)
5. Collobert, R., Bengio, S.: Links between Perceptrons, MLPs and SVMs. In: Proceedings of the 21st International Conference on Machine Learning (2004)
6. Das, S.R., Chen, M.Y.: Yahoo! for Amazon: sentiment parsing from small talk on the web. Manage. Sci. 53(9), 1375–1388 (2007)
7. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528 (2003)
8. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using twitter hashtags and smileys. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 241–249. Association for Computational Linguistics (2010)
9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
10. Devitt, A., Ahmad, K.: Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Lang. Resour. Eval. 47(2), 475–511 (2013)
11. Emadzadeh, E., Nikfarjam, A., Ghauth, K.I., Why, N.K.: Learning materials recommendation using a hybrid recommender system with automated keyword extraction. World Appl. Sci. J. 9(11), 1260–1271 (2010)
12. Gatterbauer, W., Bohunsky, P., Herzog, M., Krüpl, B., Pollak, B.: Towards domain-independent information extraction from web tables. In: Proceedings of the 16th International Conference on World Wide Web, pp. 71–80 (2007)
13. Iman, R.L., Davenport, J.M.: Approximations of the critical region of the Friedman statistic. Commun. Stat. 18, 571–595 (1980)
14. Kennedy, A., Inkpen, D.: Sentiment classification of movie reviews using contextual valence shifters. Comput. Intell. 22(2), 110–125 (2006)
15. Kotelnikov, M., Klekovkina, M.: The automatic sentiment text classification method based on emotional vocabulary. In: RCDL’2012 (2012)
16. Leksin, V.A., Nikolenko, S.I.: Semi-supervised tag extraction in a web recommender system. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 206–212. Springer, Heidelberg (2013)
17. Li, Y., Nie, J., Zhang, Y., Wang, B., Yan, B., Weng, F.: Contextual recommendation based on text mining. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010): Poster Volume, pp. 692–700 (2010)
18. Liu, J., Seneff, S.: Review sentiment scoring via a parse-and-paraphrase paradigm. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, pp. 161–169 (2009)
19. Marchand, M., Ginsca, A.L., Besançon, R., Mesnard, O.: [LVIC-LIMSI]: using syntactic features and multi-polarity words for sentiment analysis in twitter. In: Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 418–424 (2013)
20. Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an enhanced Naive Bayes model. In: Yin, H., Tang, K., Gao, Y., Klawonn, F., Lee, M., Weise, T., Li, B., Yao, X. (eds.) IDEAL 2013. LNCS, vol. 8206, pp. 194–201. Springer, Heidelberg (2013)
21. Naw, N., Hlaing, E.E.: Relevant words extraction method for recommendation system. Int. J. Emer. Technol. Adv. Eng. 3(1), 680–685 (2013)
22. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86 (2002)
23. Pak, A., Paroubek, P.: Language independent approach to sentiment analysis. Komp’uternaya Lingvistika i Intellektualnie Tehnologii: po materialam ezhegodnoy mezhdunarodnoy konferencii “Dialog”, vol. 11(18), RGHU, Moscow, pp. 37–50 (2012)
24. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
25. Park, D.H., Kim, H.K., Kim, J.K.: A literature review and classification of recommender systems research. Soc. Sci. 5, 290–294 (2011)
26. Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web, LNCS, vol. 4321, pp. 325–341. Springer, Heidelberg (2007)
27. Pronoza, E., Yagunova, E., Lyashin, A.: Restaurant information extraction for the recommendation system. In: Proceedings of the 2nd Workshop on Social and Algorithmic Issues in Business Support: “Knowledge Hidden in Text”, LTC’2013 (2013)
28. Ricci, F., Rokach, L., Shapira, B., Kantor, P.: Recommender Systems Handbook. Springer, New York (2011)
29. Saif, H.: Sentiment analysis of microblogs. Mining the New World. Technical Report KMI-12-2 (2012)
30. Sarawagi, S.: Information extraction. Found. Trends Databases 1(3), 261–377 (2007)
32. Sharma, A., Dey, S.: An artificial neural network based approach for sentiment analysis of opinionated text. In: Proceedings of the 2012 ACM Research in Applied Computation Symposium, pp. 37–42 (2012)
33. Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., Chanona-Hernández, L.: Syntactic n-grams as machine learning features for natural language processing. Expert Syst. Appl. 41(3), 853–860 (2014)
34. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (2013)
35. Turmo, J., Ageno, A., Català, N.: Adaptive information extraction. ACM Comput. Surv. 38(2), 3 (2006)
36. Turney, P.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417–424 (2002)
37. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), vol. 2, pp. 90–94 (2012)
Metadata
Title
Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System
Authors
Ekaterina Pronoza
Elena Yagunova
Svetlana Volskaya
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-11397-5_21