Published in: Neural Computing and Applications 10/2019

21-04-2018 | Original Article

A novel feature extraction methodology for sentiment analysis of product reviews

Authors: Xin Chen, Yun Xue, Hongya Zhao, Xin Lu, Xiaohui Hu, Zhihao Ma

Abstract

Feature extraction is one of the key steps in text sentiment analysis (SA), and the choice of algorithm has an important effect on the results. In this paper, a novel methodology is proposed to extract features for SA of product reviews. First, to account for the diverse forms of expression in product reviews, generalized TF–IDF feature vectors are obtained by introducing the semantic similarity of synonyms. Then, in view of the different lengths of product reviews, the local patterns of the feature vectors are identified with the OPSM biclustering algorithm. Finally, we improve the PrefixSpan algorithm to detect frequent and pseudo-consecutive phrases with high discriminative ability (namely FPCD phrases), which contain word-order information. Furthermore, some important factors, such as the separation and discriminative ability of words, are also employed to improve the discrimination of sentiment polarity. The text feature vectors are extracted on the basis of these steps. A series of experiments and comparisons indicates that SA performance on product reviews is greatly improved.
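
As a rough illustration of the sequential-pattern step, the sketch below implements the standard PrefixSpan idea (pattern growth over projected suffixes) for tokenized reviews. It is not the improved variant proposed in the paper: the pseudo-consecutive constraint and the discriminative-ability filter behind FPCD phrases are not reproduced. The function name `prefixspan`, the `min_support` parameter, and the toy reviews are assumptions made for this example only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Baseline PrefixSpan for sequences of single words.

    Returns a dict mapping each frequent pattern (a tuple of words that
    occurs in order, gaps allowed) to the number of sequences containing it.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds one (sequence index, start position) pair per
        # sequence that contains `prefix`; the suffix from `start` onward
        # is what remains after the leftmost match of `prefix`.
        first_pos = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Record only the first occurrence of each word per sequence.
                first_pos[seq[pos]].setdefault(seq_id, pos)
        for word, occurrences in first_pos.items():
            if len(occurrences) >= min_support:
                pattern = prefix + (word,)
                results[pattern] = len(occurrences)
                # Grow the pattern using the suffixes after this word.
                mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical, already tokenized product reviews.
reviews = [
    ["battery", "life", "is", "very", "good"],
    ["battery", "life", "not", "good"],
    ["screen", "is", "very", "good"],
]

patterns = prefixspan(reviews, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" ".join(pattern), support)
```

Because patterns are grown left to right, each one preserves word order, which is the property the abstract attributes to FPCD phrases. Plain PrefixSpan allows arbitrary gaps between matched words, so a phrase such as "battery ... good" is counted even when "not" sits in between; restricting such gaps and scoring polarity are the kinds of refinement the paper describes, and they would need to be added on top of this baseline.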

Metadata
Title: A novel feature extraction methodology for sentiment analysis of product reviews
Authors: Xin Chen, Yun Xue, Hongya Zhao, Xin Lu, Xiaohui Hu, Zhihao Ma
Publication date: 21-04-2018
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-018-3477-2
