
2018 | OriginalPaper | Chapter

A Comparison Among Significance Tests and Other Feature Building Methods for Sentiment Analysis: A First Study

Authors: Raksha Sharma, Dibyendu Mondal, Pushpak Bhattacharyya

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

Words that participate in the sentiment (positive or negative) classification decision are known as significant words for sentiment classification. Identifying such significant words in the corpus and using them as features reduces the amount of irrelevant information in the feature set under supervised sentiment classification settings. In this paper, we conceptually study and compare various types of feature building methods, viz., unigrams, TFIDF, Relief, Delta-TFIDF, the \(\chi ^2\) test and Welch’s t-test, for the sentiment analysis task. Unigrams and TFIDF are the classic ways of building features from the corpus. Relief, Delta-TFIDF and the \(\chi ^2\) test have recently attracted much attention for their potential use as feature building methods in sentiment analysis. In contrast, the t-test is the least explored for the identification of significant words from the corpus as features.
We show the effectiveness of significance tests over the other feature building methods for three types of sentiment analysis tasks, viz., in-domain, cross-domain and cross-lingual. Delta-TFIDF, the \(\chi ^2\) test and Welch’s t-test compute the significance of a word for classification in the corpus, whereas unigrams, TFIDF and Relief do not observe the significance of the word for classification. Furthermore, significance tests can be divided into two categories: bag-of-words-based tests and distribution-based tests. A bag-of-words-based test observes the total count of a word in the different classes to find the significance of the word, while a distribution-based test observes the distribution of the word across documents. In this paper, we substantiate that the distribution-based Welch’s t-test is more accurate than the bag-of-words-based \(\chi ^2\) test and Delta-TFIDF in identifying significant words from the corpus.
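
To make the distinction concrete, here is a minimal sketch (an illustration, not the authors' code) that scores one candidate word both ways: the \(\chi ^2\) test looks only at the word's total counts in the positive and negative classes, while Welch's t-test compares the word's per-document frequency distributions in the two classes. The toy counts and the per-class token totals are hypothetical, and SciPy is assumed to be available.

```python
# Minimal sketch (not the authors' code): scoring one candidate word with a
# bag-of-words-based test (chi-square) and a distribution-based test
# (Welch's t-test). Requires SciPy.
from scipy import stats

def chi_square_score(pos_count, neg_count, pos_total, neg_total):
    """Bag-of-words view: only the word's total count per class matters."""
    table = [[pos_count, pos_total - pos_count],   # positive class: word vs. other tokens
             [neg_count, neg_total - neg_count]]   # negative class: word vs. other tokens
    chi2, p_value, _, _ = stats.chi2_contingency(table)
    return chi2, p_value

def welch_t_score(pos_doc_freqs, neg_doc_freqs):
    """Distribution view: compare per-document frequencies of the word in the
    two classes without assuming equal variances (Welch's t-test)."""
    t, p_value = stats.ttest_ind(pos_doc_freqs, neg_doc_freqs, equal_var=False)
    return t, p_value

# Hypothetical per-document counts of one word in positive vs. negative reviews,
# and hypothetical total token counts (1000 per class).
pos_docs = [3, 2, 4, 0, 3, 2]
neg_docs = [0, 1, 0, 0, 1, 0]
chi2, p_chi = chi_square_score(sum(pos_docs), sum(neg_docs), 1000, 1000)
t, p_t = welch_t_score(pos_docs, neg_docs)
print(f"chi2 = {chi2:.2f} (P = {p_chi:.3f}); t = {t:.2f} (P = {p_t:.3f})")
```

In the paper's setting, a word would then be kept as a feature when its P-value falls below the 0.05 threshold discussed in the footnotes.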

Footnotes
1
We also observed the performance of unigrams with the in-document frequency as the feature value, but we did not find any improvement in SA accuracy over unigram presence.
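
As an illustration of the two feature values compared in this footnote (presence vs. in-document frequency), the sketch below builds both variants with scikit-learn's CountVectorizer; the toy reviews are hypothetical and scikit-learn (assumed >= 1.0) is a stand-in, not the toolkit used in the paper.

```python
# Minimal sketch: unigram features with presence (binary) vs. in-document
# frequency as the feature value. Toy reviews are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["great great movie", "boring plot and boring acting"]

freq_vec = CountVectorizer()             # value = frequency in the document
pres_vec = CountVectorizer(binary=True)  # value = 1 if the word occurs, else 0

X_freq = freq_vec.fit_transform(reviews)
X_pres = pres_vec.fit_transform(reviews)

print(freq_vec.get_feature_names_out())
print(X_freq.toarray())   # e.g. 'great' counted twice in the first review
print(X_pres.toarray())   # e.g. 'great' marked as present (1) in the first review
```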
 
3
More detail about the implementation of Relief can be obtained from Liu and Motoda [23].
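
For readers who want a feel for how Relief weights features, the sketch below follows the basic nearest-hit/nearest-miss scheme of Kira and Rendell [18]; it is an illustrative reconstruction, not the implementation referenced here, and the helper relief and its parameters are hypothetical.

```python
# Minimal sketch of the basic Relief weighting scheme (two classes, dense
# feature vectors); an illustrative reconstruction, not the implementation
# referenced in [23].
import numpy as np

def relief(X, y, n_samples=100, seed=0):
    """Return one weight per feature: higher means the feature better
    separates the classes (near miss differs, near hit agrees)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    weights = np.zeros(X.shape[1])
    for _ in range(n_samples):
        i = rng.integers(len(X))
        xi, yi = X[i], y[i]
        same = np.flatnonzero(y == yi)
        same = same[same != i]
        other = np.flatnonzero(y != yi)
        if len(same) == 0 or len(other) == 0:
            continue
        # nearest hit (same class) and nearest miss (other class), L1 distance
        hit = X[same[np.argmin(np.abs(X[same] - xi).sum(axis=1))]]
        miss = X[other[np.argmin(np.abs(X[other] - xi).sum(axis=1))]]
        # reward features that differ across classes, penalise features
        # that differ within the same class
        weights += np.abs(xi - miss) - np.abs(xi - hit)
    return weights / n_samples
```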
 
4
The \(\chi ^2\) value and the P-value are inversely related; hence, a high \(\chi ^2\) value corresponds to a low P-value. The table is available at: http://sites.stat.psu.edu/~mga/401/tables/Chi-square-table.pdf.
 
5
The t value and the P-value are inversely related; hence, a high t value corresponds to a low P-value. The table is available at: http://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf.
 
6
A threshold of 0.05 on the P-value is a standard value in statistics, as it gives \(95\%\) confidence in the decision.
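
Rather than consulting printed tables, the P-value for a given \(\chi ^2\) or t statistic can be computed directly; the sketch below is illustrative, with statistics and degrees of freedom chosen arbitrarily, and assumes SciPy.

```python
# Minimal sketch: computing the P-value for a chi-square or t statistic
# directly (instead of a printed table) and applying the 0.05 threshold.
# The statistics and degrees of freedom below are arbitrary illustrations.
from scipy import stats

chi2_stat, chi2_df = 7.2, 1   # hypothetical chi-square statistic, 1 d.o.f.
t_stat, t_df = 2.5, 40        # hypothetical t statistic, 40 d.o.f.

p_chi2 = stats.chi2.sf(chi2_stat, df=chi2_df)   # upper-tail probability
p_t = 2 * stats.t.sf(abs(t_stat), df=t_df)      # two-sided P-value

for name, p in (("chi-square", p_chi2), ("Welch's t", p_t)):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: P = {p:.4f} -> {verdict} at the 0.05 level")
```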
 
8
Available at: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. This dataset has one more domain, the DVD domain. The contents of reviews in the DVD domain are very similar to the reviews in the movie domain; hence, to avoid redundancy, we have not reported results for the DVD domain.
 
9
A threshold on the Relief score is set empirically to filter out words about which the method is not very confident; low confidence is reflected in the low score assigned by Relief.
 
11
We use the SVM package libsvm, which is available in the Java-based WEKA machine learning toolkit, available at: http://www.cs.waikato.ac.nz/ml/weka/downloading.html.
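
The classifier setup can also be mimicked outside WEKA; the sketch below is a stand-in using scikit-learn's SVC (which likewise wraps libsvm) on toy data, not the Java/WEKA pipeline described here.

```python
# Stand-in sketch: an SVM over presence-based unigram features using
# scikit-learn's SVC (which also wraps libsvm), not the WEKA/libsvm setup
# described above. Reviews and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

train_reviews = ["a wonderful, moving film", "dull and predictable plot"]
train_labels = ["positive", "negative"]

clf = make_pipeline(CountVectorizer(binary=True), SVC(kernel="linear"))
clf.fit(train_reviews, train_labels)
print(clf.predict(["a wonderful plot"]))  # predicted label on the toy data
```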
 
12
Applying a significance test (Delta-TFIDF, the \(\chi ^2\) test or the t-test) reduces the feature set size substantially, which results in a less computationally expensive SA system in comparison to unigrams, TFIDF and Relief.
 
13
Since the movie domain has the highest average document (review) length, we selected it to show the variation among confusion matrices obtained with different feature building methods.
 
14
CLSA results are reported for four different languages, viz., English (en), French (fr), German (de) and Russian (ru). More detail about the dataset is given in Table 5.
 
15
In all CLSA experiments, training data is obtained by translating source language data, while test data is taken from the available manually tagged non-translated data.
 
17
For the pairs en\(\rightarrow \)en, fr\(\rightarrow \)fr, de\(\rightarrow \)de and ru\(\rightarrow \)ru, the source and target languages are the same, and the training data is not translated; it is the original manually tagged dataset in that language.
 
18
In the case of in-language pairs, for example en\(\rightarrow \)en, we assumed a BLEU score of 100, since such a pair has \(100\%\) correct translation as no translation process is involved.
 
19
Here, the P-value for the t value is less than 0.05. The difference in accuracy is significant at \(P<0.05\), which gives \(95\%\) confidence in the decision.
 
Literature
1.
Oakes, M., Gaaizauskas, R., Fowkes, H., Jonsson, A., Wan, V., Beaulieu, M.: A method based on the chi-square test for document classification. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 440–441. ACM (2001)
2.
Jin, X., Xu, A., Bie, R., Guo, P.: Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In: Li, J., Yang, Q., Tan, A.-H. (eds.) BioDM 2006. LNCS, vol. 3916, pp. 106–115. Springer, Heidelberg (2006). https://doi.org/10.1007/11691730_11
3.
Moh’d, A., Mesleh, A.: Chi square feature extraction based SVMs Arabic language text categorization system. J. Comput. Sci. 3, 430–435 (2007)
4.
Kilgarriff, A.: Comparing corpora. Int. J. Corpus Linguist. 6, 97–133 (2001)
5.
Paquot, M., Bestgen, Y.: Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction. Lang. Comput. 68, 247–269 (2009)
6.
Lijffijt, J., Nevalainen, T., Säily, T., Papapetrou, P., Puolamäki, K., Mannila, H.: Significance testing of word frequencies in corpora. Digital Scholarsh. Humanit. (2014) (fqu064)
7.
Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 513–520 (2011)
8.
Zhou, J.T., Pan, S.J., Tsang, I.W., Yan, Y.: Hybrid heterogeneous transfer learning through deep learning. AAAI, 2213–2220 (2014)
9.
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, pp. 79–86 (2002)
10.
Meyer, T.A., Whateley, B.: SpamBayes: effective open-source, Bayesian based, email classification system. In: CEAS. Citeseer (2004)
11.
Kanayama, H., Nasukawa, T.: Fully automatic lexicon expansion for domain-oriented sentiment analysis. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, pp. 355–363 (2006)
12.
Cheng, A., Zhulyn, O.: A system for multilingual sentiment learning on large data sets. In: Proceedings of International Conference on Computational Linguistics, pp. 577–592 (2012)
13.
Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2014)
14.
Oakes, M.P., Farrow, M.: Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Lit. Linguist. Comput. 22, 85–99 (2007)
15.
Al-Harbi, S., Almuhareb, A., Al-Thubaity, A., Khorsheed, M., Al-Rajeh, A.: Automatic Arabic text classification (2008)
16.
Rayson, P., Garside, R.: Comparing corpora using frequency profiling. In: Proceedings of the Workshop on Comparing Corpora, Association for Computational Linguistics, pp. 1–6 (2000)
17.
Sharma, R., Bhattacharyya, P.: Detecting domain dedicated polar words. In: Proceedings of the International Joint Conference on Natural Language Processing, pp. 661–666 (2013)
18.
Kira, K., Rendell, L.A.: The feature selection problem: traditional methods and a new algorithm. AAAI 2, 129–134 (1992)
19.
Martineau, J., Finin, T.: Delta TFIDF: an improved feature space for sentiment analysis. ICWSM 9, 106 (2009)
20.
Martineau, J., Finin, T., Joshi, A., Patel, S.: Improving binary classification on text problems using differential word features. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 2019–2024. ACM (2009)
21.
Wu, H.C., Luk, R.W.P., Wong, K.F., Kwok, K.L.: Interpreting TF-IDF term weights as making relevance decisions. ACM Trans. Inf. Syst. (TOIS) 26, 13 (2008)
22.
Čehovin, L., Bosnić, Z.: Empirical evaluation of feature selection methods in classification. Intell. Data Anal. 14, 265–281 (2010)
23.
Liu, H., Motoda, H.: Computational Methods of Feature Selection. CRC Press, Boca Raton (2007)
24.
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of Association for Computational Linguistics, pp. 271–279 (2004)
25.
Blitzer, J., Dredze, M., Pereira, F., et al.: Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of Association for Computational Linguistics, pp. 440–447 (2007)
27.
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2001)
28.
Sharma, R., Bhattacharyya, P.: Domain sentiment matters: a two stage sentiment analyzer. In: Proceedings of the International Conference on Natural Language Processing (2015)
29.
Pan, S.J., Ni, X., Sun, J.T., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: Proceedings of the 19th International Conference on World Wide Web, pp. 751–760. ACM (2010)
30.
Wan, X.: Co-training for cross-lingual sentiment classification. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 1, pp. 235–243. Association for Computational Linguistics (2009)
31.
Wei, B., Pal, C.: Cross lingual adaptation: an experiment on sentiment classifications. In: Proceedings of the ACL 2010 Conference Short Papers, Association for Computational Linguistics, pp. 258–262 (2010)
32.
Koehn, P.: Europarl: a parallel corpus for statistical machine translation. MT Summit 5, 79–86 (2005)
33.
Ng, V., Dasgupta, S., Arifin, S.: Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. In: Proceedings of the COLING/ACL Main Conference Poster Sessions, pp. 611–618. Association for Computational Linguistics (2006)
34.
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 513–523 (1988)
35.
Lin, Y., Zhang, J., Wang, X., Zhou, A.: An information theoretic approach to sentiment polarity classification. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, pp. 35–40. ACM (2012)
36.
Demiroz, G., Yanikoglu, B., Tapucu, D., Saygin, Y.: Learning domain-specific polarity lexicons. In: 2012 IEEE 12th International Conference on Data Mining Workshops, pp. 674–679. IEEE (2012)
37.
Habernal, I., Ptácek, T., Steinberger, J.: Sentiment analysis in Czech social media using supervised machine learning. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 65–74 (2013)
Metadata
Title
A Comparison Among Significance Tests and Other Feature Building Methods for Sentiment Analysis: A First Study
Authors
Raksha Sharma
Dibyendu Mondal
Pushpak Bhattacharyya
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-77116-8_1
