2017 | OriginalPaper | Chapter

Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

Authors: Sandeep Sricharan Mukku, Subba Reddy Oota, Radhika Mamidi

Published in: Big Data Analytics and Knowledge Discovery

Publisher: Springer International Publishing

Abstract

Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining, and text mining for the English language. With the proliferation of social media, data in regional languages is growing rapidly alongside English. Telugu is one such regional language with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to increase the number of accurate training instances starting from limited labeled data. Using a set of classifiers including SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a minimal error rate.
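The abstract does not say how the query selection strategies are combined. The title suggests a multi-armed bandit, so the following is only a minimal sketch under that assumption, not the authors' implementation. It shows three common uncertainty-based query strategies and an epsilon-greedy bandit that picks among them. The function names, the `EpsilonGreedyBandit` class, the epsilon-greedy rule, and the three-class probability rows are all hypothetical choices made for illustration; in practice the probabilities would come from a trained classifier.

```python
import math
import random


def least_confident(probs):
    """Index of the pool instance whose most likely class has the lowest probability."""
    return min(range(len(probs)), key=lambda i: max(probs[i]))


def smallest_margin(probs):
    """Index of the instance with the smallest gap between its two most likely classes."""
    def margin(p):
        top = sorted(p, reverse=True)
        return top[0] - top[1]
    return min(range(len(probs)), key=lambda i: margin(probs[i]))


def max_entropy(probs):
    """Index of the instance whose predicted class distribution has the highest entropy."""
    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)
    return max(range(len(probs)), key=lambda i: entropy(probs[i]))


class EpsilonGreedyBandit:
    """Each arm is one query strategy (assumed scheme, not taken from the paper)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms      # times each arm was rewarded
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random arm with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical usage: one active-learning round over a tiny unlabeled pool.
strategies = [least_confident, smallest_margin, max_entropy]
pool_probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.50, 0.49, 0.01]]
bandit = EpsilonGreedyBandit(n_arms=len(strategies))
arm = bandit.select()
query_index = strategies[arm](pool_probs)  # instance to send to the annotator
bandit.update(arm, reward=1.0)             # e.g. 1.0 if the new label improved accuracy
```

The reward signal is left abstract here. One plausible choice is the change in validation accuracy after retraining with the newly labeled instance, but that is an assumption and not something the abstract states.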

Metadata
Title
Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis
Authors
Sandeep Sricharan Mukku
Subba Reddy Oota
Radhika Mamidi
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64283-3_26
