
2017 | Original Paper | Book Chapter

Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

Authors: Sandeep Sricharan Mukku, Subba Reddy Oota, Radhika Mamidi

Published in: Big Data Analytics and Knowledge Discovery

Publisher: Springer International Publishing


Abstract

Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining, and text mining for the English language. With the proliferation of social media, the volume of data in regional languages is growing alongside English. Telugu is one such regional language with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to grow the set of accurately labeled training instances starting from limited labeled data. Using a set of classifiers, namely SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a minimal error rate.
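The abstract does not say which query selection strategies are combined or how. As a rough illustration only, the sketch below assumes a pool-based setting in which an epsilon-greedy multi-armed bandit chooses between two common strategies (random sampling and margin-based uncertainty sampling). The nearest-centroid classifier is a stand-in for SVM, XGBoost, or GBT, the data is synthetic, and the reward rule (1 when the current model mispredicts the queried instance) is an assumption rather than the authors' method. All function names are hypothetical.

```python
import random

def centroids(labeled):
    """Mean feature vector per class, from (x, y) pairs where x is a tuple."""
    sums, counts = {}, {}
    for x, y in labeled:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def predict(model, x):
    """Label of the nearest class centroid."""
    return min(model, key=lambda y: dist2(model[y], x))

def margin(model, x):
    """Gap between the two nearest centroids; a small gap means uncertainty."""
    d = sorted(dist2(c, x) for c in model.values())
    return d[1] - d[0]

# Two query strategies ("arms"); each returns an index into the pool.
def query_random(model, pool, rng):
    return rng.randrange(len(pool))

def query_uncertain(model, pool, rng):
    return min(range(len(pool)), key=lambda i: margin(model, pool[i][0]))

def active_learn(labeled, pool, budget, epsilon=0.2, seed=0):
    """Epsilon-greedy choice of query strategy (assumed combination rule).

    The initial labeled set must contain at least two classes.
    """
    rng = random.Random(seed)
    arms = [query_random, query_uncertain]
    pulls = [0] * len(arms)
    value = [0.0] * len(arms)          # running mean reward per arm
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(budget, len(pool))):
        model = centroids(labeled)
        if rng.random() < epsilon:     # explore
            a = rng.randrange(len(arms))
        else:                          # exploit the best arm so far
            a = max(range(len(arms)), key=lambda k: value[k])
        x, y = pool.pop(arms[a](model, pool, rng))  # oracle reveals label y
        reward = 1.0 if predict(model, x) != y else 0.0
        pulls[a] += 1
        value[a] += (reward - value[a]) / pulls[a]
        labeled.append((x, y))
    return centroids(labeled), pulls

def make_data(n, seed=1):
    """Synthetic two-class data standing in for sentence feature vectors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        center = 2.0 if y else -2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)), y))
    return data

data = make_data(200)
i0 = next(i for i, d in enumerate(data) if d[1] == 0)
i1 = next(i for i, d in enumerate(data) if d[1] == 1)
seed_set = [data[i0], data[i1]]        # one labeled example per class
pool = [d for i, d in enumerate(data) if i not in (i0, i1)]
model, pulls = active_learn(seed_set, pool, budget=20)
accuracy = sum(predict(model, x) == y for x, y in data) / len(data)
print("pulls per strategy:", pulls, "accuracy:", round(accuracy, 3))
```

In practice the pool would hold unlabeled Telugu sentences encoded as feature vectors, and the oracle step would be a human annotator rather than a lookup of a stored label.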


Metadata
Title
Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis
Authors
Sandeep Sricharan Mukku
Subba Reddy Oota
Radhika Mamidi
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64283-3_26