Published in: Advances in Data Analysis and Classification 3/2023

13.10.2022 | Regular Article

On smoothing and scaling language model for sentiment based information retrieval

Authors: Fatma Najar, Nizar Bouguila


Abstract

Sentiment analysis, or opinion mining, refers to the discovery of sentiment information in textual documents, tweets, or review posts. The field has grown with the rise of social media and is of great interest for applications such as marketing, tourism, and business. In this work, we approach Twitter sentiment analysis through a novel framework that simultaneously addresses two problems of text representation: sparseness and high dimensionality. We propose a probabilistic information retrieval model based on a new distribution, the Smoothed Scaled Dirichlet distribution. We present a likelihood-based learning method for estimating the parameters of the distribution, and we propose a way to generate features from the information retrieval system. We apply the proposed approach, the Smoothed Scaled Relevance Model, to four Twitter sentiment datasets: STD, STS-Gold, SemEval14, and SentiStrength. We evaluate its performance by comparing it against baseline models and related work.
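
The abstract does not give the density of the Smoothed Scaled Dirichlet distribution, so the sketch below does not attempt to reproduce it. As background only, it evaluates the log-density of the standard scaled Dirichlet distribution from the compositional-data literature, on the assumption that the proposed distribution builds on it. The function name and the parameter names alpha (shape) and beta (scale) are illustrative choices, not the authors' notation.

```python
import math


def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log-density of the standard scaled Dirichlet at a point x on the simplex.

    Assumption: this is the textbook scaled Dirichlet, not the smoothed
    variant proposed in the paper, whose form the abstract does not state.
    """
    if not (len(x) == len(alpha) == len(beta)):
        raise ValueError("x, alpha and beta must have the same length")
    if any(v <= 0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be strictly positive and sum to 1")
    if any(a <= 0 for a in alpha) or any(b <= 0 for b in beta):
        raise ValueError("alpha and beta must be strictly positive")

    alpha_sum = sum(alpha)
    # Normalising constant: log Gamma(sum alpha) - sum log Gamma(alpha_d)
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(a) for a in alpha)
    # Kernel: sum over d of alpha_d * log(beta_d) + (alpha_d - 1) * log(x_d)
    log_kernel = sum(
        a * math.log(b) + (a - 1.0) * math.log(v)
        for v, a, b in zip(x, alpha, beta)
    )
    # Scaling term: (sum alpha) * log(sum over d of beta_d * x_d)
    log_scale = alpha_sum * math.log(sum(b * v for v, b in zip(x, beta)))
    return log_norm + log_kernel - log_scale
```

When every entry of beta is the same constant, the scale terms cancel and the function returns the ordinary Dirichlet log-density. Note that the input must be strictly positive: a raw term-frequency vector for a tweet contains zeros and would have to be smoothed before it could be evaluated this way.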


Metadata
Title
On smoothing and scaling language model for sentiment based information retrieval
Authors
Fatma Najar
Nizar Bouguila
Publication date
13.10.2022
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification, Issue 3/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-022-00522-6
