
Automatic Documentation and Mathematical Linguistics

1 February 2024 | NATURAL LANGUAGE PROCESSING

Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus

Author: O. I. Babina

Published in: Automatic Documentation and Mathematical Linguistics, Issue 1/2024


Abstract

The paper introduces a methodology for extracting opinion aspects from text, that is, the parameters of a given object that customers evaluate. These parameters underlie the customer's attitude toward the product or service. The proposed approach uses topic modeling to delineate classes of vocabulary whose semantics correspond to the parameters that shape the customer's opinion of the object. Specifically, the study applies the BERTopic model as the topic modeling tool. The methodology consists of several sequential steps. First, the text is preprocessed: stopwords are removed, characters are converted to lowercase, and words are lemmatized. Special attention is paid to the different lexical forms that opinion aspects take; to capture them, nominal, verbal, and adjectival single- and multicomponent phrases are extracted from the corpus. Next, the corpus sentences are represented as vectors in a feature space defined by the extracted words and phrases. Finally, BERTopic is applied to the customer review corpus using these sentence vectors. The experiments use a domain-specific Russian-language corpus of customer reviews of airline services collected from review websites. The resulting topic distribution is compared with a manually constructed conceptual model of the domain. The comparison shows that the automatically derived topics align with the conceptual structure of the domain, with a precision of 0.955 and a recall of 0.875. These findings support the effectiveness of BERTopic for corpus-based mining of opinion aspects.
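The feature-space step described in the abstract can be illustrated with a small, self-contained sketch. It is not the author's implementation: the stopword set, the lemma table, and the phrase vocabulary below are hypothetical toy resources, written in English for readability even though the study's corpus is Russian, and a real pipeline would rely on a full stopword list and a morphological analyzer. Each sentence is preprocessed and then turned into a vector of counts over the extracted single- and multicomponent phrases. Greedy longest-match counting is an assumption made here so that a multicomponent phrase such as "polite staff" is not also counted as "staff"; the abstract does not say how overlapping phrases were handled.

```python
from collections import Counter

# Hypothetical toy resources (English for readability; the study's corpus is
# Russian). A real pipeline would use a full stopword list and a morphological
# analyzer instead of this hand-written lemma table.
STOPWORDS = {"the", "a", "and", "but", "was", "were", "had", "very"}
LEMMAS = {"flights": "flight", "delayed": "delay", "seats": "seat"}

# Hypothetical feature space: single- and multicomponent phrases that an
# extraction step might have produced, stored as tuples of lemmas.
VOCABULARY = [
    ("flight",), ("delay",), ("leg", "room"),
    ("seat",), ("polite", "staff"), ("staff",),
]


def preprocess(sentence):
    """Lowercase, strip edge punctuation, drop stopwords, then lemmatize."""
    tokens = [t.strip(".,!?;:\"()") for t in sentence.lower().split()]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


def vectorize(lemmas, vocabulary):
    """Count vocabulary terms in a lemma sequence using greedy longest match.

    Returns one count per vocabulary entry, in vocabulary order.
    """
    terms = set(vocabulary)
    max_len = max(len(term) for term in vocabulary)
    counts = Counter()
    i = 0
    while i < len(lemmas):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate in terms:
                counts[candidate] += 1
                i += n
                break
        else:
            i += 1  # no vocabulary term starts here; skip this token
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    reviews = ["Polite staff, but seats had no leg room.",
               "The flight was delayed twice."]
    for review in reviews:
        print(vectorize(preprocess(review), VOCABULARY))
    # prints [0, 0, 1, 1, 1, 0] and then [1, 1, 0, 0, 0, 0]
```

If such a vocabulary is combined with BERTopic, one option is to pass a scikit-learn `CountVectorizer` with a fixed `vocabulary` (multiword terms as space-separated strings, e.g. "leg room") and a matching `ngram_range` as BERTopic's `vectorizer_model`; this shapes the c-TF-IDF topic representations rather than the document embeddings. Note that `CountVectorizer` counts overlapping n-grams, so "staff" would also be counted inside "polite staff". The abstract does not state whether the study took this route.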



Title: Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus
Author: O. I. Babina
Publication date: 1 February 2024
Publisher: Pleiades Publishing
Published in: Automatic Documentation and Mathematical Linguistics, Issue 1/2024
Print ISSN: 0005-1055
Electronic ISSN: 1934-8371
DOI: https://doi.org/10.3103/S0005105524010060

