
2020 | Original Paper | Book Chapter

Short Text Processing for Analyzing User Portraits: A Dynamic Combination

Authors: Zhengping Ding, Chen Yan, Chunli Liu, Jianrui Ji, Yezheng Liu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2020

Publisher: Springer International Publishing


Abstract

The rich digital footprint that users leave on the Internet has prompted extensive research on all aspects of Internet users. Within this work, topic modeling is used to analyze the text that users post on websites and to generate user portraits. Traditional text modeling methods such as Latent Dirichlet Allocation (LDA) suffer from severe sparsity when extracting topics from short texts, so researchers usually aggregate all the texts published by each user into a single pseudo-document. However, such pseudo-documents contain many unrelated topics, which is inconsistent with the documents people actually publish. To address this, this paper introduces the LDA-RCC model for dynamic text modeling based on the actual texts, and uses it to analyze the interests of forum users and build user portraits. Specifically, the combined model processes short texts effectively by iteratively combining the text modeling method LDA with the robust continuous clustering (RCC) method. The model also determines the number of topics automatically from the user's data. Processing the clustering results then yields each user's preferences for deeper user analysis. Extensive experiments show that the LDA-RCC model performs well and outperforms both traditional text modeling methods and short text clustering benchmark methods.
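As a rough illustration of two ideas in the abstract, the sketch below contrasts the pseudo-document baseline (all texts of a user concatenated) with reading user preferences off per-text cluster assignments. It is not the authors' LDA-RCC implementation: the toy posts, the cluster labels, and the helper names `build_pseudo_documents` and `user_preferences` are all hypothetical, and the labels simply stand in for the output of some short-text clustering step.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (user_id, short_text, cluster_label).
# The labels are assumed to come from some clustering step;
# they are NOT computed by LDA-RCC in this sketch.
posts = [
    ("alice", "new gpu benchmark results", 0),
    ("alice", "best pasta recipe", 1),
    ("alice", "gpu driver update", 0),
    ("bob", "slow cooker pasta", 1),
]


def build_pseudo_documents(posts):
    """Baseline: concatenate all texts of a user into one pseudo-document."""
    texts_by_user = defaultdict(list)
    for user, text, _label in posts:
        texts_by_user[user].append(text)
    return {user: " ".join(texts) for user, texts in texts_by_user.items()}


def user_preferences(posts):
    """Fraction of each user's texts that fall into each cluster."""
    counts = defaultdict(Counter)
    for user, _text, label in posts:
        counts[user][label] += 1
    prefs = {}
    for user, counter in counts.items():
        total = sum(counter.values())
        prefs[user] = {label: n / total for label, n in counter.items()}
    return prefs


pseudo_docs = build_pseudo_documents(posts)
prefs = user_preferences(posts)
```

In this toy data the pseudo-document for `alice` mixes hardware and cooking vocabulary, which is the kind of topic mixing the abstract objects to, whereas the per-text view keeps the two interests separate and expresses them as proportions.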

Metadata
Title
Short Text Processing for Analyzing User Portraits: A Dynamic Combination
Authors
Zhengping Ding
Chen Yan
Chunli Liu
Jianrui Ji
Yezheng Liu
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-61616-8_59
