2019 | Original Paper | Book Chapter

Network-Based Pooling for Topic Modeling on Microblog Content

Written by: Anaïs Ollagnier, Hywel Williams

Published in: String Processing and Information Retrieval

Publisher: Springer International Publishing

Abstract

Topic modeling with tweets is difficult due to the short and informal nature of the texts. Tweet-pooling (aggregation of tweets into longer documents prior to training) has been shown to improve model outputs, but performance varies depending on the pooling scheme and data set used. Here we investigate a new tweet-pooling method based on network structures associated with Twitter content. Using a standard formulation of the well-known Latent Dirichlet Allocation (LDA) topic model, we trained various models using different tweet-pooling schemes on three diverse Twitter datasets. Tweet-pooling schemes were created based on mention/reply relationships between tweets and Twitter users, with several (non-networked) established methods also tested as a comparison. Results show that pooling tweets using network information gives better topic coherence and clustering performance than other pooling schemes, on the majority of datasets tested. Our findings contribute to an improved methodology for topic modeling with Twitter content.
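
The abstract does not spell out the exact pooling rules, so the following is only a minimal sketch of one way tweets could be pooled along mention/reply relationships before LDA training. The function name `pool_by_user_network` and the tweet fields `author`, `text`, `mentions` and `reply_to_user` are hypothetical and chosen for illustration; they are not taken from the chapter. Each user gets one pooled document made of the tweets they wrote plus the tweets that mention or reply to them.

```python
from collections import defaultdict


def pool_by_user_network(tweets):
    """Pool tweet texts per user along mention/reply links.

    Assumption (not from the chapter): every tweet is a dict with the
    hypothetical keys "author", "text", and optionally "mentions"
    (list of user names) and "reply_to_user" (a user name or None).
    A tweet is added to the pool of its author and of every user it
    mentions or replies to, so one tweet may appear in several pools.
    """
    pools = defaultdict(list)
    for tweet in tweets:
        # Users linked to this tweet in the mention/reply network.
        users = {tweet["author"], *tweet.get("mentions", [])}
        if tweet.get("reply_to_user"):
            users.add(tweet["reply_to_user"])
        for user in users:
            pools[user].append(tweet["text"])
    # Each pooled document is the concatenation of its tweets, in input order.
    return {user: " ".join(texts) for user, texts in pools.items()}


if __name__ == "__main__":
    sample = [
        {"author": "alice", "text": "flood warning issued", "mentions": []},
        {"author": "bob", "text": "@alice stay safe",
         "mentions": ["alice"], "reply_to_user": "alice"},
        {"author": "carol", "text": "great match today", "mentions": []},
    ]
    for user, document in sorted(pool_by_user_network(sample).items()):
        print(user, "->", document)
```

The pooled documents, rather than the individual tweets, would then be tokenized and passed to an LDA implementation. Letting a tweet appear in several pools is a design choice of this sketch; a scheme that assigns each tweet to exactly one pool would be an equally plausible reading of the abstract.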

Footnotes
2
In September 2017, Twitter expanded the original 140-character limit to 280 characters. See: https://blog.twitter.com/official/en_us/topics/product/2017/tweetingmadeeasier.html. Date of access: 11th Feb 2019.
 
Metadata
Title
Network-Based Pooling for Topic Modeling on Microblog Content
Written by
Anaïs Ollagnier
Hywel Williams
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-32686-9_6
