2024 | OriginalPaper | Book chapter

Toward Improved Clustering for Textual Data

Written by: Ridwan Amure, Abiola Akinnubi, Oyindamola Koleoso

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

This study explores combining manifold learning with contextual embeddings from Transformer models for clustering text. We use contextual embeddings to obtain a more accurate text representation and then pass those embeddings through a manifold learning algorithm. The experimental results show that manifold learning can accentuate the contextual embedding, which improves the performance of clustering algorithms in characterizing and modeling text data. We used the resulting clusters to distinguish between relevant texts in social media campaigns and showed that the resulting embedding provides a better representation for clustering analysis.
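The pipeline described in the abstract (embed, apply manifold learning, then cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the embedding model, the manifold learner, or the clustering algorithm, so synthetic vectors from `make_blobs` stand in for Transformer embeddings, scikit-learn's `TSNE` stands in for the manifold learning step, and `KMeans` stands in for the clustering step. The dimensions, cluster count, and V-measure scoring are likewise illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import v_measure_score

# ASSUMPTION: synthetic 50-dimensional vectors stand in for contextual
# embeddings of 150 texts drawn from 3 underlying topics.
embeddings, true_topics = make_blobs(
    n_samples=150, n_features=50, centers=3, cluster_std=1.0, random_state=0
)

# ASSUMPTION: t-SNE is used here as a generic manifold learner; the
# abstract does not say which algorithm the study applied.
reducer = TSNE(n_components=2, perplexity=30, random_state=0)
reduced = reducer.fit_transform(embeddings)

# Cluster the raw vectors and the manifold-reduced vectors the same way.
labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)
labels_reduced = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Compare both clusterings against the known topics (score lies in [0, 1]).
score_raw = v_measure_score(true_topics, labels_raw)
score_reduced = v_measure_score(true_topics, labels_reduced)

print(f"V-measure on raw vectors:     {score_raw:.3f}")
print(f"V-measure on reduced vectors: {score_reduced:.3f}")
```

With real text, the synthetic `embeddings` array would be replaced by the output of a sentence-embedding model. Whether the reduced representation scores higher depends on the data and the chosen algorithms, so the comparison should be rerun on the corpus of interest.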

Metadata
Title
Toward Improved Clustering for Textual Data
Written by
Ridwan Amure
Abiola Akinnubi
Oyindamola Koleoso
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_32