2020 | OriginalPaper | Book chapter

Self-Organizing Map for Multi-view Text Clustering

Authors: Maha Fraj, Mohamed Aymen Ben Hajkacem, Nadia Essoussi

Published in: Big Data Analytics and Knowledge Discovery

Publisher: Springer International Publishing

Abstract

Text document clustering is a key task in machine learning: it partitions a given collection of documents into clusters of related documents. To this end, a pre-processing step is carried out to represent text in a structured form. However, text exhibits several aspects that a single representation cannot capture. Multi-view clustering therefore offers an efficient way to exploit and integrate the information captured from different representations, or views. Existing methods, however, are limited to representing views with term-frequency-based representations, which lose valuable information and fail to capture the semantic aspect of text. To deal with these issues, we propose a new method for multi-view text clustering that exploits different representations of text. The proposed method applies the Self-Organizing Map to the unsupervised clustering of texts by simultaneously taking into account several views obtained from the textual data. Experiments are performed to demonstrate the improvement in clustering results compared to existing methods.
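
To make the idea concrete, here is a minimal sketch of clustering documents with a Self-Organizing Map over more than one view. The abstract does not say how the views are combined, so this sketch assumes the simplest option: each view is L2-normalised and the views are concatenated per document before a plain SOM is trained. That is an illustrative assumption, not the authors' method. The function names (`combine_views`, `train_som`, `best_matching_unit`), the hyperparameters, and the toy data are all hypothetical.

```python
import math
import random

def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def combine_views(views):
    """ASSUMPTION: fuse views by concatenating L2-normalised vectors per document."""
    combined = []
    for doc_vectors in zip(*views):
        row = []
        for vec in doc_vectors:
            row.extend(l2_normalise(vec))
        combined.append(row)
    return combined

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j], x))

def train_som(data, rows=2, cols=2, epochs=30, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with linearly decaying rate and radius."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every neuron from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3  # keep the radius positive
            x = data[i]
            bmu = best_matching_unit(weights, x)
            for j, (r, c) in enumerate(grid):
                # Gaussian neighbourhood measured on the map grid, not in data space.
                grid_d2 = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
            step += 1
    return weights

# Hypothetical toy data: four documents seen through two views.
tf_view = [[3, 0, 1, 0], [2, 0, 2, 0], [0, 4, 0, 1], [0, 3, 0, 2]]    # term counts
emb_view = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]          # dense embeddings

data = combine_views([tf_view, emb_view])
weights = train_som(data)
labels = [best_matching_unit(weights, x) for x in data]
print(labels)  # one map-neuron index per document
```

Each document is assigned to the neuron that best matches it, so neurons act as cluster prototypes. Normalising each view before concatenation keeps a view with large raw counts from dominating the distance, which is one reason a fusion step of some kind is needed at all.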

Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-59065-9_30