2018 | OriginalPaper | Chapter

The Accuracy of Fuzzy C-Means in Lower-Dimensional Space for Topic Detection

Author: Hendri Murfi

Published in: Smart Computing and Communication

Publisher: Springer International Publishing

Abstract

Topic detection is an automatic method for discovering topics in textual data. The standard methods for topic detection are nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA). An alternative is a clustering approach such as k-means or fuzzy c-means (FCM). FCM extends the k-means method in the sense that the textual data may have more than one topic. However, FCM works well for low-dimensional textual data and fails for high-dimensional textual data. One approach to overcoming this problem is to transform the textual data into a lower-dimensional space, i.e., an eigenspace; the resulting method is called Eigenspace-based FCM (EFCM). First, the textual data are transformed into an eigenspace using truncated singular value decomposition. FCM is then performed on the eigenspace data to identify the memberships of the textual data in clusters. Using these memberships, we generate topics from the high-dimensional textual data in the original space. In this paper, we examine the accuracy of EFCM for topic detection. Our simulations show that the accuracy of EFCM lies between the accuracies of LDA and NMF with regard to both topic interpretation and topic recall.
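The three steps named in the abstract (truncated SVD, FCM in the eigenspace, topic generation in the original space) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the function names `fuzzy_c_means` and `efcm_topics`, the fuzzifier `m=2.0`, the random membership initialization, the toy document-term matrix, and in particular the choice to form each topic as the membership-weighted mean of the original document vectors.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Textbook FCM by alternating updates (illustrative, not the chapter's code).
    Returns memberships U with shape (n, c) and centers with shape (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center, shape (n, c)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def efcm_topics(A, c, p, m=2.0, top_k=3, seed=0):
    """A is a documents x terms matrix; p is the eigenspace dimension."""
    # Step 1: truncated SVD, keeping the p leading right singular vectors
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    X_low = A @ Vt[:p].T                         # documents in the eigenspace
    # Step 2: FCM on the low-dimensional representation
    U, _ = fuzzy_c_means(X_low, c, m=m, seed=seed)
    # Step 3 (assumed form): topics as membership-weighted means of the
    # ORIGINAL high-dimensional document vectors
    W = U ** m
    topics = (W.T @ A) / W.sum(axis=0)[:, None]  # shape (c, n_terms)
    top_terms = np.argsort(-topics, axis=1)[:, :top_k]
    return U, topics, top_terms

# Toy document-term counts: documents 0-2 use terms 0-2, documents 3-5 use terms 3-5
A = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [3, 3, 2, 0, 0, 0],
    [0, 0, 0, 4, 1, 1],
    [0, 0, 0, 5, 2, 1],
    [0, 0, 0, 4, 2, 2],
], dtype=float)

U, topics, top_terms = efcm_topics(A, c=2, p=2)
print(U.round(3))      # soft memberships of each document in each cluster
print(top_terms)       # indices of the highest-weighted terms per topic
```

Clustering happens only on the `p`-dimensional projection, while the topic vectors are read off in the full term space, so the top-ranked terms remain interpretable as words. On real corpora a sparse truncated SVD routine would likely replace the dense `np.linalg.svd` call used here.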

Metadata
Title
The Accuracy of Fuzzy C-Means in Lower-Dimensional Space for Topic Detection
Author
Hendri Murfi
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-05755-8_32
