
2015 | Original Paper | Book Chapter

Symbolic Representation of Text Documents Using Multiple Kernel FCM

Authors: B. S. Harish, M. B. Revanasiddappa, S. V. Aruna Kumar

Published in: Mining Intelligence and Knowledge Exploration

Publisher: Springer International Publishing


Abstract

In this paper, we propose a novel method of representing text documents based on clustering of term frequency vectors. To cluster the term frequency vectors, we use Multiple Kernel Fuzzy C-Means (MKFCM). After clustering, the term frequency vectors of each cluster are used to form an interval-valued (symbolic) representation by means of their mean and standard deviation. The interval-valued features are then stored in a knowledge base as the representative of the cluster. To corroborate the efficacy of the proposed model, we conducted extensive experiments on the standard datasets Reuters-21578 and 20 Newsgroups. We compared the classification accuracy achieved by the symbolic classifier with that of the existing Naive Bayes, KNN, and SVM classifiers. The experimental results show that the symbolic classifier achieves better classification accuracy than the other three classifiers.
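The interval-valued step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact interval formula or the matching rule, so the code below assumes one interval [mean - std, mean + std] per term and assumes a document is assigned to the cluster whose intervals contain the most of its term frequencies. The clusters are taken as already given (MKFCM itself is not implemented here), and all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def interval_representation(vectors):
    """Build one (mean - std, mean + std) interval per term for a cluster.

    Assumption: the interval is mean +/- one population standard deviation.
    """
    columns = zip(*vectors)  # transpose: one tuple of frequencies per term
    intervals = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        intervals.append((mu - sigma, mu + sigma))
    return intervals


def acceptance_count(vector, intervals):
    """Count how many term frequencies of `vector` fall inside the intervals."""
    return sum(lo <= x <= hi for x, (lo, hi) in zip(vector, intervals))


def classify(vector, knowledge_base):
    """Return the label whose intervals accept the most features (assumed rule)."""
    return max(
        knowledge_base,
        key=lambda label: acceptance_count(vector, knowledge_base[label]),
    )


# Toy term-frequency vectors, grouped under assumed cluster labels.
clusters = {
    "sports": [[5, 0, 1], [7, 1, 1], [6, 0, 2]],
    "finance": [[0, 6, 1], [1, 8, 2], [0, 7, 3]],
}

# The knowledge base stores one interval-valued representative per cluster.
knowledge_base = {
    label: interval_representation(vecs) for label, vecs in clusters.items()
}

predicted = classify([6, 0, 1], knowledge_base)
print(predicted)
```

Storing only one interval per term for each cluster keeps the knowledge base small compared with keeping every training vector, which is presumably part of the appeal of the symbolic representation.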


Metadata
Title
Symbolic Representation of Text Documents Using Multiple Kernel FCM
Authors
B. S. Harish
M. B. Revanasiddappa
S. V. Aruna Kumar
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-26832-3_10