Published in: Soft Computing 4/2015

01-04-2015 | Focus

Normalized table-matching algorithm as approach to text categorization

Author: Taeho Jo

Abstract

This research presents an improved version of the table-based matching algorithm as an approach to text categorization. It is intended to address the three problems that arise when texts are encoded into numerical vectors, as well as the unstable performance of the previous version, which was caused by fluctuations in text length. We encode texts into tables rather than numerical vectors, define a similarity measure between two tables that is always normalized to a value between zero and one, and apply it to text categorization tasks. We expect better performance from solving the three problems that result from encoding texts into numerical vectors, and more stable performance from improving on the previous version. We empirically validate the proposed approach through four sets of experiments, with respect to both performance and stability.
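
The idea of comparing texts as tables with a similarity bounded between zero and one can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual table layout, weighting scheme, or similarity formula, so everything below is an assumption made for illustration: a table is taken to be a word-to-frequency map, the similarity is the weight of shared words divided by the total weight of both tables, and the helper names `encode_table`, `table_similarity`, and `categorize` are hypothetical.

```python
from collections import Counter


def encode_table(text):
    """Encode a text as a table mapping each word to its frequency.

    Assumption: the paper's tables may use a different weighting scheme.
    """
    return Counter(text.lower().split())


def table_similarity(a, b):
    """Return a value in [0, 1]: weight of shared words over total weight.

    Assumption: this ratio is an illustrative stand-in, not the paper's formula.
    Dividing by the combined weight keeps the score bounded for any text length.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(a[w] + b[w] for w in a.keys() & b.keys())
    return shared / total


def categorize(text, labeled_texts):
    """Pick the label whose sample text has the most similar table."""
    table = encode_table(text)
    return max(
        labeled_texts,
        key=lambda label: table_similarity(table, encode_table(labeled_texts[label])),
    )


samples = {"sports": "team wins match", "finance": "bank raises rates"}
print(categorize("the team lost the match", samples))
```

Because the denominator grows with the length of both texts, the score cannot exceed one however long the inputs are, which is the kind of length-independent behaviour the abstract describes as the goal of normalization.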


Metadata
Title: Normalized table-matching algorithm as approach to text categorization
Author: Taeho Jo
Publication date: 01-04-2015
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing, Issue 4/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-014-1411-9
