2018 | OriginalPaper | Chapter

Internet Articles Classification by Industry Types Based on TF-IDF

Authors: Jonghun Cha, Jee-Hyong Lee

Published in: Advances in Computer Science and Ubiquitous Computing

Publisher: Springer Singapore

Abstract

To understand a specific industry, people usually consult the financial statements of companies in that industry. Financial statements contain diverse numerical information, but because they are usually reported quarterly, they reflect only the past financial state of a company. Consequently, there is a growing need to obtain the current state of an industry in a timely manner. The proposed method focuses on internet articles because they are easy to obtain and are updated with new information every day. As a preliminary study on extracting industry information from internet articles, this paper proposes a method to classify internet articles by industry type. The method computes importance values of nouns in internet articles based on TF-IDF and uses these values to classify articles by industry type. Experiments show that the proposed method achieves high accuracy in industry article classification.
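The abstract does not spell out the scoring details, so the following is only a minimal sketch of one way TF-IDF noun weights could drive such a classifier, not the authors' implementation. It assumes that nouns have already been extracted from each article (for example by a morphological analyzer), that all training articles of one industry are pooled into a single "document" when computing TF-IDF, and that an article is assigned to the industry with the highest summed weight. The function names `train_tfidf` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_tfidf(nouns_by_industry):
    """Compute a TF-IDF weight for every noun in every industry.

    nouns_by_industry maps an industry name to a list of articles,
    each article being a list of already-extracted nouns (assumption).
    All articles of one industry are pooled into a single document.
    """
    # Term counts per industry (pooled over that industry's articles).
    counts = {
        industry: Counter(noun for article in articles for noun in article)
        for industry, articles in nouns_by_industry.items()
    }
    n_industries = len(counts)
    # Document frequency: in how many industries does each noun occur?
    df = Counter(noun for c in counts.values() for noun in c)

    weights = {}
    for industry, c in counts.items():
        total = sum(c.values())
        weights[industry] = {
            noun: (freq / total) * math.log(n_industries / df[noun])
            for noun, freq in c.items()
        }
    return weights


def classify(article_nouns, weights):
    """Return the industry whose summed noun weights are highest."""
    scores = {
        industry: sum(w.get(noun, 0.0) for noun in article_nouns)
        for industry, w in weights.items()
    }
    return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
training = {
    "automotive": [["engine", "sales", "vehicle"],
                   ["vehicle", "battery", "factory"]],
    "banking": [["loan", "interest", "sales"],
                ["interest", "deposit", "bank"]],
}
weights = train_tfidf(training)
print(classify(["vehicle", "battery", "sales"], weights))
```

In this toy setup the noun "sales" appears in both industries, so its IDF is log(2/2) = 0 and it contributes nothing to either score, while industry-specific nouns such as "vehicle" carry the decision. This is the property that makes TF-IDF a natural importance measure for this kind of task.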

Metadata
Title
Internet Articles Classification by Industry Types Based on TF-IDF
Authors
Jonghun Cha
Jee-Hyong Lee
Copyright Year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7605-3_179