
2018 | Original Paper | Book Chapter

Turkish Document Classification with Coarse-Grained Semantic Matrix

Written by: İlknur Dönmez, Eşref Adalı

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing


Abstract

In this paper, we present a novel method for document classification that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases of each sentence and their concepts. Our model is designed to find the phrases in a sentence, identify their relations with specific concepts, and represent each sentence as a coarse-grained semantic matrix. Predicate features and semantic class type are also added to this representation. The highest success rate in Turkish document classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data that produced the best result in previous studies on Turkish document classification.
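As a rough illustration of the idea of a phrase-by-concept matrix, the following Python sketch builds one binary matrix per sentence and sums them into a document-level feature vector. The abstract does not give the actual phrase types, concept inventory, or matrix layout, so PHRASE_ROLES, CONCEPTS, and both function names are hypothetical placeholders, and the input is assumed to be already analysed into (role, concept) pairs.

```python
# Hypothetical inventories: the chapter's real phrase types and
# concept classes are not listed in the abstract.
PHRASE_ROLES = ["subject", "object", "adverbial", "predicate"]
CONCEPTS = ["person", "place", "time", "action"]


def sentence_matrix(phrases):
    """Build a binary roles x concepts matrix for one sentence.

    `phrases` is a list of (role, concept) pairs, assumed to come
    from an upstream phrase finder and concept tagger.
    """
    matrix = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for role, concept in phrases:
        matrix[PHRASE_ROLES.index(role)][CONCEPTS.index(concept)] = 1
    return matrix


def document_features(sentences):
    """Sum the sentence matrices and flatten them row by row."""
    total = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for phrases in sentences:
        matrix = sentence_matrix(phrases)
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                total[i][j] += value
    return [value for row in total for value in row]


doc = [
    [("subject", "person"), ("predicate", "action")],
    [("subject", "person"), ("adverbial", "time"), ("predicate", "action")],
]
features = document_features(doc)
```

A flattened vector of this kind could then be appended to an existing document feature vector before training a classifier, which is one possible reading of "adding the representation to the data" in the abstract.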


Metadata
Title
Turkish Document Classification with Coarse-Grained Semantic Matrix
Written by
İlknur Dönmez
Eşref Adalı
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_37
