2018 | OriginalPaper | Book chapter

Graph-Based Text Modeling: Considering Mathematical Semantic Linking to Improve the Indexation of Arabic Documents

Written by: Mohamed Salim El Bazzi, Driss Mammass, Taher Zaki, Abdelatif Ennaji

Published in: Image and Signal Processing

Publisher: Springer International Publishing

Abstract

Indexing unstructured documents aims to build a list of words, or concepts, that simplifies their later exploration. The most widely used model for representing text is the Vector Space Model. Although this model is simple to implement and widely used in text mining and information retrieval research, it has an important limitation: it treats textual units as independent and therefore ignores the semantic relations between them. Graph-based representation, a technique from data mining, is better suited to highlighting the semantic links between text units. A graph adapts easily to textual data by representing words as vertices and the relations between them as edges. In this work, we introduce graph-based modeling of textual documents and study how the choice of semantic relation between text units affects the indexing of documents. We validate the approach through classification results.
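
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' method. It builds a bag-of-words count (the Vector Space Model view, where terms are independent) and a word graph in which words are vertices and edges carry a relation weight. The abstract does not say which semantic relation the chapter uses, so the sliding-window co-occurrence relation here, along with the names `bag_of_words`, `cooccurrence_graph`, and `window`, are illustrative assumptions. English tokens are used only for readability; the same code works on any token list.

```python
from collections import Counter, defaultdict


def bag_of_words(tokens):
    """Vector Space Model view: independent term counts, no links between terms."""
    return Counter(tokens)


def cooccurrence_graph(tokens, window=2):
    """Graph view: each distinct word is a vertex; an undirected weighted edge
    links two words that occur at most `window` positions apart.

    Co-occurrence is an assumed stand-in for the semantic relation; the
    abstract does not specify which relation the chapter actually uses.
    """
    graph = defaultdict(Counter)
    for i, word in enumerate(tokens):
        # Look ahead at the next `window` tokens only, so each pair is seen once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:  # skip self-loops
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


tokens = "graph models link words and words link documents".split()
bow = bag_of_words(tokens)
graph = cooccurrence_graph(tokens, window=2)

# The bag of words only knows how often a term occurs...
print(bow["words"])
# ...while the graph also records which terms it is linked to, and how strongly.
print(dict(graph["words"]))
```

Swapping the body of `cooccurrence_graph` for a different relation (for example, a similarity score between word pairs) changes the edge set without touching the rest of an indexing pipeline, which is the kind of choice the abstract says the chapter studies.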

Metadata
Title
Graph-Based Text Modeling: Considering Mathematical Semantic Linking to Improve the Indexation of Arabic Documents
Written by
Mohamed Salim El Bazzi
Driss Mammass
Taher Zaki
Abdelatif Ennaji
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-94211-7_16