2015 | Original Paper | Book Chapter

Document Classification Using Enhanced Grid Based Clustering Algorithm

Authors: Mohamed Ahmed Rashad, Hesham El-Deeb, Mohamed Waleed Fakhr

Published in: New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering

Publisher: Springer International Publishing

Abstract

Automated document clustering is an important text mining task, especially given the rapid growth in the number of online documents written in Arabic. Text clustering aims to automatically assign each text to a predefined cluster based on its linguistic features. This research proposes an enhanced grid-based clustering algorithm. The main purpose of the algorithm is to divide the data space into clusters of arbitrary shape. These clusters are treated as dense regions of points in the data space, separated by low-density regions that represent noise. The algorithm also handles data sets with multiple densities and assigns noise and outliers to the closest category, which reduces the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which reduces the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. The algorithm is evaluated by time consumption and by the percentage of successfully clustered instances. Experiments carried out on an in-house collection of Arabic text demonstrate the effectiveness of the enhanced clustering algorithm, with an average accuracy of 89 %.
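The general idea of grid-based density clustering described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here: it is a toy version that assumes a uniform grid, a single density threshold, and nearest-centroid assignment for noise points. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are hypothetical, and the input is a handful of 2-D points rather than real term-frequency vectors.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_cluster(points, cell_size, min_pts):
    """Toy grid-based density clustering; returns one cluster label per point."""
    # 1. Map every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    # 2. A cell is "dense" if it holds at least min_pts points (assumed rule).
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # 3. Merge adjacent dense cells (diagonals included) into clusters via BFS.
    dim = len(points[0])
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # 4. Points in dense cells get their cell's label; the rest are noise (None).
    labels = [None] * len(points)
    for key, members in cells.items():
        for idx in members:
            labels[idx] = cell_label.get(key)

    if next_label == 0:
        return labels  # no dense cell found, so every point stays unlabeled

    # 5. Assign each noise point to the cluster with the nearest centroid.
    sums = [[0.0] * dim for _ in range(next_label)]
    counts = [0] * next_label
    for p, lab in zip(points, labels):
        if lab is not None:
            counts[lab] += 1
            for i, x in enumerate(p):
                sums[lab][i] += x
    centroids = [[s / n for s in row] for row, n in zip(sums, counts)]
    for idx, p in enumerate(points):
        if labels[idx] is None:
            labels[idx] = min(range(next_label),
                              key=lambda c: math.dist(p, centroids[c]))
    return labels


points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.4, 0.3),  # dense blob A
          (5.1, 5.2), (5.3, 5.4), (5.2, 5.1), (5.4, 5.3),  # dense blob B
          (1.6, 1.5)]                                       # isolated point
labels = grid_cluster(points, cell_size=1.0, min_pts=3)
print(labels)
```

With these toy points, each blob falls into its own dense cell and the isolated point is attached to the nearer blob instead of being left as noise. Checking all neighbouring cells costs 3^d - 1 lookups per cell in d dimensions, so a sketch like this is only practical after strong dimensionality reduction, and it does not address the multi-density case mentioned in the abstract.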

Metadata
Title
Document Classification Using Enhanced Grid Based Clustering Algorithm
Authors
Mohamed Ahmed Rashad
Hesham El-Deeb
Mohamed Waleed Fakhr
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-06764-3_27
