
2016 | Original Paper | Book Chapter

POS Word Class Based Categorization of Gurmukhi Language Stemmed Stop Words


Abstract

Literature in Indian languages must be classified for easy retrieval. For a Punjabi literature classifier, five categories (nature, romantic, religious, patriotic and philosophical) are manually populated with 250 poems. These poems are pre-processed through data cleaning, tokenization, bag-of-words construction, stop word identification and stemming. Because no Punjabi stop word list is available in the public domain, 256 stop words are collected manually from poetry and articles. After stemming, 184 unique stemmed words are identified. Based on part-of-speech tagging, these 184 stop words are categorized into 98 adverbs, 7 conjunctions, 43 verbs, 24 pronouns and 12 miscellaneous words. The 184 unique stemmed words are being released for use by other Punjabi language processing algorithms. This paper aims to provide a better and deeper understanding of Punjabi stop words in light of Punjabi grammar and part-of-speech-based word class categorization.
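
The pre-processing phases named in the abstract (data cleaning, tokenization, bag of words, stop word identification, stemming) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the suffix-stripping stemmer, and the romanized placeholder tokens in the usage example are all assumptions made for illustration, and the paper's actual 256-word stop list and stemming rules are not reproduced here. Only the category counts in `POS_COUNTS` are taken from the abstract.

```python
import string
from collections import Counter

# Category sizes exactly as reported in the abstract.
POS_COUNTS = {"adverb": 98, "conjunction": 7, "verb": 43,
              "pronoun": 24, "miscellaneous": 12}

# ASCII punctuation plus the danda marks used in Gurmukhi text.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation + "।॥"})


def clean(text):
    """Data cleaning: replace punctuation with spaces."""
    return text.translate(_PUNCT_TABLE)


def tokenize(text):
    """Tokenization: split on whitespace."""
    return text.split()


def stem(token, suffixes):
    """Hypothetical stemmer: strip the longest matching suffix, if any."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[:-len(suffix)]
    return token


def preprocess(text, stop_words, suffixes):
    """Return (bag of stemmed content words, set of stemmed stop words)."""
    bag = Counter(tokenize(clean(text)))           # bag of words
    found_stops = {t for t in bag if t in stop_words}
    stemmed_stops = {stem(t, suffixes) for t in found_stops}
    content = Counter()
    for token, count in bag.items():
        if token not in stop_words:
            content[stem(token, suffixes)] += count
    return content, stemmed_stops


# Usage with placeholder (romanized, hypothetical) tokens and rules.
content, stops = preprocess("rang de phullan, rang hai!",
                            stop_words={"de", "hai"},
                            suffixes=("an",))

# The five categories from the abstract account for all 184 stemmed words.
total_stop_words = sum(POS_COUNTS.values())
```

Splitting on whitespace rather than on a `\w+` regular expression is a deliberate choice in this sketch: Python's `re` module does not treat Gurmukhi vowel signs as word characters, so a `\w`-based tokenizer would break words apart.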


Metadata
Title
POS Word Class Based Categorization of Gurmukhi Language Stemmed Stop Words
Authors
Kaur Jasleen
R. Saini Jatinderkumar
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-30927-9_1