2016 | OriginalPaper | Book chapter

Part-of-Speech Tagging of Hindi Corpus Using Rule-Based Method

Authors: Deepa Modi, Neeta Nain

Published in: Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing

Publisher: Springer India

Abstract

The main goal of natural language processing (NLP) analysis is to understand natural languages by parsing them. Analyzing a natural language involves various sub-tasks that depend on the inherent structure of the language and do not require complete knowledge or understanding of it. Part-of-speech tagging is one such sub-task: it assigns a language-specific grammatical tag to each word of an input text according to how the word appears in that text. Examples of tags are noun, adverb, number, and negative. A variety of taggers exist for English, the most popular language in the world, but they cannot be used for Hindi, a morphologically rich language whose structure differs from that of English. This paper presents a rule-based system. It uses 29 standard part-of-speech tags, including two special tags for date and time that cover multiple formats. Special tags such as punctuation, time, and date are based on regular expressions. The main aim of the proposed system is to increase automation and maintain high precision while limiting the size of the human-made corpus. The proposed system uses a human-made corpus of around 9,000 words to improve tagging and a rule-based (lexical-feature-based) approach to reduce the size of the already trained corpus. The system yields an average precision of 91.84 % and an average accuracy of 85.45 %.
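
The abstract does not give the actual rules, tagset, or regular expressions, so the following is only a minimal sketch of how regex-based special tags might be combined with a small hand-made lexicon. The tag names (`DATE`, `TIME`, `NUM`, `PUNC`, `UNK`), the patterns, the three lexicon entries, and the whitespace tokenization are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical patterns: the paper's real date/time formats are not listed
# in the abstract. Order matters, since a date like 12.05.2015 would also
# match the number pattern.
SPECIAL_PATTERNS = [
    ("DATE", re.compile(r"\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("NUM", re.compile(r"\d+([.,]\d+)*")),
    ("PUNC", re.compile(r"[।,.;:!?\"'()\-]+")),
]

# Hypothetical stand-in for the human-made corpus (about 9,000 words in the paper).
LEXICON = {
    "राम": "NOUN",
    "नहीं": "NEG",
    "धीरे": "ADV",
}


def tag_token(token):
    """Return a tag for one token: regex-based special tags first, then lexicon lookup."""
    for tag, pattern in SPECIAL_PATTERNS:
        if pattern.fullmatch(token):
            return tag
    # Tokens found in neither source are marked unknown in this sketch.
    return LEXICON.get(token, "UNK")


def tag_text(text):
    """Tag whitespace-separated tokens (assumes punctuation is already split off)."""
    return [(token, tag_token(token)) for token in text.split()]
```

Calling `tag_text("राम 12/05/2015 10:30 नहीं ।")` pairs each token with a tag from either the regex table or the lexicon. A fuller rule-based tagger would replace the `UNK` fallback with lexical-feature rules, which the abstract mentions but does not describe.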

Metadata
Title
Part-of-Speech Tagging of Hindi Corpus Using Rule-Based Method
Authors
Deepa Modi
Neeta Nain
Copyright year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2638-3_28
