Published in: International Journal of Speech Technology 2/2014

1 June 2014

A semantic parsing approach for Bhutanese language of Dzongkha

Author: P. V. Arun


Abstract

Developments in the computational analysis of Dzongkha have been limited by the syntactic complexity of the language. Although natural language processing has advanced rapidly over the past decade, very little work has been done on Dzongkha, despite its being the national language of Bhutan. In this paper, we investigate the major problems in Dzongkha processing and propose a semantic parsing approach for processing the language effectively. We use a probabilistic approach and apply the linguistic rules of Dzongkha to resolve ambiguities. Semantic representations, together with belief net concepts, are used to increase the accuracy of segmentation and of syntactic and semantic analysis. The proposed framework was able to solve the major issues in Dzongkha processing, but it still needs further improvement to cover all syntactic variations.
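Segmentation is a prerequisite step for Dzongkha because written text marks syllable boundaries with the tsheg (U+0F0B) but does not separate words. The sketch below illustrates what a probabilistic segmenter can look like: a generic unigram maximum-probability segmenter solved by dynamic programming over syllable positions. It is not the method of this paper, which additionally uses linguistic rules and belief nets. The function names `split_syllables` and `segment`, the parameters `max_len` and `unk_logprob`, and the toy lexicon are all hypothetical assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

TSHEG = "\u0f0b"  # intersyllabic mark (tsheg) used in Dzongkha/Tibetan script

def split_syllables(text: str) -> List[str]:
    """Split a Dzongkha string into syllables on the tsheg, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]

def segment(syllables: List[str],
            lexicon: Dict[Tuple[str, ...], float],
            max_len: int = 4,
            unk_logprob: float = -20.0) -> List[Tuple[str, ...]]:
    """Return the most probable word sequence under a unigram word model.

    Hypothetical sketch: `lexicon` maps a tuple of syllables (a word) to its
    probability; unknown single syllables receive a fixed log-probability
    penalty so that every input can still be segmented.
    """
    n = len(syllables)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of syllables[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = tuple(syllables[start:end])
            if word in lexicon:
                score = math.log(lexicon[word])
            elif end - start == 1:
                score = unk_logprob  # unseen syllable treated as a one-syllable word
            else:
                continue             # unseen multi-syllable spans are not words
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow back-pointers from the end to recover the word sequence.
    words: List[Tuple[str, ...]] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(tuple(syllables[start:end]))
        end = start
    return list(reversed(words))
```

For example, with romanized placeholder syllables and an invented toy lexicon `{("rdzong", "kha"): 0.01, ("rdzong",): 0.005, ("kha",): 0.02, ("skad",): 0.01}`, the input `["rdzong", "kha", "skad"]` is segmented as `[("rdzong", "kha"), ("skad",)]`, because the two-word reading scores higher than the three-word one. A real system would estimate these probabilities from a tagged corpus and, as the abstract describes, combine them with rule-based disambiguation.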


Metadata
Title
A semantic parsing approach for Bhutanese language of Dzongkha
Author
P. V. Arun
Publication date
1 June 2014
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2014
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9218-0