Published in: Soft Computing 24/2019

17 August 2019 | Foundations

RNN-LSTM-GRU based language transformation

Authors: Ahmed Khan, Aaliya Sarfaraz


Abstract

In the past, rule-based and statistical machine translation techniques were employed for Urdu transliteration. The literature treats Urdu as a low-resource language. Considerably more effort has gone into Arabic, French and Chinese transliteration than into Urdu. Machine translation of Urdu is a challenging problem, and very little research has addressed Urdu transliteration. Likely reasons for this neglect are the language's morphological complexity, its diversity and, most importantly, the lack of a reasonable bilingual parallel dataset. A corpus is the main resource needed for transliteration work. This paper demonstrates the application of neural machine translation (NMT) to Urdu transliteration, with an emphasis on contextual coverage of the language, which helps to improve transliteration accuracy. The aim is to build a robust NMT model that performs efficiently when trained on bilingual parallel corpora. NMT is an emerging technique that shows impressive performance and outperforms traditional MT methods in several respects. In this research, we build an NMT model for Urdu to improve transliteration quality. An attention-based encoder–decoder system is proposed, and our experiment demonstrates the efficiency of the proposed approach. To the best of our knowledge, this is the first effort to apply neural machine translation to bidirectional Urdu transliteration.
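The abstract names an attention-based encoder–decoder but does not say which attention variant is used. As a rough illustration only, the sketch below computes one decoding step of additive attention in plain Python: each encoder state is scored against the current decoder state, the scores are normalised with a softmax, and the context vector is the weighted sum of the encoder states. The function names, the parameters `w_dec`, `w_enc` and `v`, and the choice of additive scoring are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(decoder_state, encoder_states, w_dec, w_enc, v):
    """Return (context, weights) for a single decoding step.

    Assumed scoring function (hypothetical for this paper):
        score_i = v . tanh(w_dec @ decoder_state + w_enc @ encoder_states[i])
    """
    projected_dec = matvec(w_dec, decoder_state)
    scores = []
    for enc in encoder_states:
        projected_enc = matvec(w_enc, enc)
        hidden = [math.tanh(a + b) for a, b in zip(projected_dec, projected_enc)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[d] for w, enc in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    # Toy values: three source characters encoded as 2-dimensional states.
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    decoder_state = [0.3, -0.2]
    w_dec = [[0.5, -0.1], [0.2, 0.4]]
    w_enc = [[0.3, 0.7], [-0.6, 0.1]]
    v = [1.0, -1.0]
    context, weights = additive_attention(
        decoder_state, encoder_states, w_dec, w_enc, v
    )
    print("attention weights:", weights)
    print("context vector:", context)
```

In a character-level transliteration model, the encoder states would typically come from an RNN, LSTM or GRU run over the source characters, and the context vector would feed the decoder when it predicts the next target character.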


Metadata
Title
RNN-LSTM-GRU based language transformation
Authors
Ahmed Khan
Aaliya Sarfaraz
Publication date
17 August 2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 24/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04281-z
