Published in: Neural Processing Letters 4/2022

15 February 2022

Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Authors: Sainik Kumar Mahata, Avishek Garain, Dipankar Das, Sivaji Bandyopadhyay

Abstract

Training translation systems on complex and compound sentences is generally considered computationally difficult, and such systems often fail to process the large amount of syntactic information these sentences carry. This, in turn, degrades the overall quality of the translations. Simple sentences, on the other hand, are shorter by nature and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences only would result in better translation output. However, training a translation system requires a large, high-quality parallel corpus for the two languages involved. Parallel corpora are abundant for many language pairs, but parallel corpora consisting only of simple sentences are rare for low-resource languages. The initial purpose of the present work is therefore to develop such a parallel corpus. Building it requires identifying the complex and/or compound sentences in the overall corpus and then converting them into simple sentences. Because the work involves two languages, English and Bengali, this paper documents a different algorithm for each language. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments. These segments then need to be aligned across the two languages so that paired segments are semantically equivalent, so a basic alignment technique is also proposed to address this problem. After developing the parallel corpus, we checked how effectively it solves the translation-quality issues discussed above. To this end, state-of-the-art translation models were trained on the developed corpus and, separately, on the raw parallel corpus containing sentences of mixed complexity. These models were Statistical Machine Translation and Neural Machine Translation. The performance of these models was compared using both automated and manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.
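The abstract describes two steps: splitting compound sentences into simple fragments, then aligning the fragments across English and Bengali. The Python sketch below illustrates both under stated assumptions. It is not the authors' algorithm, and it only handles compound sentences joined by a comma plus a coordinating conjunction.

The following are hypothetical choices made for illustration:

- the English coordinator list
- the Bengali connectives এবং ("and") and কিন্তু ("but")
- the function names `split_compound_en`, `split_compound_bn`, `align_fragments` and `simplify_pair`

Fragments are paired by position when both sides split into the same number of segments. Otherwise, the original sentence pair is kept unchanged.

```python
import re
from typing import List, Tuple

# Hypothetical English coordinators: an assumption, not the paper's rule set.
EN_COORDINATORS = ("and", "but", "or", "so", "yet")
# Hypothetical Bengali connectives: এবং ("and") and কিন্তু ("but").
BN_COORDINATORS = ("এবং", "কিন্তু")

# English: split only at ", <coordinator> " so that phrase-level
# coordination without a comma ("tea and coffee") is left intact.
_EN_SPLIT = re.compile(r",\s+(?:" + "|".join(EN_COORDINATORS) + r")\s+", re.IGNORECASE)
# Bengali: an optional comma, then a connective surrounded by whitespace.
_BN_SPLIT = re.compile(r",?\s+(?:" + "|".join(BN_COORDINATORS) + r")\s+")


def split_compound_en(sentence: str) -> List[str]:
    """Split an English compound sentence into simple fragments (naive heuristic)."""
    body = sentence.strip().rstrip(".")
    parts = [p.strip() for p in _EN_SPLIT.split(body) if p.strip()]
    # Re-capitalise each fragment and restore the full stop.
    return [p[0].upper() + p[1:] + "." for p in parts]


def split_compound_bn(sentence: str) -> List[str]:
    """Split a Bengali compound sentence; the danda '।' marks sentence end."""
    body = sentence.strip().rstrip("।")
    parts = [p.strip() for p in _BN_SPLIT.split(body) if p.strip()]
    return [p + "।" for p in parts]


def align_fragments(src: List[str], tgt: List[str],
                    src_orig: str, tgt_orig: str) -> List[Tuple[str, str]]:
    """Pair fragments by position when counts match; otherwise keep the original pair."""
    if len(src) == len(tgt):
        return list(zip(src, tgt))
    return [(src_orig, tgt_orig)]


def simplify_pair(en: str, bn: str) -> List[Tuple[str, str]]:
    """Turn one English-Bengali sentence pair into one or more simple pairs."""
    return align_fragments(split_compound_en(en), split_compound_bn(bn), en, bn)


if __name__ == "__main__":
    pairs = simplify_pair("I came home, and she left.", "আমি বাড়ি এলাম এবং সে চলে গেল।")
    for en, bn in pairs:
        print(en, "|", bn)
```

Falling back to the unsplit pair when the fragment counts differ favours precision over coverage. It avoids creating misaligned training pairs, at the cost of leaving some compound sentences unsimplified. Complex sentences with subordinate clauses would need a parser-based treatment and are not covered by this sketch.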

Metadata
Title
Simplification of English and Bengali Sentences for Improving Quality of Machine Translation
Authors
Sainik Kumar Mahata
Avishek Garain
Dipankar Das
Sivaji Bandyopadhyay
Publication date
15 February 2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 4/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10755-3