2018 | Original Paper | Book Chapter

Collaborative Matching for Sentence Alignment

Abstract

Existing sentence alignment methods are fundamentally founded on sentence length and lexical correspondences. Methods based on the former generally follow the length proportionality assumption, which holds that the lengths of sentences in one language tend to be proportional to those of their translations; such methods are known to adapt poorly to new languages and corpora. In this paper, we attempt to interpret this assumption from a new perspective through the notion of collaborative matching, based on the observation that sentences can work collaboratively during alignment rather than separately, as in previous studies. Our approach is intended to be independent of any specific language or corpus, so that it can be applied adaptively to a variety of texts without relying on any prior knowledge about them. We use one-to-one sentence alignment to illustrate the approach and implement two specific alignment methods, which are evaluated on six bilingual corpora covering different languages and domains. Experimental results confirm the effectiveness of the collaborative matching approach.
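The abstract does not spell out how collaborative matching works, so the following Python sketch illustrates only the baseline it reinterprets: the length proportionality assumption applied to one-to-one alignment, with each candidate sentence pair scored separately. To echo the goal of avoiding prior knowledge, the proportionality constant is estimated from the two texts themselves. All names (`length_ratio`, `pair_cost`, `align_one_to_one`), the cost formula, and the `skip_penalty` default of 3.0 are hypothetical choices made for illustration; this is not either of the chapter's two methods.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical sketch of a length-based one-to-one aligner.
# It is NOT the chapter's collaborative matching method.


def length_ratio(src: List[str], tgt: List[str]) -> float:
    """Estimate the proportionality constant (target chars per source char)
    from the texts themselves, so no language-specific prior is needed."""
    total_src = sum(len(s) for s in src)
    total_tgt = sum(len(t) for t in tgt)
    return total_tgt / total_src if total_src else 1.0


def pair_cost(len_src: int, len_tgt: int, ratio: float) -> float:
    """Deviation of the target length from ratio * source length, divided by
    sqrt(expected + 1) so a fixed gap counts less for long sentences."""
    expected = ratio * len_src
    return abs(len_tgt - expected) / math.sqrt(expected + 1.0)


def align_one_to_one(src: List[str], tgt: List[str],
                     skip_penalty: float = 3.0) -> List[Tuple[int, int]]:
    """Monotonic dynamic programming over 1-1 matches and 1-0 / 0-1 skips;
    returns the (source index, target index) pairs of the 1-1 matches."""
    ratio = length_ratio(src, tgt)
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way to align src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: (previous i, previous j, whether the step was a 1-1 match)
    back: List[List[Optional[Tuple[int, int, bool]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    def relax(i: int, j: int, ni: int, nj: int,
              step: float, matched: bool) -> None:
        new_cost = cost[i][j] + step
        if new_cost < cost[ni][nj]:
            cost[ni][nj] = new_cost
            back[ni][nj] = (i, j, matched)

    # Row-major order finalises every predecessor of (i, j) before (i, j).
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n and j < m:  # 1-1: match src[i] with tgt[j]
                relax(i, j, i + 1, j + 1,
                      pair_cost(len(src[i]), len(tgt[j]), ratio), True)
            if i < n:  # 1-0: leave src[i] unaligned
                relax(i, j, i + 1, j, skip_penalty, False)
            if j < m:  # 0-1: leave tgt[j] unaligned
                relax(i, j, i, j + 1, skip_penalty, False)

    # Walk back from the full alignment, keeping only the 1-1 matches.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev_i, prev_j, matched = back[i][j]
        if matched:
            pairs.append((prev_i, prev_j))
        i, j = prev_i, prev_j
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    src = ["a" * 10, "b" * 50]
    tgt = ["x" * 10, "y" * 3, "z" * 50]  # extra short target sentence
    print(align_one_to_one(src, tgt))  # [(0, 0), (1, 2)]
```

In this toy input the short middle target sentence fits neither source sentence, so leaving it unaligned is cheaper than forcing it into a match. Note that each pair's cost is computed independently of the other sentences; this per-pair scoring is the kind of "separate" treatment that the abstract contrasts with collaborative matching.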

Metadata
Title: Collaborative Matching for Sentence Alignment
Authors: Xiaojun Quan, Chunyu Kit, Wuya Chen
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-01716-3_4