
The Meteor metric for automatic evaluation of machine translation

Machine Translation

Abstract

The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores that correlate well with human judgments of translation quality. Several key design decisions were incorporated into Meteor in support of this goal. In contrast with IBM’s Bleu, which uses only precision-based features, Meteor uses and emphasizes recall in addition to precision, a property that several metric evaluation studies have confirmed to be critical for high correlation with human judgments. Meteor also addresses the problem of reference translation variability through flexible word matching, allowing morphological variants and synonyms to be counted as legitimate correspondences. Furthermore, the feature ingredients within Meteor are parameterized, allowing the metric’s free parameters to be tuned to values that yield optimal correlation with human judgments. Optimal parameters can be tuned separately for different types of human judgments and for different languages. We discuss the initial design of the Meteor metric, subsequent improvements, and performance in several independent evaluations in recent years.
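The scoring scheme the abstract describes can be made concrete with a small sketch. The Python below computes a Meteor-style sentence score under simplifying assumptions: unigrams are aligned by exact surface match only (the full metric also matches stems and WordNet synonyms, and searches for the alignment with the fewest chunks, whereas this version aligns greedily left to right), and the default values of alpha, beta, and gamma are illustrative placeholders for the tunable free parameters, not officially tuned values.

```python
def meteor_score(hypothesis: list[str], reference: list[str],
                 alpha: float = 0.9, beta: float = 3.0,
                 gamma: float = 0.5) -> float:
    """Meteor-style sentence score with exact unigram matching only.

    The full metric also matches stems and WordNet synonyms and picks
    the alignment with the fewest chunks; this sketch aligns greedily,
    left to right, and the parameter defaults are illustrative.
    """
    # Greedy one-to-one alignment: each hypothesis word claims the first
    # still-unclaimed reference position with the same surface form.
    positions: list[int | None] = []
    free = list(range(len(reference)))
    for word in hypothesis:
        for i in free:
            if reference[i] == word:
                positions.append(i)
                free.remove(i)
                break
        else:
            positions.append(None)

    matches = sum(1 for p in positions if p is not None)
    if matches == 0:
        return 0.0

    precision = matches / len(hypothesis)
    recall = matches / len(reference)
    # Parameterized harmonic mean: alpha near 1 weights recall heavily.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # A chunk is a maximal run of matches contiguous in both sentences;
    # fewer chunks means the matched words appear in the right order.
    chunks, prev = 0, None
    for p in positions:
        if p is not None and (prev is None or p != prev + 1):
            chunks += 1
        prev = p

    # Fragmentation penalty, approaching gamma as matches fragment.
    penalty = gamma * (chunks / matches) ** beta
    return fmean * (1.0 - penalty)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat was sitting on the mat".split()
    print(f"{meteor_score(hyp, ref):.3f}")  # ~0.70 with these defaults
```

Because alpha, beta, and gamma are exposed as arguments, the same code could be re-tuned against a corpus of human judgments, which is the parameterization property the abstract highlights; the flexible-matching modules (stemming, synonymy) would slot in where the exact-equality test sits.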


References

  • Agarwal A, Lavie A (2008) Meteor, m-bleu and m-ter: evaluation metrics for high-correlation with human rankings of machine translation output. In: Proceedings of the third ACL workshop on statistical machine translation. Columbus, OH, pp 115–118

  • Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT Evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Ann Arbor, MI, pp 65–72

  • Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2003) Confidence estimation for machine translation. Technical report, natural language engineering workshop final report, Johns Hopkins University

  • Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2007) (Meta-) evaluation of machine translation. In: Proceedings of the second workshop on statistical machine translation. Prague, Czech Republic, pp 136–158

  • Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the third workshop on statistical machine translation. Columbus, OH, pp 70–106

  • Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the fourth workshop on statistical machine translation. Athens, Greece, pp 1–28

  • Heafield K, Hanneman G, Lavie A (2009) Machine translation system combination with flexible word ordering. In: Proceedings of the fourth workshop on statistical machine translation. Athens, Greece, pp 56–60

  • Lavie A, Agarwal A (2007) METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the second ACL workshop on statistical machine translation. Prague, Czech Republic, pp 228–231

  • Lavie A, Sagae K, Jayaraman S (2004) The significance of recall in automatic metrics for MT evaluation. In: Proceedings of the 6th conference of the association for machine translation in the Americas (AMTA-2004). Washington, DC, pp 134–143

  • Leusch G, Ueffing N, Ney H (2006) CDER: efficient MT evaluation using block movements. In: Proceedings of the thirteenth conference of the European chapter of the association for computational linguistics. pp 241–248

  • Melamed ID, Green R, Turian J (2003) Precision and recall of machine translation. In: Proceedings of the HLT-NAACL 2003 conference: short papers. Edmonton, Alberta, pp 61–63

  • Miller G, Fellbaum C (2007) WordNet. http://wordnet.princeton.edu/

  • Och FJ (2003) Minimum error rate training for statistical machine translation. In: Proceedings of the 41st annual meeting of the association for computational linguistics. pp 160–167

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of 40th annual meeting of the association for computational linguistics (ACL). Philadelphia, PA, pp 311–318

  • Porter M (2001) Snowball: a language for stemming algorithms. http://snowball.tartarus.org/texts/introduction.html

  • Przybocki M, Peterson K, Bronsart S (2008) Official results of the NIST 2008 “Metrics for MAchine TRanslation” Challenge (MetricsMATR08). http://nist.gov/speech/tests/metricsmatr/2008/results/

  • Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th conference of the association for machine translation in the Americas (AMTA-2006). Cambridge, MA, pp 223–231

  • van Rijsbergen C (1979) Information retrieval, Chap. 7, 2nd edn. Butterworths, London, UK


  • Ye Y, Zhou M, Lin C-Y (2007) Sentence level machine translation evaluation as a ranking. In: Proceedings of the second workshop on statistical machine translation. Prague, Czech Republic, pp 240–247


Author information

Correspondence to Alon Lavie.


Cite this article

Lavie, A., Denkowski, M.J. The Meteor metric for automatic evaluation of machine translation. Machine Translation 23, 105–115 (2009). https://doi.org/10.1007/s10590-009-9059-4


