Background and Principles

Last updated: 21 November 2014

This document describes the motivation for and principles behind the QTLaunchPad Multidimensional Quality Metrics (MQM).


Background

Translation quality assessment (QA) is an important task, but one that is often contentious and fraught with difficulty. Traditional methods were highly subjective: reviewers read translated texts and marked “errors”, but they often disagreed in their assessments. In response, many organizations developed formalized metrics for assigning errors to different types (e.g., terminology, spelling, mistranslation), counting them, and determining how serious they were. In the 1990s, these efforts led to the creation of widely used specifications such as SAE J2450 and the LISA QA Model. Unfortunately, these models have not been updated over time, and they are presented as “one-size-fits-all” solutions that do not reflect the needs of a rapidly diversifying translation industry.

In addition, as Machine Translation (MT) is increasingly integrated into the translation production chain, it has become apparent that mainstream MT quality assessment methods are incompatible with those used for human translation and typically do not reflect the needs of actual users of translation.

Principles

The QTLaunchPad project is developing a new system for assessing quality based on the following principles:

  • Flexibility: Sophistication vs. Simplicity. Quality metrics must be adaptable to specific project types: one-size-fits-all models cannot deal with the full range of translation requirements, from information-only “gist” translations at one extreme to the translation of legal documents at the other. Metrics must be “tunable” along various dimensions (Accuracy, Fluency, Verity, etc.), responsive to project specifications, and able to support both simple and sophisticated requirements. Rather than proposing yet another metric with more detail, MQM provides a flexible catalog of defined issue types that can support any level of sophistication, from a simple metric with two categories to a complex one with thirty or forty (see the sketch after this list). It also supports both holistic evaluation (for quick acceptance testing) and detailed analytic evaluation for cases where it is required.
  • Fairness. Translators are often blamed for problems that actually originate in the source text. When the cause of a problem can be identified, QA can be fair: it recognizes the work of translation professionals and helps them demonstrate where problems originate and push back against problematic source texts.
  • Suitability for all translation methods. Metrics must be suitable for both human and machine translation, and for all technology and production profiles.
  • Comparability. Results from one assessment task should be comparable with those from another. Even if two assessment tasks do not check the same things, they must be defined in a way that supports comparison between them.
  • Standards basis. Wherever possible, quality metrics should tie into existing standards and best practices to support interoperability and integration with translation workflows. MQM makes extensive use of ASTM F2575 as the basis for specifications that help users decide what issues to check.
  • Granularity. Quality metrics must support varying degrees of granularity, from extremely coarse to extremely fine, in order to meet different requirements (such as “quick and dirty” assessment or detailed forensic analysis to determine the source of problems).
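
To make the idea of a “tunable” metric concrete, the sketch below (in Python) shows one way a metric could be assembled by selecting issue types from a catalog and weighting them by severity. The issue names, severity weights, and per-word scoring formula are illustrative assumptions for this example, not normative MQM definitions; a full implementation would also handle the catalog’s hierarchy (e.g., rolling fine-grained issues up to their parent dimensions).

    # Hypothetical selections from a larger issue-type catalog.
    # A two-category metric for quick acceptance testing...
    simple_metric = {"Accuracy", "Fluency"}
    # ...and a finer-grained metric for detailed analytic evaluation.
    detailed_metric = {"Mistranslation", "Terminology", "Spelling",
                       "Grammar", "Style"}

    # Assumed severity weights; a real metric would define its own.
    severity_weights = {"minor": 1, "major": 5, "critical": 10}

    def penalty_per_word(annotations, metric, word_count):
        """Sum severity-weighted penalties for the issues a metric covers.

        Issues outside the selected set are ignored, which is what makes
        the metric "tunable" to a given project's requirements.
        """
        total = sum(severity_weights[severity]
                    for issue, severity in annotations
                    if issue in metric)
        return total / word_count

    # A reviewer's annotations as (issue type, severity) pairs. With the
    # two-category metric, reviewers would mark the coarse categories
    # directly instead (hierarchy roll-up is omitted here for brevity).
    annotations = [("Terminology", "minor"),
                   ("Spelling", "minor"),
                   ("Mistranslation", "major")]
    print(penalty_per_word(annotations, detailed_metric, word_count=250))  # 0.028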

Feedback on these documents is welcome, and comments can be sent to feedback [at] qt21 [dot] eu.