Multidimensional Quality Metrics (MQM) Definition


Copyright ©2014, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH / German Research Center for Artificial Intelligence (DFKI)
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.



Document status

This document is a draft of the MQM specification. It is subject to frequent and substantial revision and should not be relied upon for implementation.


Feedback on this document should be submitted to


This document defines the Multidimensional Quality Metrics (MQM) framework. It contains a description of the issue types, scoring mechanism, and markup, as well as informative mappings to various quality systems. MQM provides a flexible framework for defining custom metrics for the assessment of translation quality. These metrics may be considered to be within the same “family” as they draw on a common inventory of values for data categories and a common structure. MQM supports multiple levels of granularity and provides a way to describe translation-oriented quality assessment systems, exchange information between them, and embed that information in XML or HTML5 documents.

Before you begin

It is strongly recommended that readers who are not already familiar with MQM review the presentation online at prior to reviewing this specification. This presentation provides an overview of MQM functionality and philosophy that will assist readers of this technical document.

1. Introduction (non-normative)

Multidimensional Quality Metrics (MQM) provides a framework for describing and defining quality metrics used to assess the quality of translated texts and to identify specific issues in those texts. It provides a systematic framework to describe quality metrics based on the identification of textual features. This framework consists of the following items:

MQM does not define a single metric intended for use with all translations. Instead it adopts the “functionalist” approach that quality can be defined by how well a text meets its communicative purpose. In practical terms, this statement means that MQM is a framework for defining a family of related metrics.

1.1. Quality assessment, quality assurance, and quality control

Within the translation industry, three terms are used somewhat interchangeably to refer to quality activities: quality assessment, quality assurance, and quality control. However, within broader literature on quality these terms have distinct meanings and should be distinguished:

The focus of MQM is on quality assessment, which is essential to quality assurance and quality control. This document does not, however, specify or recommend particular quality assurance or quality control processes. (Note that there is widespread confusion between “quality assessment” and “quality assurance” within the localization industry, partially due to the adoption of the LISA Quality Assurance Model, which actually provided a model for quality assessment.)

2. Terms and definitions (normative)

The following terms and definitions apply in this document.

Analytic metric
A metric that functions by identifying precise locations of issues within a text and categorizing them.
Data category
An abstract concept for a particular type of information for describing translation quality metrics, such as issue types, weights, and other aspects of a metric.
Dimension
A specific aspect of the text that can be evaluated for adherence to specifications. For example, if the project parameters specify that the translated text must conform to Oxford UP style, then one of the dimensions evaluated would be the Style issue type. In MQM, dimensions are generally defined in reference to one or more issue types that correspond to the requirements of one of the twelve parameters used in MQM.
Error
An error is a specific instance of an issue that has been verified to be incorrect.
Error penalty
A numeric indication of the quantity of errors considered in determining an MQM score.
Issue
An issue is a potential problem detected in content. (Note: The term issue as used in this document refers to any potential error detected in a text, even if it is determined not to be an error. For example, if an automated process finds that a term in the source does not appear to have been translated properly, it has identified an issue. If human examination finds that the term was translated improperly, it is an error. However, examination might also find that the issue was not an error because the linguistic structure in the translation dictated that the term be replaced by a pronoun, so the translation is correct. Since issues may be automatically detected or incorrectly identified, this document refers to issues in most contexts.)
Metric
A set of issue types used for assessing a translation. In general, metrics will include a scoring metric and weights for the issue types.
Parameter
An aspect of a translation that defines expectations concerning the translation product. For example, “target language/locale” is the parameter that states what language/locale the translated text should appear in.
Quality
Quality is the adherence of the text to appropriate specifications. In the case of translated texts, the following formulation applies:
A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
For monolingual source texts, the formulation may be modified as follows:
A quality text demonstrates required fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
A metric based on one or more questions or statements, corresponding to issue types, that serve as the basis for evaluating the translation as a whole, along with definitions and examples to clarify the meaning of each statement or question, a scale of values on which to rate each item, and standards of excellence for specified performance levels.
Severity
An indication of how severe a particular instance of an issue is. Issues with higher severity have more impact on the perceived quality of the text. The default MQM severity model has three levels: minor, major, and critical.
Specifications
A description of the requirements for the translation, as defined by ISO/TS-11669. MQM utilizes a subset of the full specifications defined in ISO/TS-11669.
Weight
A numerical indication of how important a particular issue type is in overall quality assessment. The default weight for issues is 1.0. Higher numbers assign more importance to an issue type, while lower numbers assign less importance. A weight of 0 would indicate that an issue is checked but not counted in MQM scores. Weights serve as multipliers for error penalties in MQM scoring.

3. Principles (non-normative)

3.1. Fairness

MQM is designed to apply to (monolingual) source texts as well as translated target texts. Items described in MQM’s “Fluency”, “Design”, “Internationalization”, and “Verity” branches can apply equally well to source texts and target texts. Only the “Accuracy” branch is specific to translated texts. The default MQM scoring method allows users to assess source texts to obtain a quality score for them and, if both source and target are assessed, issues found in the source may be counted against penalties for issues in the target text, resulting in higher scores. While not all implementations or usage scenarios will examine the source or count problems in the source in favor of translators, this principle is intended to help ensure that translators are recognized and credited when they have to translate inferior source texts rather than being blamed for all problems, even those beyond their control.

3.2. Flexibility

There are a number of ways to assess the quality of translations. Two primary methods are used in industry and academia:

MQM is ideally suited for implementation as an analytic metric. It is also easily adapted to serve as the basis for holistic assessments.

Rather than proposing a single metric for assessing all translations, MQM provides a flexible method for defining and declaring metrics that can be adapted to specific requirements. These requirements are generally stated in terms of a set of 12 “parameters” (see Section 8.1. MQM parameters), a subset of the translation parameters described in ISO/TS 11669:2012: Translation projects -- General guidance that focuses primarily on aspects of the translation product (rather than the project or process). Using these parameters to define requirements and expectations before translation allows users to create appropriate metrics before translation begins and provides translators with a clear view of the criteria for assessing their work.

In addition, metrics must support both simple and sophisticated requirements. Rather than proposing yet another metric with more detail, MQM provides a flexible catalog of defined issue types that can support any level of sophistication, from a simple metric with two categories to a complex one with thirty or forty. It also supports both holistic assessment (for quick acceptance testing) and error markup/counts for cases where detailed analysis is required.

4. Conformance (normative)

Conformance of a translation quality assessment metric with MQM is determined by the following criteria:

Note that the only required aspect is use of the MQM vocabulary, which MUST NOT be contradicted or overridden.

5. Issue types (normative)

5.1. MQM issues

5.1.1. List of MQM issues

The full list of MQM issues is maintained in a separate document at issues-list-2014-03-19.html.

5.1.2. High-level structure

At the top level, MQM is divided into five major “branches”: Accuracy, Fluency, Verity, Design, and Internationalization. It also contains Other, used for issues that cannot be assigned elsewhere, and Compatibility, a branch that contains deprecated issues retained for compatibility with legacy systems, notably the LISA QA Model. The five main branches represent the top level of the MQM hierarchy and may themselves serve as issue types. These branches are defined as follows:

5.2. MQM Core

In order to simplify the application of MQM, MQM defines a smaller “Core” consisting of 21 issue types that represent the most common issues arising in linguistic quality assessment of translated texts. The Core does not address formatting and many applications may wish to add items from the Design branch to the Core. The Core represents a relatively high level of granularity suitable for many tasks. Where possible, users of MQM are encouraged to use issues from the Core to promote greater interoperability between systems.

The MQM Core can be graphically represented as follows:

[Figure: MQM Core]

The 21 issues are defined in the MQM core as follows:

Definitions for these issues can be found in the list of MQM issue types.

Even the 21 issues of the Core are more than are likely to be checked in any given quality application, and users may define subsets of the Core for their needs. For translation quality assessment tasks, it is recommended that metrics contain at least the issue types Accuracy and Fluency if no more granular types are included.

5.3. User extension

While users are strongly encouraged to limit issue types to pre-defined MQM issues, they may add additional issue types to MQM to meet additional requirements. User-defined issue types MUST include the following information:

User extensions do not provide interoperability between systems and impede the exchange of data. Nevertheless they may be needed to support requirements not anticipated in MQM. Users should tie extensions into the predefined hierarchy using the parent value as much as possible since doing so provides consumers of MQM data with the best guidance in interpreting unknown categories and mapping them to other systems. As with other aspects of MQM, users should limit granularity to the least granular level that meets requirements.

Users who encounter frequent need for custom extensions are encouraged to communicate their requirements to the MQM project for possible inclusion of these types in future versions of MQM.

6. Scoring (normative)

The MQM scoring model applies only to error-count implementations of MQM. This specification does not define a scoring model for holistic systems, which are less detailed in nature than error-count metrics.

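MQM can generate document quality scores according to the following formula:

$$\mathit{TQ} = 100 - \mathit{AP} - (\mathit{FPT} - \mathit{FPS}) - (\mathit{VPT} - \mathit{VPS})$$
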
TQ = quality score
The overall rating of quality
AP = penalties for Accuracy
Sum of all weighted penalty points assigned in the Accuracy branch
FPT = Fluency penalties for the target
Sum of all weighted penalty points in the target text assigned to the Fluency branch. (Note: for computational purposes, Design and Internationalization are treated with Fluency.)
FPS = Fluency penalties for the source
Sum of all weighted penalty points in the source text assigned to the Fluency branch. If the source is not assessed FPS = 0.
VPT = Verity penalties for the target
Sum of all weighted penalty points in the target text assigned to the Verity branch
VPS = Verity penalties for the source
Sum of all weighted penalty points in the source text assigned to the Verity branch. If the source is not assessed VPS = 0.

All penalties are relative to the sample size (in words) and are calculated as follows (assuming default weights and severity levels):

$$P = \frac{\text{Issues}_{\text{minor}} + \text{Issues}_{\text{major}} \times \text{SeverityMultiplier}_{\text{major}} + \text{Issues}_{\text{critical}} \times \text{SeverityMultiplier}_{\text{critical}}}{\text{Word count}}$$


Issues_minor = Number of issues with a “minor” severity
Issues_major = Number of issues with a “major” severity
Issues_critical = Number of issues with a “critical” severity
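
As a rough illustration of the penalty formula above, the following Python sketch computes P for a single branch. The function name `penalty` and its parameter names are hypothetical and not part of MQM; the multipliers 5 and 10 are the default severity values listed later in this section, and all issue-type weights are assumed to be 1.0.

```python
def penalty(minor, major, critical, word_count,
            major_multiplier=5.0, critical_multiplier=10.0):
    """Return the penalty P for one branch, relative to the sample size in words.

    Minor issues count once; major and critical issues are scaled by their
    severity multipliers (defaults assumed from the MQM default severity levels).
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    points = minor + major * major_multiplier + critical * critical_multiplier
    return points / word_count


# Hypothetical sample: 3 minor, 1 major, and 0 critical issues in 1000 words.
p = penalty(3, 1, 0, 1000)
print(p)
```

With these inputs the numerator is 3 + 1 × 5 = 8 penalty points, spread over 1000 words.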

A score can thus be generated through the following (pseudo-code) algorithm:

// All running totals start at 0. Each issue instance counts as 1 and is
// scaled by the weight of its issue type and by its severity multiplier.
foreach accuracyIssue {
	accuracyIssueTotal = accuracyIssueTotal +
	(accuracyIssue * weight[accuracyIssueType] * severityMultiplier);
}

foreach targetFluencyIssue {
	targetFluencyIssueTotal = targetFluencyIssueTotal +
	(targetFluencyIssue * weight[targetFluencyIssueType] * severityMultiplier);
}

foreach sourceFluencyIssue {
	sourceFluencyIssueTotal = sourceFluencyIssueTotal +
	(sourceFluencyIssue * weight[sourceFluencyIssueType] * severityMultiplier);
}

foreach targetVerityIssue {
	targetVerityIssueTotal = targetVerityIssueTotal +
	(targetVerityIssue * weight[targetVerityIssueType] * severityMultiplier);
}

foreach sourceVerityIssue {
	sourceVerityIssueTotal = sourceVerityIssueTotal +
	(sourceVerityIssue * weight[sourceVerityIssueType] * severityMultiplier);
}

// Generate subscores (source penalties offset target penalties)
accuracySubScore = accuracyIssueTotal / wordCount;
fluencySubScore = (targetFluencyIssueTotal - sourceFluencyIssueTotal) / wordCount;
veritySubScore = (targetVerityIssueTotal - sourceVerityIssueTotal) / wordCount;

// Generate overall score
translationQualityScore = 100 - accuracySubScore - fluencySubScore - veritySubScore;

In this algorithm, each issue type has a weight assigned by the metric that is retrieved and used to determine the individual penalties. Penalties are cumulative. Note that if the source is examined, penalties against the source are effectively added to the overall score for the translation, reflecting the fact that they indicate problems in the source the translator had to deal with. If the source is not assessed, the source penalties are by definition 0 and do not count for or against the translation’s quality score.
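
The algorithm described above can be sketched in Python as follows. This is only one possible reading of the pseudo-code: the `Issue` record, the helper names, and the sample data are hypothetical, the severity multipliers are the default values listed under “Default values for error-count metrics”, and any issue type without a declared weight is assumed to have weight 1.0.

```python
from dataclasses import dataclass

# Default severity multipliers (assumed from the MQM defaults).
SEVERITY_MULTIPLIERS = {"minor": 1.0, "major": 5.0, "critical": 10.0}


@dataclass
class Issue:
    issue_type: str          # e.g. "terminology"
    branch: str              # "accuracy", "fluency", or "verity"
    severity: str            # "minor", "major", or "critical"
    in_source: bool = False  # True if the issue was found in the source text


def branch_total(issues, branch, in_source, weights):
    """Sum the weighted penalty points for one branch of source or target."""
    return sum(
        weights.get(i.issue_type, 1.0) * SEVERITY_MULTIPLIERS[i.severity]
        for i in issues
        if i.branch == branch and i.in_source == in_source
    )


def quality_score(issues, word_count, weights=None):
    """Compute TQ = 100 - AP - (FPT - FPS) - (VPT - VPS)."""
    weights = weights or {}
    accuracy = branch_total(issues, "accuracy", False, weights) / word_count
    fluency = (branch_total(issues, "fluency", False, weights)
               - branch_total(issues, "fluency", True, weights)) / word_count
    verity = (branch_total(issues, "verity", False, weights)
              - branch_total(issues, "verity", True, weights)) / word_count
    return 100 - accuracy - fluency - verity


# Hypothetical sample: one major terminology issue in the target, plus one
# minor spelling issue in the target that is offset by one in the source.
sample = [
    Issue("terminology", "accuracy", "major"),
    Issue("spelling", "fluency", "minor"),
    Issue("spelling", "fluency", "minor", in_source=True),
]
score = quality_score(sample, word_count=100, weights={"terminology": 1.5})
print(score)
```

In this sample the source-side spelling issue cancels the target-side one, so only the weighted terminology penalty (1.5 × 5 / 100) lowers the score.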

Default values for error-count metrics

For the purposes of calculating quality scores, the following default values apply:

All issues have a default weight of 1.0. This weight can be updated on a per-issue basis to reflect specific requirements.
The default severity levels are defined as follows:
  • minor: 1.0. Minor issues are issues that do not impact usability or understandability of the content. For example, if an extra space appears after a full stop, this may be considered an error, but does not render the text difficult to use or problematic (even if it should be corrected).
  • major: 5.0. Major issues are issues that impact usability or understandability of the content but which do not render it unusable. For example, a misspelled word may require extra effort for the reader to understand the intended meaning, but does not make it impossible.
  • critical: 10.0. Critical issues are issues that render the content portion unfit for use. For example, a particularly bad grammatical error that changes the meaning of the text would be considered critical.

The default severity weights are taken from the LISA QA Model and represent common industry practice. This practice has not been experimentally validated and other values may be more appropriate for specific purposes (e.g., for an important legal text it might be appropriate to assign a higher value to critical such that even a single critical error will cause the text to fail). Users MUST declare any values that differ from the default values. [TO BE ADDED.]

7. Markup (normative)

This section describes the MQM declarative markup. Use of the metrics declaration markup is mandatory for declaring an interoperable MQM metric. When used with XML or HTML, it is strongly recommended that the ITS 2.0 Localization Quality Issue data category be used to declare MQM issues, in conjunction with a locQualityIssueProfileRef attribute pointing to a valid MQM definition. Note that when MQM is implemented with ITS 2.0 quality markup, the requirements for implementing ITS 2.0 are also mandatory.

7.1. MQM metrics description

MQM provides an XML mechanism for exchanging descriptions of MQM-compliant metrics. MQM metrics description files use the .mqm file name extension. An .mqm file contains a hierarchical list of MQM issue types. This listing MUST conform to the hierarchy of issue types.

The following is an example of a small metric description file with issue names in both English and German. It includes a user-defined extension (x-respeaking) used to identify errors that arise in respeaking, where a person repeats the spoken text of a live audio feed without the background noise: if the respeaker repeats the text incorrectly, the result is a mistranscription.

<?xml version="1.0" encoding="UTF-8"?>
<mqm version="2.0">
    <name>Small metric</name>
    <descrip>A small metric intended for human consumption</descrip>
    <issue type="accuracy" display="no">
      <issue type="mistranslation" display="no">
        <issue type="terminology" weight="1.5"/>
      </issue>
      <issue type="omission" weight="0.7"/>
      <issue type="addition"/>
    </issue>
    <issue type="fluency" display="no">
      <issue type="content" display="no">
        <issue type="style" weight="0.5"/>
      </issue>
      <issue type="mechanical" display="no">
        <issue type="spelling"/>
        <issue type="grammar"/>
      </issue>
      <issue type="unintelligible" weight="1.5"/>
    </issue>
    <issue type="x-respeaking" weight="1.5"/>
    <displayNameSet lang="en">
      <displayName typeRef="accuracy">Adequacy</displayName>
      <displayName typeRef="terminology">Terminology</displayName>
      <displayName typeRef="omission">Omission</displayName>
      <displayName typeRef="addition">Addition</displayName>
      <displayName typeRef="fluency">Fluency</displayName>
      <displayName typeRef="content">Content</displayName>
      <displayName typeRef="style">Style</displayName>
      <displayName typeRef="mechanical">Mechanical</displayName>
      <displayName typeRef="spelling">Spelling</displayName>
      <displayName typeRef="grammar">Grammar</displayName>
      <displayName typeRef="unintelligible">Unintelligible</displayName>
      <displayName typeRef="x-respeaking">Respeaking</displayName>
    </displayNameSet>
    <displayNameSet lang="de">
      <displayName typeRef="accuracy">Genauigkeit</displayName>
      <displayName typeRef="terminology">Terminologie</displayName>
      <displayName typeRef="omission">Auslassung</displayName>
      <displayName typeRef="addition">Ergänzung</displayName>
      <displayName typeRef="fluency">Sprachkompetenz</displayName>
      <displayName typeRef="content">Inhalt</displayName>
      <displayName typeRef="style">Stil</displayName>
      <displayName typeRef="mechanical">Mechanisch</displayName>
      <displayName typeRef="spelling">Rechtschreibung</displayName>
      <displayName typeRef="grammar">Grammatik</displayName>
      <displayName typeRef="unintelligible">Unverständlich</displayName>
      <displayName typeRef="x-respeaking">Sprecherfehler</displayName>
    </displayNameSet>
    <severity id="minor" multiplier="1"/>
    <severity id="major" multiplier="5"/>
    <severity id="critical" multiplier="10"/>
</mqm>
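
A consumer of such a file might read the weights, hierarchy, and severity multipliers along the following lines. This is a minimal sketch, not a reference implementation: the element and attribute names (mqm, issue, type, weight, severity, id, multiplier) come from the example above, while the function name `load_metric`, the shortened sample document, and the assumption that nested issue elements are closed as shown in the sample are illustrative only. Issue types without a weight attribute are given the default weight of 1.0.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample in the style of the metric description above.
METRIC_XML = """
<mqm version="2.0">
  <name>Small metric</name>
  <issue type="accuracy" display="no">
    <issue type="omission" weight="0.7"/>
    <issue type="addition"/>
  </issue>
  <issue type="x-respeaking" weight="1.5"/>
  <severity id="minor" multiplier="1"/>
  <severity id="major" multiplier="5"/>
  <severity id="critical" multiplier="10"/>
</mqm>
"""


def load_metric(xml_text):
    """Return (weights, parents, severities) read from a metric description."""
    root = ET.fromstring(xml_text.strip())
    weights = {}   # issue type -> weight (default 1.0)
    parents = {}   # issue type -> parent issue type, or None at the top level

    def walk(element, parent_type):
        # Only direct <issue> children are visited, so nesting gives the hierarchy.
        for child in element.findall("issue"):
            issue_type = child.get("type")
            weights[issue_type] = float(child.get("weight", "1.0"))
            parents[issue_type] = parent_type
            walk(child, issue_type)

    walk(root, None)
    severities = {
        s.get("id"): float(s.get("multiplier")) for s in root.findall("severity")
    }
    return weights, parents, severities


weights, parents, severities = load_metric(METRIC_XML)
print(weights)
print(parents)
print(severities)
```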

7.2. MQM inline attributes

MQM implements the following attributes in the mqm namespace:

MQM is designed to be used in conjunction with the following ITS 2.0 attributes from the localization quality issue data category:

To ensure compatibility with ITS 2.0 markup, implementers SHOULD use ITS 2.0 markup where possible. All of the ITS 2.0 localization quality annotations may be used. MQM markup adds capability to the ITS 2.0 quality markup.

<?xml version="1.0"?>
<doc xmlns:its="" its:version="2.0">
<doc xmlns:mqm="[XXXXXXXXXXX]" mqm:version="1.0">
      its:locQualityIssueComment="Should be Roquefort"
      its:locQualityIssueSeverity="50">Roqfort</span> is an cheese</para>

To create this markup the following process is followed:

  1. The MQM issue type (spelling) is mapped to the corresponding ITS 2.0 type (ITS 2.0 is less fine-grained than MQM in many cases) as described in 8. Relationship to ITS and added as the value of its:locQualityIssueType.
  2. The MQM issue type and severity are declared in the mqm: namespace.
  3. The value of the severity multiplier is declared on a scale from 0 to 100 and inserted as the value of the its:locQualityIssueSeverity attribute. In this case the multiplier value was 5 (out of 10), so it is represented as 50 in ITS markup.
  4. A comment is added using the its:locQualityIssueComment attribute.
  5. Globally, the relevant profile (specifications and metric definition) is linked using the its:locQualityIssueProfileRef attribute. [Not shown: need to be added.]
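
The severity conversion in step 3 can be sketched as a simple rescaling. The function name `its_severity` is hypothetical, and the sketch assumes that the largest multiplier in the metric is 10 (the default value for critical), as in the example above.

```python
def its_severity(multiplier, max_multiplier=10.0):
    """Rescale an MQM severity multiplier to the 0-100 range used by
    its:locQualityIssueSeverity (assuming max_multiplier is the top of the scale)."""
    if not 0 <= multiplier <= max_multiplier:
        raise ValueError("multiplier must lie between 0 and max_multiplier")
    return multiplier * 100 / max_multiplier


# A multiplier of 5 (out of 10) is represented as 50 in ITS markup.
print(its_severity(5))
```

A metric that declares non-default multipliers would pass its own maximum as `max_multiplier`.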

7.3. MQM inline elements

In general, MQM XML implementations should, where possible, use existing span-level elements in the native XML format to which MQM is being added. This can be done using any of the ITS 2.0 methods with the addition of the MQM-specific attributes. However, such elements may not be available. In such cases, MQM defines two elements that can be used to add inline markup:

Two empty elements are used so as to prevent any interference between MQM tags and existing XML structure, such as those that could be caused by improperly nested elements. To pair these tags the id attribute is used. ID values MUST be unique within the document to prevent confusion.

An example of an MQM annotation is seen in the following XML snippet:

    <para>“Instead of strengthening
        <mqm:startIssue type="function-words" id="1f59a2" severity="minor" agent="f-deluz" comment="article unneeded here" active="yes"/>
        the<mqm:endIssue idref="1f59a2"/>
        civil society, the president cancels
        <mqm:startIssue type="agreement" severity="major" comment="should be “it”" agent="f-deluz" id="3c469d" active="yes"/>them<mqm:endIssue idref="3c469d"/>
        de facto”, deplores Saeda.
    </para>

The mqm:startIssue element MUST take the following mandatory attributes:

The mqm:startIssue element MAY take the following optional attributes:

In addition, ITS 2.0 attributes MAY be added to these elements to promote greater interoperability.

The mqm:endIssue element MUST take the following mandatory attribute:

Use of these inline elements also requires that the mqm namespace be declared in the document. The method for declaring this namespace needs to be determined.
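
To illustrate how the paired empty elements might be consumed, the sketch below matches each mqm:startIssue with its mqm:endIssue via id/idref and collects the text in between. It is an illustration under stated assumptions: the namespace URI `urn:example:mqm` is a placeholder (this draft does not yet define one), the function name `extract_issues` is hypothetical, the sample is a trimmed version of the snippet above, and only issue elements that are direct children of the paragraph are handled.

```python
import xml.etree.ElementTree as ET

MQM_NS = "urn:example:mqm"  # placeholder; the real namespace URI is not defined here

SNIPPET = f"""<para xmlns:mqm="{MQM_NS}">Instead of strengthening
<mqm:startIssue type="function-words" id="1f59a2" severity="minor"/>the<mqm:endIssue idref="1f59a2"/>
civil society, the president cancels
<mqm:startIssue type="agreement" id="3c469d" severity="major"/>them<mqm:endIssue idref="3c469d"/>
de facto</para>"""


def extract_issues(para):
    """Return (type, severity, text) for each start/end pair found in para."""
    open_issues = {}  # id -> (attributes, list of text pieces seen so far)
    found = []
    for element in para:
        if element.tag == f"{{{MQM_NS}}}startIssue":
            open_issues[element.get("id")] = (dict(element.attrib), [])
        elif element.tag == f"{{{MQM_NS}}}endIssue":
            # An unmatched idref raises KeyError, which flags broken pairing.
            attrs, pieces = open_issues.pop(element.get("idref"))
            found.append((attrs["type"], attrs["severity"], "".join(pieces).strip()))
        # Text that follows this element belongs to every issue still open.
        for _attrs, pieces in open_issues.values():
            pieces.append(element.tail or "")
    return found


issues = extract_issues(ET.fromstring(SNIPPET))
print(issues)
```

Because the issue boundaries are empty elements, the annotated text lives in the `tail` of each element rather than inside it, which is why the loop accumulates `tail` text for all issues that are currently open.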

8. Relationship to ITS 2.0 (normative)

The Internationalization Tag Set (ITS) 2.0 specification holds a privileged position with respect to MQM due to its use as a standard format for interchanging localization quality information through its localization quality issue data category.

This section describes the mapping process from MQM to ITS 2.0 and from ITS 2.0 to MQM. As MQM allows the declaration of arbitrary translation quality assessment metrics, it serves a different purpose from ITS, which provides high-level interoperability between different metrics. While ITS is much less granular than the full MQM hierarchy, individual MQM metrics may be either more or less granular than the set of ITS 2.0 localization quality issue types (or may be more granular in some areas and less in others). As a result it is likely that conversion between MQM-based metrics and ITS will be “lossy” to some extent. In general the mapping process from MQM to ITS 2.0 is straightforward since ITS 2.0 does not allow subsetting of the possible values for localization quality issue type, but the conversion from ITS 2.0 to MQM may be more challenging since an arbitrary MQM metric may or may not contain the default target mappings provided below and mappings may account for the MQM hierarchy.

MQM metrics that map to ITS MUST use the mappings described in this section, subject to the limitations described below.

8.1. MQM-to-ITS mapping

MQM issue types are mapped to ITS issue types according to the following table. Note that this mapping is unambiguous and MUST be followed to ensure consistency between applications.

MQM issue type | ITS 2.0 issue type
1. Accuracy
Company terminology
Normative terminology
Overly literal | mistranslation
False friend
Should not have been translated
Unit conversion
Entity (such as name or place) | inconsistent-entities
Improper exact match | mistranslation
Omitted variable
Untranslated graphic
2. Fluency
2.1. Content
Company style
Style guide
Image vs. text
Terminological inconsistency
Inconsistent link/cross-reference
Monolingual terminology | terminology
Monolingual terminology, normative
Unclear reference
2.2. Mechanical
Unpaired quote marks or brackets
Word form
Part of speech
Word order
Function words
Locale violation | locale-violation
Date format
Time format
Measurement format
Number format
Quote mark type
National language standard
Character encoding | characters
Pattern problem | pattern-problem
Corpus conformance | non-conformance
Page references
Index/TOC format
Missing/incorrect item
3. Verity
End-user suitability | other
Legal requirements | legal
Locale applicability | locale-specific-content
4. Design (mono- and/or bi-lingual)
Overall design (layout) | formatting
Global font choice
Footnote/endnote format
Headers and footers
Page break
Local formatting
Text alignment
Paragraph indentation
Font (local)
Wrong size
Single/double-width (CJK only)
Graphics and tables | formatting
Call-outs and caption
Truncation/text expansion | length
5. Internationalization | internationalization
6. Legacy mapping list
Bill of materials/Runlist | other
Application compatibility
Book-building sequence
File format
Embedded text
Output device
Release guide
Does not adhere to specifications
Terminology, contextually inappropriate
Style, publishing standards

Note that the entire Internationalization branch of MQM maps to the ITS internationalization type. It is anticipated that this mapping will apply to all children of the MQM Internationalization issue type that may be added in the future.

8.2. ITS-to-MQM mapping

Mapping from ITS to MQM is less likely to be used and presents particular problems since MQM metrics typically contain only a small subset of the full MQM issue set. As a result MQM issues to which ITS localization quality issue type values are mapped may not exist in a particular MQM metric. In such cases processes MUST map the ITS value to the closest higher-level issue type in MQM if one exists in the target MQM metric. If no higher-level issue type exists in the target MQM metric, the process MUST skip the ITS 2.0 issue type (but MAY preserve the ITS 2.0 markup).

For example, if a process encounters the ITS 2.0 omission type and the target MQM metric does not contain Omission but does contain Mistranslation, the ITS omission value would be mapped to MQM Mistranslation. However, if the MQM metric does not contain Mistranslation or Accuracy, the two higher nodes in the MQM hierarchy, the ITS omission issue type would be ignored/omitted by the conversion process.

Note that the above requirements mean that in some cases there may be a many-to-one mapping from ITS to MQM. For example, if a document contains ITS annotations for terminology, omission, untranslated, and addition, but the target MQM metric contains Mistranslation and no daughter categories, all of these categories would be mapped to MQM Mistranslation. In other words, there is no universal mapping from ITS to MQM since MQM metrics do not all contain the same issues.

Processes encountering issues such as those described in the previous paragraphs SHOULD alert the user about the information loss or remapping if user interaction is expected by the process.
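
The fallback rule described in this section can be sketched as a walk up the MQM hierarchy. The sketch is illustrative only: the function name `map_its_to_mqm` is hypothetical, the parent links reproduce only the Omission, Mistranslation, Accuracy chain used in the example above, and the default ITS-to-MQM target for omission is assumed to be the identically named MQM issue type.

```python
# Parent links for the small part of the hierarchy used in the example above.
MQM_PARENT = {
    "Omission": "Mistranslation",
    "Mistranslation": "Accuracy",
    "Accuracy": None,
}

# Default ITS-to-MQM target (assumed: identically named MQM issue type).
ITS_TO_MQM = {"omission": "Omission"}


def map_its_to_mqm(its_type, metric_issue_types):
    """Return the closest MQM issue type present in the target metric,
    or None if the ITS issue type has to be skipped."""
    candidate = ITS_TO_MQM.get(its_type)
    while candidate is not None:
        if candidate in metric_issue_types:
            return candidate
        candidate = MQM_PARENT.get(candidate)  # climb to the next higher node
    return None


# The metric lacks Omission but contains Mistranslation.
print(map_its_to_mqm("omission", {"Mistranslation", "Fluency"}))
# The metric contains neither Omission, Mistranslation, nor Accuracy.
print(map_its_to_mqm("omission", {"Fluency"}))
```

A process that receives `None`, or a result that differs from the default target, is in the situation where the user SHOULD be alerted about information loss or remapping.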

In most cases the table shows that the ITS issue types map to MQM issue types with identical (except for casing) or similar names, highlighting the evolutionary relationship between ITS and MQM. Those items where names are different in a non-trivial manner are marked with an asterisk (*) to help draw attention to the fact that the names do not match.

ITS 2.0 Localization Quality Issue type | MQM issue type
legal | Legal requirements*
locale-specific-content | Locale applicability*
locale-violation | Locale violation
characters | Character encoding*
formatting | Local formatting*
inconsistent-entities | Entity (such as name or place)*
pattern-problem | Pattern problem
non-conformance | Corpus conformance*

Note that the ITS uncategorized category maps to MQM Other even though MQM Unintelligible maps to ITS uncategorized. In other words, the mapping is asymmetric because the semantics of uncategorized are broader than Unintelligible.

9. Creating MQM metrics (non-normative)

This section describes the process for creating an MQM metric in cases where a suitable predefined metric is not available. The process may be graphically represented as shown below:

[Figure: MQM process overview]

In this view, implementers first determine what sort of metric they wish to use (analytic, holistic, task-based testing, functional testing, etc.) based on the following criteria:

Based on the answers to the questions given above, users may select a method (the “how”) for assessing the translation. Some of the possible options include the following:

In addition to selecting an assessment method based on the answers to the questions on the left of the diagram, users also need to define the specifications (i.e., the values of the parameters) for the translation(s) to be assessed. (The MQM parameters are defined in section 8.2. Definition of MQM parameters below.) Based on the specifications, users decide which dimensions of the text will be assessed. Dimensions defined in MQM are the following:

Note that the dimensions correspond to high-level issue types or groups of issue types (in the case of Orthography) in the MQM hierarchy.

Depending upon which dimensions are selected and the degree of granularity required for the assessment task, MQM issues are then selected to ensure that the required dimensions are adequately assessed.

9.1. Example of defining a metric

The following example will help clarify how the process works. The example is for a case in which a company that makes network diagnostic gear wishes to evaluate whether automatic (machine) translations into Japanese of user-generated forum content written in English are helping its Japanese users solve technical problems with their equipment.

  1. Selection of assessment method.
    1. What: The company wishes to assess a translation product (the forum content) and also the MT system they are using to translate the content.
    2. Who: The company wants to use its customers to evaluate the translation since they are the only ones who can determine whether the content meets their needs.
    3. Where: The assessment must be done on the user-to-user forum with end users who are not experts in translation or language and who cannot be trained in advance.
    4. When: The assessment will take place after texts have already been published on the website. The texts will be raw MT output with no post-editing or other correction.
    5. Why: The assessment will be used to determine if the MT system’s results help users meet their needs or whether more manual processes (e.g., MT + post-editing) are required.

    Based on these answers, the company decides to use a holistic assessment method with a low number of dimensions (no more than three).
  2. Creating the specifications. The company fills out a worksheet to define the values for the parameters in their specification (described in section 8.2. Definition of MQM parameters below) and creates a full set of translation specifications.
  3. Selection of Dimensions. Based on their translation specifications they determine that the following dimensions are relevant to this task: Accuracy, Terminology, Fluency (and all its children) and Verity. Because of the nature of the assessment method and the assessors, however, the company decides to limit assessment to three dimensions: Terminology, Fluency, and Verity. Although Accuracy is highly important, they cannot expect their users to understand English well enough to assess the accuracy of translated texts.
  4. Building the metric. Based on the selection of a holistic metric and three dimensions, the company selects three issue types (which correspond directly to their dimensions) and implements a metric with three questions on their website at the end of each translated forum entry:
    1. Did this answer enable you to solve your problem? (Yes/No) (Addresses Verity)
    2. Was this answer grammatically correct? (Yes/No) (Addresses Grammar)
    3. Did this answer use the correct words to describe your product and the solution? (Yes/No) (Addresses Terminology)

    In addition, because the company realizes that its customers cannot assess some core aspects that would help it evaluate its MT system, it decides to create a second, analytic metric that human assessors will use to check a subset of the output.

Although simple, this example shows how it is possible to build customized metrics to meet specific requirements using MQM.
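
The three-question holistic metric from the example could be represented as in the following sketch. The class name, the function name, and the idea of reporting the share of "Yes" answers per issue type are assumptions made for illustration; MQM does not prescribe a scoring system for holistic metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HolisticQuestion:
    text: str
    issue_type: str  # MQM issue type the question addresses


QUESTIONS = [
    HolisticQuestion("Did this answer enable you to solve your problem?", "Verity"),
    HolisticQuestion("Was this answer grammatically correct?", "Grammar"),
    HolisticQuestion(
        "Did this answer use the correct words to describe your product "
        "and the solution?",
        "Terminology",
    ),
]


def yes_rates(responses):
    """Return the share of 'Yes' answers per issue type.

    Each response maps an issue type to True (Yes) or False (No).
    Questions a user skipped are simply absent from the response.
    """
    yes, total = Counter(), Counter()
    for response in responses:
        for question in QUESTIONS:
            if question.issue_type in response:
                total[question.issue_type] += 1
                yes[question.issue_type] += int(response[question.issue_type])
    return {issue: yes[issue] / total[issue] for issue in total}


# Hypothetical answers from three forum users (the third skipped a question).
sample = [
    {"Verity": True, "Grammar": False, "Terminology": True},
    {"Verity": True, "Grammar": True, "Terminology": False},
    {"Verity": False, "Grammar": True},
]
print(yes_rates(sample))
```

Tying each question to an issue type keeps the survey results comparable with any analytic metric that uses the same issue types.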

8.2. Definition of MQM parameters

MQM uses 11 of the 21 parameters defined in ISO/TS-11669, plus one additional parameter, Output modality, that is not directly addressed in the current version of ISO/TS-11669 (but is included in the pending revision of ASTM F2575, where the parameters are also found). The parameters are defined as follows:

This table must be updated to reflect recent changes in the ASTM version.

Parameter Description
1. Language/locale
The language into which the text is to be translated
This parameter should specify geographical language variants where appropriate.
  • the text is to be translated into Swiss German (de-CH)
  • the text is to be translated into Cantonese as spoken in Hong Kong using Traditional Chinese characters (zh-Hant-HK)
2. Subject field/domain
Subject field(s) (domain(s)) of the source text
This information should be as specific as possible to assist translation providers in finding the best translators for the job
  • the text is a specialized text dealing with meteorological science
  • the text is a sixteenth-century legal text regarding fishing rights in the North Sea
3. Terminology
List of terms or reference to terms to be used
These terms are domain- or project-specific ones
  • the requester provides instructions to see a website that defines many of the domain-specific terms in the project
  • the requester states that specialist physics terms are to be used
4. Text type
The type of the source content
Needed to locate resources with the appropriate linguistic skills. For example, a translator who specializes in technical translations may not be ideal to translate a compilation of 12th-century religious poems.
Note that “Text type” is known as “Form of the text” in ASTM F2575
  • user manual
  • literary novel set in medieval Ireland
5. Audience
The project’s target audience
The audience should be described or defined as precisely as possible without being too restrictive
  • business analysts with a background in Russian mineral exploration activity
  • teenage users of tablet computers
6. Purpose
Statement of the purpose or intended use of the translation
This information is useful in helping the translator decide the appropriate manner in which to translate the text. In some cases the purpose of the translation may differ significantly from the purpose of the source text.
  • the text is intended for entertainment, to transmit information, or to persuade an audience of a political point
  • the source text was written to convince youth to join a political movement but the translation is to be used by foreign journalists to help them understand the goals of this political movement
7. Register
Description of the linguistic register to be used in the target language
Register is often difficult to infer from the source text and must be defined on a per-language basis
  • the text is an informal conversation between friends and should be translated in German using the du form
  • the text is a formal letter to the Hungarian ambassador and should be translated using the Ön pronouns and very formal honorifics, salutations, and grammatical structures
8. Style
Information about the document’s style.
Could include formal style guides, references to comparable documents, or other clear indications of style expectations
  • the text is a promotional piece for investors and style is highly relevant, with the translation trying to capture an air of excitement
  • the text is intended for use by technicians in a service environment and style is considered irrelevant
  • the text is to be published by a press with very specific in-house style rules that must be followed
9. Content correspondence
Specifies how the content is to be translated
The default assumption is that text is to be fully translated and adapted to the target locale (a covert, localized translation). In some instances, requesters may ask for partial or summary translations
  • a British English text should be fully translated into German but all prices should be left in pounds sterling rather than converted to euros
  • a marketing text should be heavily adapted to match target language conventions, with the translator free to rewrite portions as needed to appeal to the audience
  • the text should be translated as a summary that presents the main points but leaves out details
10. Output modality
Information about the way in which the translated text will be displayed/presented
This parameter provides information about the specific environments in which the text will be output and any limitations or special requirements they may impose.
  • the text is to be output as captions on a YouTube video
  • the text will be used in voice prompts for a telephone dialogue system with a female voice reading the prompts
  • the text will be displayed on an embedded LCD screen of a device and is limited to a length of 25 characters
11. File format
The file format(s) in which the translated content is to be delivered
It is quite common for the target file format to differ from the source file format
  • the translator is asked to translate a text in an InDesign file but to return the translation as an RTF text
  • the translator is to return text in Microsoft Word (.docx) format and graphics in layered TIFF format
12. Production technology
Any technology or software to be used in the translation process
May be generic or specific as to particular translation tools.
Production technology is included, even though it is not a product parameter, because specific technologies may have an impact on likely issues in the target texts they produce.
  • the project is to be completed using a translation memory tool of the translator’s choice
  • the translation must use TTC TermBase v3

After the values for these parameters are fully specified, MQM implementers should verify that the selection of issue types will ensure that the requirements defined by the parameters are met. Note that parameters may override each other. For example, under Content correspondence the parameters might specify that a “gist” translation is acceptable, in which case Style would not normally be assessed; however, if Audience specifies that the target audience consists of young readers with low literacy, Style might be assessed to ensure that the “simple” style needed for the target audience is achieved.
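
The override described above can be sketched as a simple rule. The field names and the value strings ("full", "gist") are hypothetical, as is the assumption that Style is assessed whenever a gist translation has not been requested; MQM does not define a machine-readable form for the parameters in this section.

```python
from dataclasses import dataclass


@dataclass
class Specifications:
    # Hypothetical encodings of two of the twelve parameters.
    content_correspondence: str = "full"  # e.g. "full" or "gist"
    audience_low_literacy: bool = False   # derived from the Audience parameter


def assess_style(spec: Specifications) -> bool:
    """Decide whether Style should be assessed for these specifications."""
    if spec.audience_low_literacy:
        # Audience overrides Content correspondence: the simple style
        # required by the audience has to be verified.
        return True
    # A gist translation would not normally have its Style assessed.
    return spec.content_correspondence != "gist"


print(assess_style(Specifications(content_correspondence="gist")))
print(assess_style(Specifications("gist", audience_low_literacy=True)))
```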

At this stage in MQM development, there are no normative guidelines for selecting issues. Instead implementers are encouraged to go through each parameter to identify project-relevant issues that will enable them to verify whether the translation meets the requirements set out in those parameters. Future versions of MQM may provide a more formal approach to issue selection.

8.3. Analytic metrics

Analytic metrics are created by making a selection of relevant issues from the listing of MQM issue types. The following procedure may be used to create a metric:

  1. Complete a full set of project specifications, including the 12 MQM parameters. Ensure that all stakeholders are in agreement about the values of the parameters. (Note that the value of some parameters, such as the target language, may change from project to project, so implementers should consider the range of likely values. For example, if a project will be translated into 15 languages, the impact each language might have should be considered.)
  2. For the value of each parameter, consider what features of the text would be needed to verify that the text meets specifications and note these issue types down. Note that “doesn’t matter” is an acceptable value for many parameters and if this value is chosen, the parameter may be skipped. (E.g., if Style is judged to be insignificant, then this parameter will be skipped in assessment.)
  3. After deciding what features need to be checked, determine which issue types can be used to assess that feature and note these types.
  4. From the list of issue types, prioritize them based on the importance of each parameter and then make a selection of issue types based on this list and the priorities. (Note that it may be impractical to do fine-grained analysis of every potential issue type identified. Feedback from LSPs suggests that six to seven issue types are sufficient for most assessment tasks, although some use up to twenty.)
  5. If a score is to be assigned, assign weights to the issues. Assigning weights is a tricky process and should be done by assessing existing translations deemed to be acceptable, borderline acceptable, and unacceptable to see what impact each issue type has on that judgment. Note that some existing metrics, such as SAE J2450, have predefined weights that should be honored. The default issue weight in MQM is 1.0 and any positive decimal value may be used.
  6. If the resulting metric is to be implemented in an MQM-compliant tool chain, it should be declared as described in Section 7.1. MQM metrics description.
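
Step 5 can be illustrated with a minimal sketch of weighted issue counts. The function name and the sample counts and weights are hypothetical, and the weighted sum shown here is only one possible way to combine weights and counts; it is not presented as the MQM scoring mechanism.

```python
DEFAULT_WEIGHT = 1.0  # default issue weight in MQM


def weighted_penalty(issue_counts, weights=None):
    """Sum the issue counts, each multiplied by its issue-type weight.

    Issue types without an explicit weight use DEFAULT_WEIGHT.
    Weights must be positive, as required for MQM issue weights.
    """
    weights = weights or {}
    total = 0.0
    for issue_type, count in issue_counts.items():
        weight = weights.get(issue_type, DEFAULT_WEIGHT)
        if weight <= 0:
            raise ValueError(f"Weight for {issue_type!r} must be positive")
        total += weight * count
    return total


# Hypothetical counts from one assessed text and hypothetical weights.
counts = {"Terminology": 3, "Grammar": 2, "Omission": 1}
weights = {"Terminology": 2.0, "Omission": 1.5}
print(weighted_penalty(counts, weights))  # prints 9.5
```

Running the same function over translations already judged acceptable, borderline, and unacceptable is one way to check whether a candidate set of weights separates the three groups.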

When considering which issues to check, creators of metrics should consider the following practical guidelines:

  1. Are there any requirements for compatibility with legacy systems or standard/semi-standard specifications? If so, choose issue types that correspond to those used by those systems/specifications. In most cases it is possible to emulate legacy metrics in MQM with little or no modification, although some might require the use of custom extensions.
  2. Select the least granular issue types that allow assessment of whether the text meets specifications. For example, in many cases use of the category Grammar would be sufficient because it is not particularly relevant to know what subcategory is used. On the other hand, when trying to diagnose problems generated by an MT system, finer-grained types might be necessary.
  3. When possible, choose issues from the MQM Core. Using these issues helps ensure compatibility. However, the Core does not cover all cases, including common ones such as checking formatting, because it is focused on text translations.
  4. Consider not just requirements for one set of specifications/parameters, but also for other likely sets. For example, if two types of translations are frequently assessed, it may make sense to develop one list of issues with different sets of weights and to use the single (master) set of issues. This practice is recommended to prevent the need to train evaluators on multiple metrics.
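
Guideline 4 can be sketched as one master list of issue types combined with several sets of weights. The profile names and weight values below are hypothetical; only the issue type names and the default weight of 1.0 come from this document.

```python
DEFAULT_WEIGHT = 1.0

# One master set of issue types that evaluators are trained on.
MASTER_ISSUES = ("Terminology", "Grammar", "Mistranslation", "Omission", "Typography")

# Hypothetical weight sets for two frequently assessed translation types.
WEIGHT_PROFILES = {
    "manual": {"Terminology": 2.0, "Omission": 1.5},
    "marketing": {"Grammar": 1.5, "Typography": 0.5},
}


def weights_for(profile: str) -> dict:
    """Return a weight for every master issue type under the given profile."""
    overrides = WEIGHT_PROFILES[profile]
    unknown = set(overrides) - set(MASTER_ISSUES)
    if unknown:
        raise ValueError(f"Profile uses issue types outside the master list: {unknown}")
    return {issue: overrides.get(issue, DEFAULT_WEIGHT) for issue in MASTER_ISSUES}


print(weights_for("manual"))
print(weights_for("marketing"))
```

Because every profile draws on the same master list, evaluators annotate all texts in the same way and only the weighting changes between translation types.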

8.4. Holistic metrics

Holistic assessment methods are more flexible in some respects than error-count metrics. They are designed to provide an assessment of the translated text as a whole rather than a detailed accounting of all errors. As analytic assessment can be time consuming and is not needed in all cases (e.g., when the question is whether a text should be accepted or not), holistic methods may be more appropriate in some cases. Most of the MQM issue types can be easily used as either analytic types or holistic types that apply to the text as a whole. For example, the MQM Punctuation issue type can be used by asking assessors using a holistic tool whether the text is punctuated correctly. In this context some issues will be more useful than others. For example, the Pattern problem issue type is unlikely to be useful in most holistic assessments since it generally makes sense only with regard to very specific sections of a text. By contrast, categories like Grammar can more readily be applied to entire texts.

Note that there is no single method for building holistic scores. In a holistic approach specific issues are addressed through qualitative questions that may be assessed via ranking or on a binary- or scalar-value system. For example, a holistic assessment might address the Spelling issue via questions like the following:

Because the scoring for holistic systems is highly dependent on the type of assessment scale used, no specific scoring system is provided here. Users of MQM who wish to implement it in a holistic environment should tie holistic questions to specific MQM issue types and develop appropriate scoring systems. This version of MQM does not define a system for describing holistic scoring systems, although future versions may do so. However, by using the MQM issue types and associating specific holistic questions with them, implementers can make their metrics more transparent and tie them to project parameters in the same way that can be done with error-count metrics.

The following guidelines may assist in designing appropriate holistic assessments and selecting issue types:

10. Mappings of existing metrics to MQM (non-normative)

This section contains informative mappings from existing metrics to MQM. Note that existing metrics are subject to update without notice. These mappings are provided as a courtesy and no guarantee is made of accuracy and completeness. Any implementations based on these mappings should carefully consider the metric to verify the accuracy of mappings.

SAE J2450

The mapping from SAE J2450 is somewhat complex because the distinction between severity levels depends, in part, on whether the issue changes the meaning between target and source. At least in principle, a minor error in J2450 would therefore correspond to the Fluency branch in MQM and a major error would correspond to the Accuracy branch. Nevertheless, for most purposes, the following mapping should suffice.

SAE J2450 issue type              | MQM issue type | Note(s)
Wrong term                        | Terminology    |
Punctuation error                 | Typography     |
Syntactic error                   | Grammar        |
Word structure or agreement error | Word form      |
Miscellaneous error               | Other          |
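
A tool that converts J2450 annotations could encode the mapping above as a lookup table, as in the following sketch. The dictionary and function names are hypothetical, and only the rows listed in the table are included.

```python
# Informative mapping taken from the table above.
J2450_TO_MQM = {
    "Wrong term": "Terminology",
    "Punctuation error": "Typography",
    "Syntactic error": "Grammar",
    "Word structure or agreement error": "Word form",
    "Miscellaneous error": "Other",
}


def to_mqm(j2450_issue: str) -> str:
    """Return the MQM issue type listed for a SAE J2450 issue type."""
    try:
        return J2450_TO_MQM[j2450_issue]
    except KeyError:
        raise ValueError(
            f"No mapping listed for J2450 issue type: {j2450_issue!r}"
        ) from None


print(to_mqm("Syntactic error"))  # prints "Grammar"
```

Raising an error for unlisted types, rather than silently falling back to Other, makes gaps in the mapping visible to implementers.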


Note: This mapping covers the TAUS Dynamic Quality Framework’s “analytic” metric only. The mapping here is not endorsed or maintained by TAUS.

DQF issue type (Level 1 / Level 2) | MQM issue type | Note(s)
grammar – syntax | Grammar |
punctuation | Typography | Equivalence to be confirmed
noncompliance with company style guides | Company Style |
inconsistent with other reference material | Inconsistency |
inconsistent within text | Inconsistency |
literal translation | Overly literal |
awkward syntax | Grammar |
unidiomatic use of target language | Unidiomatic |
tone | Register | The intention of the DQF category is unclear
ambiguous translation | Ambiguity |
incorrect interpretation of source text – mistranslation | Mistranslation |
misunderstanding of technical concept | Mistranslation | MQM does not attempt to distinguish the (mental) cause of a mistranslation, so this category cannot be distinguished in MQM from the previous one
ambiguous translation of a clear source segment | Ambiguity | Unclear how this DQF category is distinguished from “ambiguous translation”
omission (essential element in the source text missing in the translation) | Omission |
addition (unnecessary elements in the translation not originally present in the source text) | Addition |
100% match not well translated or not appropriate for context | Improper exact match |
untranslated text | Untranslated |
noncompliance with company terminology | Company terminology |
noncompliance with 3rd party or product/application terminology | Normative terminology |
inconsistent | Monolingual terminology |
Country standards | Locale violation |
dates | Date format |
units of measurement | Measurement format |
currency | Use custom type of Locale violation |
delimiters | Number format |
addresses | Use custom type of Locale violation |
phone numbers | Use custom type of Locale violation |
zip codes | Use custom type of Locale violation |
shortcut keys | Use custom type of Locale violation |
cultural references | Use custom type of Locale violation |
tone | Unclear how this differs from “tone” under “style”. Probably use custom type of Locale violation |
formatting | Local formatting |
corrupted tags | Markup |
missing variables | Omitted variable |
links not working | Broken link/cross-reference |
string-length error | Length |
missing/invisible text | Omission |
corrupted characters | Character encoding or Nonallowed characters | The precise mapping depends on the specific nature of the problem
inconsistent cross-references | Inconsistent link/cross-reference |
functionality errors - translation (meaning) and function don't match | Verity | This type of issue in MQM would preferentially be handled with other categories if the source text was correct and the translation is inaccurate. However, if the source text is incorrect or if the translation seems to be correct but does not apply because of differences between language versions, Verity would apply.
functionality errors - broken functionality | Compatibility: Functional | Normally outside the scope of MQM
Other | Other |
query implementation | Not an MQM issue type |
needs further discussion | Not an MQM issue type |
Client edit | Not an MQM issue type |

11. Acknowledgements

Portions of this document were developed as part of the Coordination and Support Action “Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad)”, funded by the 7th Framework Programme of the European Commission through the contract 296347.

12. Previous versions (non-normative)

Changes from version 0.1.12

Changes from version 0.1.11

Changes from version 0.1.10

Changes from version 0.1.8

Changes from version 0.1.7

Changes from version 0.1.6

Changes from version 0.1.5

Changes from version 0.1