
MT Test Suite

Contributors: Aljoscha Burchardt (DFKI), Kim Harris (text&form), Arle Lommel (DFKI), Katrin Marheinecke (text&form),
Maja Popović (DFKI), Thomas Senf (text&form), Nicole Tielker (DFKI), Hans Uszkoreit (DFKI)


This resource contains a table of machine-translated segments that exhibit errors. Its purpose is to provide a set of segments showing typical MT issues and errors, so that MT developers can compare how well their systems perform on the same input.

Each test suite contains two sorts of segments:

  1. Segments from QTLaunchPad corpora or other resources. These segments were selected from annotated QTLaunchPad corpora to illustrate particular issue types. In addition, segments from various other sources were selected to demonstrate common problems.
  2. Segments from the TSNLP Grammar Test Suite for English. To prepare this corpus, all of the “grammatical” segments from TSNLP for the appropriate source language were reviewed. As TSNLP was designed not for MT testing but to provide challenging cases for grammar checkers, a team of two native-speaker linguists evaluated all segments for each language, and only those segments that both reviewers agreed were truly grammatical were used. In addition, sentence fragments were removed, since isolated sentence fragments pose particular problems even for human translators. The resulting set of sentences was then translated using four leading commercial MT systems (two SMT and two RbMT), and sentences that proved problematic for both systems of a given MT type were classified as exhibiting a barrier for that system type.
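
The barrier rule described for the TSNLP segments can be sketched as follows. This is only an illustration under stated assumptions: the system labels (`smt_1`, `rbmt_1`, and so on) and the function name `barrier_types` are hypothetical, since the four commercial systems are not named here, and the per-system judgements are assumed to be available as simple true/false values.

```python
from typing import Dict, List, Set

# Hypothetical system labels: the text says only that two SMT and two
# RbMT systems were used, and does not name them.
SYSTEM_TYPES: Dict[str, str] = {
    "smt_1": "SMT",
    "smt_2": "SMT",
    "rbmt_1": "RbMT",
    "rbmt_2": "RbMT",
}


def barrier_types(problematic: Dict[str, bool]) -> Set[str]:
    """Return the system types for which a sentence counts as a barrier.

    `problematic` maps each system label to True when that system's
    translation of the sentence was judged problematic.
    """
    flags_by_type: Dict[str, List[bool]] = {}
    for system, system_type in SYSTEM_TYPES.items():
        flags_by_type.setdefault(system_type, []).append(problematic[system])
    # A type is a barrier type only if every system of that type struggled.
    return {t for t, flags in flags_by_type.items() if all(flags)}


# Invented judgements for one sentence: both SMT systems fail, one RbMT copes.
judgements = {"smt_1": True, "smt_2": True, "rbmt_1": True, "rbmt_2": False}
print(barrier_types(judgements))  # prints {'SMT'}
```

In this sketch a sentence that troubles only one of the two systems of a type is not treated as a barrier for that type, which matches the "both systems" wording above.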

For the TSNLP data, one MT result was selected from among those systems that were considered to exhibit barriers. This segment was the one judged by the group of linguists to come the closest to “getting it right”. For the corpus data, the translation in the corpus was used. In both cases the translation was annotated using MQM to identify issues and post-edited to show one possible way to resolve the issues. (Note that the post-editing was intended to be minimal, with only enough changes to make the sentence grammatical and acceptable. Full post-editing in many cases would result in more substantive changes in sentence structure, but the goal was not to create a stylistically perfect text.)

For the corpus data, no information is provided as to which system type translated the segment, for which system type(s) the segments proved to be a barrier, or the TSNLP class.

Data structure

Data in the test suite files is in the following columns:

  • ID. A unique ID value that can be used to identify an item in the test suite. For “corpus” segments, the ID value begins with the letter “c”; for items from the TSNLP suite, it begins with the letter “g” and is followed by the ID value from the TSNLP suite.
  • Source. The source segment.
  • Barrier for. The system type(s) for which the segment is a barrier.
  • Trans. by. Which kind of system translated the segment that was annotated and post-edited (SMT = a statistical system, RbMT = a rule-based system, “—” = unknown for cases of “client” data where the engine was not indicated).
  • Annotated target. The target segment with markers for the annotated spans. (Note that MQM allows for overlapping, i.e., non-nested annotations, but all annotations in this test suite are strictly nested.)
  • Issues. A numbered list of issue types, corresponding to the marked spans in the Annotated target column.
  • Post-edited target. A post-edited version of the target text. Specific changes from the original target are not marked, but can easily be determined by comparing the target with the post-edited target.
  • Notes. Human-readable notes concerning the item.
  • TSNLP class (for TSNLP items only). The TSNLP class for the source segment.
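
As a rough illustration of the columns listed above, one row could be modelled like this. The field names, the sample ID values, and the helper functions are assumptions made for the sketch; the actual file format and marker syntax of the test suite are not specified here. The comparison helper assumes that annotation markers have already been stripped from the target text.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TestSuiteItem:
    """One row of a test suite table (field names are hypothetical)."""
    item_id: str                       # "ID" column
    source: str                        # "Source" column
    barrier_for: str                   # "Barrier for" column
    translated_by: str                 # "Trans. by" column: SMT, RbMT or unknown
    annotated_target: str              # "Annotated target" column
    issues: List[str]                  # "Issues" column, in marker order
    post_edited_target: str            # "Post-edited target" column
    notes: str = ""                    # "Notes" column
    tsnlp_class: Optional[str] = None  # "TSNLP class", TSNLP items only


def origin_of(item_id: str) -> str:
    """Classify an item as corpus or TSNLP data from its ID prefix."""
    if item_id.startswith("c"):
        return "corpus"
    if item_id.startswith("g"):
        return "tsnlp"
    raise ValueError(f"unexpected ID prefix in {item_id!r}")


def tsnlp_id(item_id: str) -> Optional[str]:
    """Return the original TSNLP ID for a 'g' item, or None for corpus items."""
    return item_id[1:] if origin_of(item_id) == "tsnlp" else None


def changed_words(target: str, post_edited: str) -> List[Tuple[str, str, str]]:
    """List word-level differences between a marker-free target and its post-edit."""
    a, b = target.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example values, not taken from the test suite.
print(origin_of("c12"), tsnlp_id("g4711"))
print(changed_words("He go home", "He goes home"))
```

Word-level comparison is just one option; a character-level comparison may be more useful for languages with rich morphology.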

MQM annotation

The data were annotated primarily using the same set of MQM issues used for the second round of the QTLaunchPad MQM Annotated Corpora. The list of issues and guidelines for annotators are available at http://qt21.eu/downloads/annotatorsGuidelines-2014-06-11.pdf. In addition, examples of selected additional MQM issue types were added. For the full list of issues found, please see the filter settings for the specific test suites.


The data sets provide options to filter data. It is possible to select whether to see corpus data, TSNLP data, or both. In addition, it is possible to conduct a full-text search of the annotated segments (the search covers source and target texts as well as post-edited segments) and to filter results by the combination of MQM issues annotated in their content. Clicking on a column header also allows the currently visible results to be sorted (e.g., by the type of system for which the content is a barrier).
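
For readers working with an exported copy of the data rather than the interactive tables, the filtering options just described could be approximated along these lines. This is a sketch only: the dictionary keys, the case-insensitive search, and the reading of "combination of issues" as "contains all selected issues" are assumptions, and the sample rows are invented.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

Row = Dict[str, Any]


def filter_items(
    items: Iterable[Row],
    origin: str = "both",                      # "corpus", "tsnlp" or "both"
    query: Optional[str] = None,               # full-text search string
    required_issues: Optional[Set[str]] = None,
) -> List[Row]:
    """Keep the rows that pass the origin, full-text and issue filters."""
    # Corpus IDs start with "c", TSNLP IDs with "g"; "" matches every ID.
    prefix = {"corpus": "c", "tsnlp": "g", "both": ""}[origin]
    needle = query.lower() if query else None
    kept: List[Row] = []
    for item in items:
        if not item["id"].startswith(prefix):
            continue
        if needle is not None:
            # Search source, target and post-edited text together.
            haystack = " ".join(
                (item["source"], item["target"], item["post_edited"])
            ).lower()
            if needle not in haystack:
                continue
        # Assumed semantics: the row must contain all selected issue types.
        if required_issues and not required_issues <= item["issues"]:
            continue
        kept.append(item)
    return kept


# Invented sample rows for illustration.
rows = [
    {"id": "c1", "source": "The cats sleep.", "target": "Die Katzen schläft.",
     "post_edited": "Die Katzen schlafen.", "issues": {"Agreement"},
     "barrier_for": "SMT"},
    {"id": "g2", "source": "He left.", "target": "Er links.",
     "post_edited": "Er ging.", "issues": {"Mistranslation"},
     "barrier_for": "RbMT"},
]

tsnlp_only = filter_items(rows, origin="tsnlp")
# Sorting the visible rows by the "Barrier for" value.
by_barrier = sorted(filter_items(rows), key=lambda r: r["barrier_for"])
print([r["id"] for r in tsnlp_only], [r["id"] for r in by_barrier])
```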


NOTE: These resources will be updated from time to time. The date of the latest update can be found in each data set.

The following changes were made on the dates listed:

  • 2014-12-03. Corrected some errors in descriptions and added links to annotation guidelines. No data updated.

This resource was prepared as part of the Coordination and Support Action “Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad)” (Deliverable 1.4.1. QT Test Suite). This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 296347.

This work by QTLaunchPad is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.