QT21 – Quality Translation 21


Introduction


Quality Translation 21 is a machine translation project which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 645452.


A European Digital Single Market free of barriers, including language barriers, is a stated EU objective to be achieved by 2020. The META-NET Language White Papers show that only 3 of the EU-27 languages currently enjoy moderate to good support from machine translation technologies; the vast majority have weak (at best fragmentary) support or none at all. This lack of support is a key obstacle impeding the free flow of people, information and trade in the European Digital Single Market.


Many of the languages not supported by current technologies share common traits: they are morphologically complex, with free and diverse word order. Often there are not enough training resources and/or processing tools. Together, these factors cause drastic drops in translation quality. The combined challenges of linguistic phenomena and resource scenarios have created a large and under-explored grey area in the language technology map of European languages. Combining support from key stakeholders, QT21 addresses this grey area by developing:


  • substantially improved statistical and machine-learning based translation models for challenging languages and resource scenarios,

  • improved evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators,

  • all with a strong focus on scalability, to ensure that learning and decoding with these models is efficient and that reliance on data (annotated or not) is minimised.

To continuously measure progress, and to provide a platform for sharing and collaboration (within QT21 and beyond), the project revolves around a series of Shared Tasks which, for maximum impact, are co-organised with the annual workshops on machine translation (WMT).


Duration: 1st February 2015 - 31st January 2018
Coordinator: Prof. Dr. Josef van Genabith

Achievements

Main Objectives


The scientific work performed by QT21 focused on challenging morphologically complex and syntactically varied languages. The research has been organised along three axes:

  1. semantics (WP1),

  2. morphology and low resource languages (WP2), and

  3. continuous learning from mistakes (WP3).

In order to continuously measure project progress and compare it with the international state of the art, QT21 co-organised and sponsored the Workshop on Machine Translation (WMT, http://statmt.org/wmt) in 2016, 2017 and 2018 to benchmark machine translation (MT) technologies on different tasks (WP4).

QT21 focused on the following technical objectives:


  1. To substantially improve statistical and machine-learning based translation models for challenging languages and resource scenarios;

  2. To improve evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators;

  3. To ensure that learning and decoding with these models is efficient and that reliance on data (annotated or not) is minimised;

  4. To continuously measure progress and to provide a platform for sharing and collaboration (within QT21 and beyond) through a series of Shared Tasks co-organised, for maximum impact, with WMT;

  5. To support early technology transfer through a Technology Bridge that links ICT-17(a) and (b) projects and makes it possible to show the technical feasibility of early research outputs in near-operational, industry-focused environments.

The project works on five language pairs: four with English as the source language (English->German, English->Czech, English->Latvian, English->Romanian) and one with English as the target language (German->English).




Main Results Achieved


Objective (1): QT21 has made substantial contributions to the paradigm shift introduced by Neural Machine Translation (NMT), significantly improving the state of the art (SOTA). Core technical contributions include back-translation, which synthetically augments the volume of training data; Byte Pair Encoding (BPE), which compresses the vocabularies of Morphologically Rich Languages (MRLs); layer normalisation; and deeper gated recurrent neural networks (GRUs). The first two proved essential at WMT16 and the last two at WMT17. QT21 systems won more than 80% of all WMT16 and WMT17 shared tasks, outperforming the well-known large-scale online commercial MT systems on En↔De, En↔Cz and En→Ro, the core languages of QT21.
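To make the BPE idea concrete, the following is a minimal sketch of how merge operations can be learned from word frequencies. It is an illustration only, not the QT21 implementation: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker and the toy corpus are assumptions chosen for this example.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    bigram = re.escape(" ".join(pair))
    # Match the pair only when it is delimited by whitespace or string edges.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merges from a {word: count} dict."""
    # Start from single characters plus an (assumed) end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first one on ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    # Hypothetical toy corpus, not project data.
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(toy_counts, 3)
    print(merges)
    print(vocab)
```

On this toy corpus the first three merges are `e`+`s`, `es`+`t` and `est`+`</w>`, so the frequent suffix "est" becomes a single vocabulary unit. This is how a fixed number of merges lets a small subword vocabulary cover the many inflected forms of a morphologically rich language.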


Objective (2): QT21 submissions won WMT16 Quality Estimation (QE) Task 3 on “predicting document-level quality” and ranked 3rd out of 13 submissions on the sentence-level Task 1. QT21 systems also won all WMT16 Metrics tasks. In the Automatic Post-Editing (APE) task, which aims at “learning from mistakes” (that is, learning from post-edits made by professional translators), QT21 improved on the baseline by 2.64 BLEU points at WMT16, the second-best performance, and won the WMT17 task, improving on the baseline by 7.6 BLEU points. In addition, QT21 developed an online APE system that interacts with human post-editors in a continuous learning scenario, improving MT over current results by 1 to 2 BLEU points. Finally, QT21 developed Direct Assessment, a new method that uses crowdsourcing effectively to evaluate MT systems more reliably.


Objective (3): QT21 introduced back-translation (see Objective (1)), effectively reducing the dependency on bilingual data. QT21 used BPE (see Objective (1)) to address the important out-of-vocabulary (OOV) issue in the automatic translation of MRLs. QT21 also showed that multilingual embeddings can efficiently support transfer learning from a well-resourced language to under-resourced languages. Further, QT21 work on interlingual factors opened the door to translating languages not seen during training.
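The way back-translation reduces the need for bilingual data can be sketched as a small data-preparation step: genuine target-language sentences are paired with machine-generated source sentences and added to the real parallel data. The sketch below is an illustration under stated assumptions, not the QT21 pipeline. The names `back_translate`, `augment`, `toy_de_to_en` and `TOY_LEXICON` are hypothetical, and the word-by-word lexicon merely stands in for a trained target-to-source MT system.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: Iterable[str],
    target_to_source: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each genuine target sentence with a synthetic source sentence."""
    return [(target_to_source(sentence), sentence) for sentence in monolingual_target]


def augment(
    parallel: Iterable[SentencePair],
    monolingual_target: Iterable[str],
    target_to_source: Callable[[str], str],
) -> List[SentencePair]:
    """Return real parallel pairs followed by back-translated synthetic pairs."""
    return list(parallel) + back_translate(monolingual_target, target_to_source)


# Hypothetical stand-in for a trained German->English system (assumption).
TOY_LEXICON = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}


def toy_de_to_en(sentence: str) -> str:
    """Translate word by word with the toy lexicon; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in sentence.split())


if __name__ == "__main__":
    real_pairs = [("the cat sleeps", "die Katze schläft")]
    german_only = ["Das Haus ist klein"]
    for source, target in augment(real_pairs, german_only, toy_de_to_en):
        print(source, "|||", target)
```

Only the source side of the added pairs is synthetic; the target side is always human-written text. That is why monolingual target-language data, which is usually far easier to obtain than parallel data, can be used to train an English->German system in this example.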


Objective (4): The organisation of WMT (co-organised with CRACKER, Horizon2020 #645357) was at the core of this objective. The 48% increase in submissions to the main task (News Task) from 2015 to 2017 and the tripling of participation in the APE task over the same period demonstrate the value and recognition WMT enjoys in the community.


Objective (5): QT21 has conducted technology-focused workshops on QT21 research results and technologies with the DGT (MT@EC), HimL (Horizon2020-ICT17b #644402), MMT (Horizon2020-ICT17b #645487), TraMOOC (Horizon2020-ICT17b #644333) and KConnect (Horizon2020-ICT15 #644753). With the help of QT21, DGT (MT@EC) switched from statistical machine translation (SMT) to NMT for certain language pairs. QT21 and HimL made a joint submission to WMT16. TraMOOC decided to use NMT from QT21. MMT has successfully tested the online APE framework provided by QT21, which improves their baseline in a real MT scenario.



Dissemination and communication of results


The project has published 207 scientific papers, of which 88 are first-tier (reviewed) international conference papers, 16 are journal articles and 24 are system papers; the remaining 79 were published at second-tier conferences and workshops.


Our PIs have been invited to deliver 32 scientific keynotes and tutorials. In addition, they have been invited 28 times by companies and other non-scientific and public organisations (e.g. chambers of commerce) to present the state of the art in Machine Translation (MT).


QT21's knowledge transfer to industry has been ensured through 19 talks at diverse industry conferences, including LocWorld, Tekom and GALA, and, more technically, through 21 industry-related workshops. QT21 also produced 9 hours of webinar content, attended by 718 individuals and viewed by more than 1000.


QT21 has harmonised the two major frameworks for MT error analysis: QT21’s own MQM (Multidimensional Quality Metrics) and TAUS’ industry standard DQF (Dynamic Quality Framework), which is implemented in most localisation tools. An industry user group of 86 companies (e.g. Adidas, Adobe, Google, Microsoft, SAP, Siemens, WeLocalise) now convenes bimonthly to discuss DQF-MQM related issues.


The results of QT21 are documented in 64 deliverables.