Setting up a translate5 project

Last updated: 2014 December 05

This page provides an overview of the file format requirements for translate5 projects and explains how to set up a project.

CSV files

Data to be used in translate5 can be uploaded in CSV (comma-separated values) format or in SDL-XLIFF. For QTLaunchPad we have used only the CSV option.

Simple translation projects

For a basic project with a source and an editable target, the CSV file must be structured with the following “fields”:

  • mid. A text ID for the segment pair, not shown to the user.
  • source. The source text. (Note: for German-language installations, this field is generally called quelle; for English-language installations, such as the DFKI Metashare installation, source should be used instead.)
  • target. The target text. (Note: for German-language installations, this field is generally called ziel; for English-language installations, such as the DFKI Metashare installation, target should be used instead.)

These fields must be identified in the first row of the file by name.

Projects with multiple alternative translations

For multicolumn projects with potentially many editable columns, only the mid and source fields are required, while the other fields can be given arbitrary names. In this case the field names given in the first row of the imported file are the names displayed in the user interface.
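The header rules for the two project types can be checked before import with a few lines of Python. This is only a sketch: the function name missing_fields is hypothetical and is not part of translate5, and the required names are taken from the descriptions above (a German-language installation would presumably need quelle and ziel passed in instead).

```python
def missing_fields(header, required=("mid", "source")):
    """Return the required field names that are absent from the header row."""
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Multicolumn project: only mid and source are required,
# the remaining column names are arbitrary.
print(missing_fields(["mid", "source", "Translation A", "Translation B"]))

# Simple project: target is required as well.
print(missing_fields(["mid", "source"], required=("mid", "source", "target")))
```

An empty list means that every required field is named in the first row.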

CSV requirements

The CSV must conform to the following requirements:

  • All literal double quotes (i.e., ") must be represented as two double quotes ("").
  • All field contents must be surrounded by double quotes on each side of the content.
  • Fields are separated by commas.
  • The file must end with a CR character at the end of the last line.
  • The translate5 system will attempt to determine the encoding of the file automatically, but it is strongly recommended to use UTF-8 for all files to prevent problems.

An example of a simple source-target CSV file is the following (with ¶ showing new lines):

"001","Ich esse Pfannkuchen","I eat pancakes"¶
"002","Er war hier gestern","He was here yesterday"¶
"003","Wer reitet so spät durch Nacht und Wind?","Who rides so late through the night and the wind?"¶

A more complex multicolumn file might resemble the following:

"001","Ich esse Pfannkuchen","I eat pancakes","I eat doughnuts"¶
"002","Er war hier gestern","He was here yesterday","Yesterday he was here"¶
"003","Wer reitet so spät durch Nacht und Wind?","Who rides so late through the night and the wind?","Who is it who rides in the dark and the drear?"¶

Note that the file name must end in .CSV.
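A file that meets the CSV requirements can be produced with Python's standard csv module. The following is a minimal sketch, not a translate5 tool: the function name write_translate5_csv is hypothetical, the header row follows the rule that fields are named in the first row, and treating the required "CR character" as a CRLF line ending is an assumption (pass lineterminator="\r" for a bare carriage return).

```python
import csv
import io


def write_translate5_csv(rows, header=("mid", "source", "target"),
                         lineterminator="\r\n"):
    """Serialize the header and rows to CSV text with every field quoted."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_ALL,       # surround all field contents with quotes
        doublequote=True,            # write a literal " as ""
        lineterminator=lineterminator,
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("001", "Ich esse Pfannkuchen", "I eat pancakes"),
    ("002", 'Er sagte "Hallo"', 'He said "hello"'),
]
print(write_translate5_csv(rows))
```

When saving the result, open the output file with encoding="utf-8" and newline="" so that Python writes UTF-8 and does not alter the line endings.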

The content of fields may contain XML issue tags that conform to the output generated by translate5.

XML issue type declaration

The translate5 system allows users to declare arbitrary issue sets. These are created in a simple XML file format. While the issues can have any name, the QTLaunchPad project recommends the use of MQM subsets. The XML file contains a root element that contains one or more issue elements. Each issue element can be either empty or contain additional nested issue elements. The issue hierarchy of a sample file that represents the MQM core is provided below:

   <issue type="Accuracy">
      <issue type="Mistranslation">
         <issue type="Terminology"/>
      </issue>
      <issue type="Omission"/>
      <issue type="Addition"/>
      <issue type="Untranslated"/>
   </issue>
   <issue type="Fluency">
      <issue type="Register"/>
      <issue type="Style"/>
      <issue type="Inconsistency"/>
      <issue type="Spelling"/>
      <issue type="Typography"/>
      <issue type="Grammar" />
      <issue type="Locale violation" />
      <issue type="Unintelligible"/>
   </issue>
   <issue type="Verity">
      <issue type="Completeness"/>
      <issue type="Legal requirements"/>
      <issue type="Locale applicability"/>
   </issue>

The file name for this file must be "QM_Subsegment_Issues.xml".
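Because issue elements nest, it can help to check a declaration by listing each issue together with its parents. The sketch below uses Python's standard xml.etree.ElementTree module. The root element name issues and the function name issue_paths are assumptions made only for this illustration; use whatever root element your translate5 installation expects.

```python
import xml.etree.ElementTree as ET

# The <issues> root element name is an assumption made for this sketch.
SAMPLE = """
<issues>
   <issue type="Accuracy">
      <issue type="Mistranslation">
         <issue type="Terminology"/>
      </issue>
      <issue type="Omission"/>
   </issue>
   <issue type="Fluency"/>
</issues>
"""


def issue_paths(element, prefix=()):
    """Yield each issue as a tuple of type names from the top level down."""
    for child in element.findall("issue"):
        path = prefix + (child.get("type"),)
        yield path
        # Recurse into nested issue elements, carrying the parent path along.
        yield from issue_paths(child, path)


root = ET.fromstring(SAMPLE.strip())
for path in issue_paths(root):
    print(" > ".join(path))
```

A file with an unclosed issue element is not well-formed, so ET.fromstring raises a ParseError for it, which makes the same code usable as a quick well-formedness check.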

The translate5 package format

The translate5 package is a zip-compressed archive with the following directory structure:

   proofRead
      <one or more CSV files to be annotated>

If no QM_Subsegment_Issues.xml is included, the system will default to the MQM Core set of issues.

Any mistakes in this directory structure will result in an error. Incorrect directory structures are the most common source of import failure in translate5.

If multiple files are included in the proofRead directory, their segments are all displayed together in the translate5 user interface. The files are also shown as links in the UI that allow the user to jump to the first segment of each file.
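Since directory mistakes are the most common cause of import failure, it may be safer to build the archive with a script than by hand. The following sketch uses Python's standard zipfile module. The function name build_package is hypothetical, and placing QM_Subsegment_Issues.xml at the root of the archive is an assumption of this sketch; the proofRead directory name is taken from the description above.

```python
import io
import zipfile


def build_package(csv_files, issues_xml=None):
    """Return the bytes of a zip archive with the CSV files under proofRead/.

    csv_files maps file names to CSV text. Storing the issue declaration
    at the archive root is an assumption of this sketch.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        if issues_xml is not None:
            archive.writestr("QM_Subsegment_Issues.xml",
                             issues_xml.encode("utf-8"))
        for name, text in csv_files.items():
            # Every file to be annotated goes into the proofRead directory.
            archive.writestr(f"proofRead/{name}", text.encode("utf-8"))
    return buffer.getvalue()


package = build_package({"sample.csv": '"mid","source","target"\r\n'})
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(archive.namelist())
```

Listing the archive contents before uploading, as in the last lines, is a quick way to confirm that the directory structure is what you intended.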

Uploading projects

After logging in to the system as an administrator, click on Add task in the Task Overview window.

Doing so will open the Create task dialog box.

In this dialog box you must supply the project name, the source and target languages, and the import file (see above for requirements). All other fields are optional.

Click Add task. The file will upload and the task will be added to the list of tasks. Note that tasks are initially created with no assigned users; users must be assigned separately.
