Exporting

Last updated: 05 December 2014

Exporting a task

When a task is complete (or whenever its current state needs to be saved), an admin user may export it. Tasks are exported in the format in which they were uploaded (SDL-XLIFF or CSV). To export a task, click the Export options icon in the Task Overview screen and select one of the options.

There are at least three options available for export:

  • Export. This option exports the entire file in its current state, including any edits or annotations. It does not mark changes. This is the option generally used in QTLaunchPad projects.
  • Export with alteration record. This option exports the entire file and marks the changed portions of each segment with <ins> and <del> tags. It is useful in post-editing tasks for seeing exactly what the user changed. For pure annotation tasks, however, it may complicate calculations of the amount of text marked for various issues.
  • Export QM-Statistics (XML) for field: <field name>. If there is more than one editable target column, this option will appear for each column. This option exports an XML file containing aggregate details on all issues annotated in the named column. This XML file includes the number of issues for each type annotated (it does not, however, include any issue types that were not actually selected for that file) and their severity. This information can be used to compare the issue profiles of different texts.

Export format

Projects are exported in the original CSV format, as shown in this example:

"derstandart_at/2012/12/01/141907-2_de_SMT-Gold","""Ich finde das Gebäude lustig, es sieht futuristisch aus, und endlich gibt es wieder etwas
Interessantes zu sehen"", sagt Lisette Verhaig, eine Passantin am Straßenrand.","""I find <mqm:startIssue type=""Mistranslation"" severity=""critical"" note="""" agent=""annotator1"" id=""35471""/>it <mqm:endIssue idref=""35471""/> funny, it looks futuristic, and finally there is something interesting to see again,"" said Lisette Verhaig, a <mqm:startIssue type=""Mistranslation"" severity=""critical"" note="""" agent="" annotator1"" id=""35472""/>technician <mqm:endIssue idref=""35472""/>at the roadside."

 

In this example, two issues were annotated in the target text. The start of each issue is indicated by a <mqm:startIssue/> tag and the end by a <mqm:endIssue/> tag. The start and end are matched by their respective id and idref values. The <mqm:startIssue/> tag contains the following attributes in addition to id:

  1. type. The MQM issue type for the annotation.
  2. severity. The severity level for the annotation (critical, major, or minor).
  3. note. Any note text associated with the annotation. (Note that segment-level notes are currently not exported.)
  4. agent. The user name of the annotator who added the annotation.
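
The tag structure described above can be read programmatically. The following is a minimal Python sketch, not part of the export tool itself: the function name extract_issues is hypothetical, and the code assumes that the doubled CSV quotes have already been unescaped and that attribute values never contain a ">" or a double quote. It pairs each <mqm:startIssue/> with the <mqm:endIssue/> whose idref equals its id and returns the attributes together with the annotated text.

```python
import re

# Patterns for the two tag kinds and for individual attributes.
START_RE = re.compile(r'<mqm:startIssue\b([^>]*)/>')
END_RE = re.compile(r'<mqm:endIssue\b[^>]*\bidref="([^"]*)"[^>]*/>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
TAG_RE = re.compile(r'</?mqm:[^>]*>')


def extract_issues(segment):
    """Return one dict per annotated issue found in a target segment.

    Each dict holds the attributes of the start tag (type, severity,
    note, agent, id) plus a "text" key with the annotated text.
    """
    # Map each idref to the position where its end tag begins.
    ends = {m.group(1): m.start() for m in END_RE.finditer(segment)}
    issues = []
    for m in START_RE.finditer(segment):
        attrs = dict(ATTR_RE.findall(m.group(1)))
        end_pos = ends.get(attrs.get("id"))
        if end_pos is None:
            continue  # start tag without a matching end tag: skip it
        # Text between the tags, with any nested issue tags removed.
        attrs["text"] = TAG_RE.sub("", segment[m.end():end_pos]).strip()
        issues.append(attrs)
    return issues


if __name__ == "__main__":
    sample = (
        'a <mqm:startIssue type="Mistranslation" severity="critical" '
        'note="" agent="annotator1" id="35472"/>technician '
        '<mqm:endIssue idref="35472"/>at the roadside.'
    )
    for issue in extract_issues(sample):
        print(issue["id"], issue["type"], issue["severity"], issue["text"])
```

Matching by id rather than by position keeps the sketch usable if annotations overlap or nest, since each start tag is resolved independently of the others.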

Converting exported files to tab-delimited

For many calculation purposes in QTLaunchPad, tab-delimited files are preferable to CSV. To convert CSV to tab-delimited, open the file in a regular-expression capable editor and run the following search and replace actions:

Search string    Replace string    Explanation
""               "                 Replaces each escaped (doubled) double quote with a single double quote
","              \t                Replaces comma field boundaries with tabs
"(.+)"           \1                Removes the field delimiters at the start and end of each line (some editors require $1 instead of \1)

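As an alternative to manual search and replace, the same conversion can be scripted. The sketch below is an assumption-laden illustration rather than part of the documented workflow: it uses Python's standard csv module instead of the regular expressions listed here, the function name csv_to_tsv is hypothetical, and it replaces any tab or line break inside a field with a single space so that every record ends up on one line.

```python
import csv
import io
import re


def csv_to_tsv(csv_text):
    """Convert exported CSV text to unquoted tab-delimited text."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    lines = []
    for row in reader:
        # Tabs and line breaks inside a field would break the tab-delimited
        # layout, so each run of them is replaced with one space (assumption).
        cleaned = [re.sub(r"[\t\r\n]+", " ", field) for field in row]
        lines.append("\t".join(cleaned))
    return "\n".join(lines)


if __name__ == "__main__":
    exported = '"seg-1","""Hallo"", sagt sie.","<t a=""1""/>Hello"\n'
    print(csv_to_tsv(exported))
```

Because the csv module parses the quoting rules itself, doubled quotes are unescaped and the outer field delimiters are dropped in one pass, and commas inside a field are never mistaken for field boundaries.
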
Post-processing output files to remove unneeded markup

It may also be desirable to remove some of the attributes present in the file to simplify the markup. Individual attributes can be removed from the entire file with the following regular expressions:

Search string         Replace string    Explanation
agent="+.*?"\s?       <empty string>    Removes annotator IDs
severity="+.*?"\s?    <empty string>    Removes severity levels
note=""\s?            <empty string>    Removes any empty comment fields (leaving all “real” comments behind)
note=".*?"+\s?        <empty string>    Removes all comment fields

 

Note: The above regular expressions assume that the output has been converted to tab-delimited format already. If CSV is used, all quote marks in the regular expressions will need to be doubled.
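
The attribute removal can also be scripted. The following Python sketch makes several assumptions: the input is already tab-delimited (single quote marks), the function name strip_attributes and its only_empty parameter are hypothetical, and the value pattern [^"]* is used in place of the lazy .*? shown in the table so that a match can never run past the closing quote of the attribute.

```python
import re


def strip_attributes(text, names, only_empty=False):
    """Remove the named attributes from all tags in tab-delimited text.

    With only_empty=True, only attributes with an empty value
    (for example note="") are removed.
    """
    for name in names:
        value = "" if only_empty else '[^"]*'
        # \b stops a short name from matching inside a longer one.
        pattern = r"\b" + re.escape(name) + '="' + value + r'"\s?'
        text = re.sub(pattern, "", text)
    return text


if __name__ == "__main__":
    tag = ('<mqm:startIssue type="Mistranslation" severity="critical" '
           'note="" agent="annotator1" id="35471"/>')
    # Drop annotator IDs and severity levels.
    print(strip_attributes(tag, ["agent", "severity"]))
    # Drop only empty comment fields.
    print(strip_attributes(tag, ["note"], only_empty=True))
```

Note that the patterns are applied to the whole text, so a segment whose translated text happens to contain a string such as agent="..." outside a tag would be altered as well.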