Exporting
Last updated: 05 December 2014
Exporting a task
When a task is complete (or at any point when the current status of the task must be saved), an admin user may export it. Tasks are exported in the format in which they were originally uploaded (SDL-XLIFF or CSV). To export a task, click the Export options icon in the Task Overview screen and select the appropriate option.
There are at least three options available for export:
- Export. This option exports the entire file in its current state, including any edits or annotations. It does not mark changes. This is the option generally used in QTLaunchPad projects.
- Export with alteration record. This option exports the entire file and marks the changed portions of each segment with <ins> and <del> tags. It is useful in post-editing tasks for seeing exactly what the user changed. For pure annotation tasks, however, the extra markup may complicate calculations of how much text is marked for each issue type.
- Export QM-Statistics (XML) for field: <field name>. If there is more than one editable target column, this option will appear for each column. This option exports an XML file containing aggregate details on all issues annotated in the named column. This XML file includes the number of issues for each type annotated (it does not, however, include any issue types that were not actually selected for that file) and their severity. This information can be used to compare the issue profiles of different texts.
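For files exported with the alteration record, the text before and after editing can be recovered from the markup. The following Python sketch is only an illustration: it assumes that the tags appear as bare <ins>...</ins> and <del>...</del> pairs without attributes or nesting, which should be checked against an actual export. The function name split_alteration_record and the sample segment are hypothetical.

```python
import re

def split_alteration_record(segment):
    """Return (before, after) text for a segment marked with <ins>/<del>.

    Assumption: bare, non-nested <ins> and <del> tags without attributes.
    """
    # Text before editing: drop inserted text, keep deleted text.
    before = re.sub(r"<ins>.*?</ins>", "", segment, flags=re.DOTALL)
    before = re.sub(r"</?del>", "", before)
    # Text after editing: drop deleted text, keep inserted text.
    after = re.sub(r"<del>.*?</del>", "", segment, flags=re.DOTALL)
    after = re.sub(r"</?ins>", "", after)
    return before, after

# Hypothetical segment, not taken from a real export.
marked = "The <del>technician</del><ins>passer-by</ins> at the roadside."
before, after = split_alteration_record(marked)
```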
Export format
Tasks uploaded as CSV are exported in the same CSV format, as shown in this example:
"derstandart_at/2012/12/01/141907-2_de_SMT-Gold","""Ich finde das Gebäude lustig, es sieht futuristisch aus, und endlich gibt es wieder etwas Interessantes zu sehen"", sagt Lisette Verhaig, eine Passantin am Straßenrand.","""I find <mqm:startIssue type=""Mistranslation"" severity=""critical"" note="""" agent=""annotator1"" id=""35471""/>it <mqm:endIssue idref=""35471""/> funny, it looks futuristic, and finally there is something interesting to see again,"" said Lisette Verhaig, a <mqm:startIssue type=""Mistranslation"" severity=""critical"" note="""" agent=""annotator1"" id=""35472""/>technician <mqm:endIssue idref=""35472""/>at the roadside."
In this example, two issues were annotated in the target text. The start of each issue is indicated by a <mqm:startIssue/> tag and the end by a <mqm:endIssue/> tag. The start and end are matched by their respective id and idref values. The <mqm:startIssue/> tag contains the following attributes in addition to id:
- type. The MQM issue type for the annotation.
- severity. The severity level for the annotation (critical, major, or minor).
- note. Any note text associated with the annotation. (Note that segment-level notes are currently not exported.)
- agent. The user name of the annotator who added the annotation.
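The pairing of start and end tags by id and idref can be used to pull the annotated spans out of a target segment. The Python sketch below shows one possible way to do this. It assumes the segment has already been unescaped (single rather than doubled quote marks) and that attribute values contain no quote marks or ">" characters. The names extract_issues, START_TAG, and ATTRIBUTE are hypothetical, and the sample is a shortened version of the target text above.

```python
import re

# Matches one <mqm:startIssue .../> tag and captures its attribute string.
START_TAG = re.compile(r"<mqm:startIssue\s+([^>]*?)\s*/>")
# Matches a single name="value" pair inside the attribute string.
ATTRIBUTE = re.compile(r'([\w:]+)="([^"]*)"')

def extract_issues(target):
    """Return one dict per annotated issue, including the marked text."""
    issues = []
    for start in START_TAG.finditer(target):
        issue = dict(ATTRIBUTE.findall(start.group(1)))
        issue_id = issue.get("id")
        if issue_id is None:
            continue  # no id, so no end tag can be matched
        end_tag = re.compile(
            r'<mqm:endIssue\s+idref="%s"\s*/>' % re.escape(issue_id)
        )
        end = end_tag.search(target, start.end())
        if end is None:
            continue  # unmatched start tag; skipped in this sketch
        span = target[start.end():end.start()]
        # Drop any tags nested inside the span (overlapping issues).
        issue["text"] = re.sub(r"<[^>]+>", "", span)
        issues.append(issue)
    return issues

sample = (
    'I find <mqm:startIssue type="Mistranslation" severity="critical" '
    'note="" agent="annotator1" id="35471"/>it <mqm:endIssue idref="35471"/> '
    'funny, said a <mqm:startIssue type="Mistranslation" severity="critical" '
    'note="" agent="annotator1" id="35472"/>technician '
    '<mqm:endIssue idref="35472"/>at the roadside.'
)
issues = extract_issues(sample)
```

Each dictionary in issues holds the tag attributes (type, severity, note, agent, id) plus a text entry with the annotated span, which can then be counted or grouped by issue type.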
Converting exported files to tab-delimited
For many calculation purposes in QTLaunchPad, tab-delimited files are preferable to CSV. To convert CSV to tab-delimited, open the file in a regular-expression capable editor and run the following search and replace actions:
Search string | Replace string | Explanation |
---|---|---|
"" | " | Replaces each escaped (doubled) double quote with a single double quote |
"," | \t | Replaces comma field boundaries with tabs |
"(.+)" | \1 | Removes the field delimiters at the start and end of each line. Some editors require $1 instead of \1. |
Post-processing output files to remove unneeded markup
It may also be desirable to remove some of the attributes present in the file to simplify the markup. Individual attributes can be removed from the entire file with the following regular expressions:
Search string | Replace string | Explanation |
---|---|---|
agent="+.*?"\s? | <empty string> | Removes annotator IDs |
severity="+.*?"\s? | <empty string> | Removes severity levels |
note=""\s? | <empty string> | Removes any empty comment fields (leaving all “real” comments behind) |
note=".*?"+\s? | <empty string> | Removes all comment fields |
Note: The above regular expressions assume that the output has been converted to tab-delimited format already. If CSV is used, all quote marks in the regular expressions will need to be doubled.
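The attribute removal can also be scripted. The Python sketch below is one possible approach for tab-delimited (unescaped) input. It uses the character class [^"]* in place of .*? so that an empty value such as note="" cannot cause the match to run on into the next attribute. This is a variation on the expressions in the table, not the same patterns. The function name strip_attributes is hypothetical.

```python
import re

def strip_attributes(text, names):
    """Remove the named attributes (and one trailing space) from the text.

    Assumption: the input is tab-delimited output with single quote
    marks, and attribute values contain no quote marks.
    """
    for name in names:
        # \b keeps a name from matching inside a longer attribute name.
        pattern = r'\b%s="[^"]*"\s?' % re.escape(name)
        text = re.sub(pattern, "", text)
    return text

tag = (
    '<mqm:startIssue type="Mistranslation" severity="critical" '
    'note="" agent="annotator1" id="35471"/>'
)
simplified = strip_attributes(tag, ["agent", "severity", "note"])
```

Passing only ["agent"] would remove the annotator IDs and leave severity and note in place, which corresponds to applying just the first row of the table.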