QA Distiller: Hopes and Dreams

In my previous post about QA Distiller’s strengths, I began a series of articles comparing this program, which we currently use for translation quality assurance, with the alternatives. Today, I’d like to discuss some of the things about QAD that I’d change from the standpoint of our translation workflow. These are not exactly problems. They’re more like feature requests resulting from the more or less unique needs of our workflow.

Glossary format

Our glossaries are in tab-delimited TXT format and UTF-8 encoding (created by OmegaT). QAD uses its own format called DICT with pipes as delimiters. It also has an additional two-line header and uses UCS-2 Little Endian encoding. Example:

This is what the beginning of a TXT glossary created by OmegaT looks like for an English to Russian translation:

catheter               катетер

This is what the beginning of a DICT file looks like for the same term:

DICTFILE V1.1

EN-US|RU-RU

catheter|катетер

As a result, every single time we run QA, we have to convert our TXTs into DICTs. This is quite inefficient.
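The conversion itself is mechanical, so it can be scripted. Below is a minimal sketch in Python, assuming the DICT layout is exactly what the example above shows (the two header lines, then one source|target pair per line). The function names, the CRLF line endings, the byte order mark, and the decision to drop any columns after the second one are my assumptions, not documented QAD behavior.

```python
from pathlib import Path


def txt_glossary_to_dict(txt: str, source_lang: str = "EN-US",
                         target_lang: str = "RU-RU") -> str:
    """Turn tab-delimited glossary text into pipe-delimited DICT text."""
    # Two-line header as shown in the DICT example above.
    lines = ["DICTFILE V1.1", f"{source_lang}|{target_lang}"]
    for raw in txt.splitlines():
        fields = raw.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        source, target = fields[0].strip(), fields[1].strip()
        if source and target:
            # Assumption: columns after the second (e.g. comments) are dropped.
            lines.append(f"{source}|{target}")
    # Assumption: Windows line endings.
    return "\r\n".join(lines) + "\r\n"


def convert_file(txt_path: str, dict_path: str) -> None:
    """Read a UTF-8 glossary and write it as UCS-2 Little Endian."""
    text = Path(txt_path).read_text(encoding="utf-8-sig")
    dict_text = txt_glossary_to_dict(text)
    # Assumption: the file starts with a little-endian byte order mark.
    Path(dict_path).write_bytes(b"\xff\xfe" + dict_text.encode("utf-16-le"))


print(txt_glossary_to_dict("catheter\tкатетер\n"))
```

Calling convert_file("glossary.txt", "glossary.dict") before each QA run would replace the manual step. Both file names are placeholders.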

Different versions of translations for terms

The DICT files don’t support having several different translations for a term in a single line. Nor do they support the regular expressions that could help address this issue. We end up entering each alternative translation as a separate line in a DICT file. This results in ever-growing glossaries that become difficult to manage with time.
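One possible workaround is to keep the alternatives on a single line in a master glossary and expand them into separate lines only when generating the DICT file. The sketch below assumes a hypothetical convention of separating alternatives with semicolons in the target column. This is neither an OmegaT nor a QAD feature, and the second translation in the example is purely illustrative.

```python
def expand_alternatives(source: str, targets: str, sep: str = ";") -> list[str]:
    """Expand 'target1; target2' into one DICT line per alternative."""
    unique = []
    for target in targets.split(sep):
        target = target.strip()
        if target and target not in unique:  # skip blanks and duplicates
            unique.append(target)
    return [f"{source}|{target}" for target in unique]


for line in expand_alternatives("hose", "шланг; рукав"):
    print(line)
```

This way the glossary we maintain by hand stays short, and only the generated DICT file grows.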

Spell-checking

QAD has no built-in spellchecker, which means spell-checking is always a separate step in a separate program. This is inefficient, too.

Whole words only option

We have to check the translations for terminology errors in two QA runs. The first one is checking against a project glossary with specific terms. The second one is checking against a glossary with general words that represent a high risk of translation errors. Since a project glossary includes mostly specific terms, we can safely run this check without enabling the Whole words only checkbox. Such terms normally don’t generate too many false positives because they rarely occur as a part of a completely different word. Example:

Source term: catheter

Translated term: катетер

What it matches: катетер, катетеры, катетеризация

QAD’s judgment: Correct

Even though this term did occur as a part of two other words, it’s fine since all these occurrences actually require checking. We need to have one “катетер…” in the translation for every “catheter…” in the source. If there’s none, this could be an omission.

Only rarely does QAD find a false positive for a specific term. Example:

Source term: hose

Translated term: шланг

What it matches: whose (The translation is obviously different from the translation of “hose.”)

QAD’s judgment: Incorrect; expected “шланг”

Because these false positives are so rare, it’s safe to disable the Whole words only checkbox when checking against a glossary of specific terms. This makes managing a glossary easier by allowing us to keep most source terms in just one, dictionary form.

When this checkbox is disabled, however, a glossary of high-risk general words generates many false positives. Because it includes a plethora of short words and even abbreviations, they match parts of many other, completely different words. Example:

Source term: east

Translated term: восто

What it matches: feast, Easter, at least (The translations are obviously different from the translation of “east.”)

QAD’s judgment: Incorrect; expected “восто”

These are all false positives from the standpoint of checking whether the word “east” is translated correctly. We don’t need them to appear in the results at all. To get rid of them, we can enable the Whole words only option and do this second check (against a glossary of high-risk general words) as a separate QA run. This is inefficient, but bearable.
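The difference between the two modes can be illustrated with regular expressions. This is only a sketch of how such an option typically behaves, not QAD’s actual implementation. The function name and the sample sentence are made up.

```python
import re


def find_matches(term: str, text: str, whole_words_only: bool) -> list[str]:
    """Return the words in text that a glossary term matches."""
    escaped = re.escape(term)
    if whole_words_only:
        # The term must be bounded by non-word characters on both sides.
        pattern = rf"\b{escaped}\b"
    else:
        # The term may sit anywhere inside a longer word.
        pattern = rf"\w*{escaped}\w*"
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


text = "The feast before Easter was, at least, held in the east."
print(find_matches("east", text, whole_words_only=False))
# ['feast', 'Easter', 'least', 'east']
print(find_matches("east", text, whole_words_only=True))
# ['east']
```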

But there’s yet another catch. Checking the translation against a glossary of high-risk general words with this option enabled makes QAD automatically ignore all the word forms other than the one in the glossary. Example:

Source term: east

Translated term: вост

Source: I went east.

Translation: Я отправился на восток.

QAD’s judgment: Correct

Source: Eastern people are nice.

Translation: Западные люди приветливые. (Western people are nice)

QAD’s judgment: This occurrence is ignored because as a whole word, “east” doesn’t match “Eastern.”

In the example above, QAD ignores an error, which is unacceptable. To avoid this, we need to enable the Whole words only checkbox only after making sure that our glossary of high-risk general words includes all the possible word forms so that QAD does check them instead of ignoring them. Example:

Without the Whole words only option:

Source term: east

Translated term: вост

With the Whole words only option:

Source term: east

Translated term: вост

Source term: eastern

Translated term: вост
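To show why the extra word forms matter, here is a rough sketch of a terminology check. It assumes, as the examples above suggest, that the whole-word restriction applies to the source term while the translated term is matched as a plain substring. The function name and this logic are my assumptions about the behavior, not QAD’s code.

```python
import re


def check_terms(source, target, glossary, whole_words_only=True):
    """Return source terms found in the source whose translation is missing."""
    missing = []
    for src_term, tgt_term in glossary:
        pattern = re.escape(src_term)
        if whole_words_only:
            pattern = rf"\b{pattern}\b"
        found_in_source = re.search(pattern, source, flags=re.IGNORECASE)
        if found_in_source and tgt_term.lower() not in target.lower():
            missing.append(src_term)
    return missing


source = "Eastern people are nice."
target = "Западные люди приветливые."

# One form only: "east" is not a whole word in "Eastern", so the error is missed.
print(check_terms(source, target, [("east", "вост")]))
# []

# Both forms listed: the wrong translation is reported.
print(check_terms(source, target, [("east", "вост"), ("eastern", "вост")]))
# ['eastern']
```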

So we’re choosing from two options here:

  1. Disable this checkbox and tackle an avalanche of false positives when checking against a glossary of high-risk general words. This has a negative effect on concentration, defeating the very purpose of checking against a glossary, which is to concentrate on checking just those critical words.
  2. Enable this checkbox and manage a very large glossary of high-risk general words where each term has many different forms. Doing this is a pain.

In this “between a rock and a hard place” situation, we went for the second option.

Forgotten translations

QAD allows checking for forgotten translations. The program generates an error whenever a target segment is identical to the source segment. Most of the errors we get are false positives, but since this check is so important, we have to live with this.
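In essence, this check is a comparison of each source segment with its target. The sketch below is my own illustration, with made-up segments and the assumption that surrounding whitespace is ignored. It also shows where the false positives come from.

```python
def find_forgotten_translations(segments):
    """Return indexes of segments whose target is identical to the source."""
    return [
        index
        for index, (source, target) in enumerate(segments)
        if source.strip() == target.strip()
    ]


segments = [
    ("I went east.", "Я отправился на восток."),
    ("I went east.", "I went east."),  # genuinely forgotten
    ("QA Distiller", "QA Distiller"),  # product name left as is: a false positive
]
print(find_forgotten_translations(segments))
# [1, 2]
```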

Now, we don’t translate certain segments in a project until we get to finalizing it. These segments are internal fuzzy matches. We find it easier to leave them untranslated initially and avoid checking them during editing and having to correct the same error twice. Doing this also reduces the risk of inconsistent translation. Read more about this in the article about collaborative translation projects.

When we leave them untranslated in OmegaT, two things can happen:

The Allow translation to be equal to source checkbox is enabled

All untranslated segments are committed to the project TM. This is quite useful, if only because they appear as translated in the Editor pane and are counted as translated in the statistics. The real untranslatables are also saved to the TM and can be reused in future projects. But when we feed this project TM to QAD, all of them come up as false positives in the forgotten translations check. This is inefficient, but bearable. Worse, they also come up as terminology errors! QAD obviously can’t find the respective translations in the target since the target is equal to the source. Example:

Source term: anchor

Translated term: якор

Source: For scrubbing, protect the head with a piece of soft material and anchor it in the drain hole of the sink.

Target: For scrubbing, protect the head with a piece of soft material and anchor it in the drain hole of the sink.

QAD’s judgment: Dictionary term anchor was found in the source, but there is no target match.

The Allow translation to be equal to source checkbox is disabled

The problem described above goes away because the untranslated segments aren’t committed to the TM at all. But at the same time, we lose the advantages associated with committing the untranslated segments to the TM.

An alternative to this is to run QA only once, on a finalized file. But it doesn’t make sense in terms of our translation workflow.

Yet another option is to translate a project with this checkbox disabled and then enable it before finalizing the project. It’s a good idea, but still not very efficient: since I switch between projects all the time, I’d have to check the status of this option whenever I open a project.

What we ideally need is an option to skip the untranslated segments for all types of checks, in particular terminology.
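The behavior I have in mind could look roughly like this. It’s a sketch of the wished-for option, not something QAD offers. The skip_untranslated parameter and the function name are hypothetical, and term matching is simplified to a plain substring search.

```python
def terminology_errors(segments, glossary, skip_untranslated=True):
    """Return (segment index, source term) pairs with no target match."""
    errors = []
    for index, (source, target) in enumerate(segments):
        # Hypothetical option: ignore segments still equal to the source.
        if skip_untranslated and source.strip() == target.strip():
            continue
        for src_term, tgt_term in glossary:
            in_source = src_term.lower() in source.lower()
            in_target = tgt_term.lower() in target.lower()
            if in_source and not in_target:
                errors.append((index, src_term))
    return errors


sentence = ("For scrubbing, protect the head with a piece of soft material "
            "and anchor it in the drain hole of the sink.")
segments = [(sentence, sentence)]  # left untranslated for now
glossary = [("anchor", "якор")]

print(terminology_errors(segments, glossary, skip_untranslated=False))
# [(0, 'anchor')]
print(terminology_errors(segments, glossary, skip_untranslated=True))
# []
```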

X-Editor

Whenever I need to make a correction, I can double-click an error, or press Enter on it, to open the translation unit in X-Editor. After I make the changes to the translation in X-Editor and press Ctrl+S, the bilingual file is updated with those changes. That’s quite convenient. What’s inconvenient is that X-Editor isn’t a fully functional editor yet. It doesn’t allow moving around with Ctrl + arrow keys. It doesn’t allow selecting words using Ctrl + Shift + arrow keys. It’s slow to react to switching the keyboard layout the first couple of times. These things aren’t a problem unless I need to correct a lot of translation units. Using a mouse to navigate within this window can take too much time.

Conclusion

None of these areas for improvement is significant if you look at it individually. But since we spend hours on QA each day, improvement in all of these areas combined could make a big difference. What we need is a program that combines QAD’s strengths with a set of other advanced and configurable features.

If you found this article about translation quality assurance useful or have any ideas with regard to the points I made, please tell us about this by leaving a comment or subscribing to our RSS feed.


About the Author

Roman Mironov
CEO & Founder

As the founder of Velior, Roman has had the privilege of being able to turn his passion for languages into a business. He has over 15 years of experience in the translation industry. Roman has helped dozens of clients increase sales by making their products appealing for speakers of other languages.