In the article about QA Distiller, I spent a significant amount of time explaining the issues we have with checking terminology. Right now, we run two QA sessions each time because of the Whole words only option. The first one (without the Whole words only option) checks against a project glossary of specific terms. It doesn’t generate many false positives, so the Whole words only option is unnecessary. The second one (with the Whole words only option) checks against a glossary of high-risk general words. This one badly needs the Whole words only option, since running it without would be suicide in terms of the false positive rate (e.g. “east” also finds “at least” and “Easter”).
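To make the difference between the two modes concrete, here is a minimal Python sketch using regular expressions. It is only an illustration: neither QA Distiller nor Verifika necessarily works this way, and the function name `find_term` and the sample sentence are hypothetical. The `\b` word-boundary anchor is what turns a plain substring search into a whole-words-only search.

```python
import re

def find_term(term: str, text: str, whole_words: bool) -> list[int]:
    """Return start offsets of case-insensitive matches of term in text (illustrative only)."""
    pattern = re.escape(term)
    if whole_words:
        # \b requires a word boundary on both sides, so "east" cannot match inside "least" or "Easter"
        pattern = rf"\b{pattern}\b"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

source = "At least the Easter sale starts in the east wing."
print(find_term("east", source, whole_words=False))  # [4, 13, 39]: "least", "Easter", "east"
print(find_term("east", source, whole_words=True))   # [39]: only the standalone "east"
```

In a terminology check, each of the extra substring hits is a place where the tool expects the translation of “east” in the target and flags the segment when it is missing, which is where the false positives come from.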
Closely associated with this issue is the management burden of having to enter all the original word forms into the glossary of high-risk general words; any form that is not entered won’t be checked at all.
This article is about a very useful Verifika option called Whole words shorter than, which, among other things, has the potential to solve these problems to a certain degree. It forces the program to treat words shorter than the specified character limit as whole words. Here is an example from an English to Russian translation:
I have two terms in the glossary:
HR управление персоналом
If I run a check without the Whole words only option, “HR” will likely generate a few false positives, because the letters “hr” also occur inside words such as “through.” This means I want to check “HR” only as a whole word.
But if I check with the Whole words only option, there’s a huge catch. “Benefit” will not match “benefits.” If I want to check the whole words only and still be able to check all the word forms of “benefit,” I need to add “benefits” as a separate term to the glossary. The glossary becomes more difficult to manage as a result. But this is the price we currently pay for using the Whole words only option.
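As I understand it, the Whole words shorter than option boils down to a per-term decision based on term length. The Python sketch below is a hypothetical reimplementation of that rule, not Verifika’s actual code; the function name `matches` and the strict “shorter than” comparison are assumptions. It reuses the “HR” and “benefit” examples above.

```python
import re

def matches(term: str, text: str, whole_words_shorter_than: int) -> bool:
    """Hypothetical 'Whole words shorter than N' rule (not Verifika's code).

    Terms shorter than the limit must match as whole words; longer terms
    may match anywhere, e.g. as part of an inflected form.
    """
    pattern = re.escape(term)
    if len(term) < whole_words_shorter_than:
        # Short term: require word boundaries on both sides
        pattern = rf"\b{pattern}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# "HR" (2 characters) is treated as a whole word, so "through" no longer matches
print(matches("HR", "He walked through the office.", 5))  # False
print(matches("HR", "Please contact HR.", 5))              # True
# "benefit" (7 characters) is a substring search, so "benefits" still matches
print(matches("benefit", "Employee benefits vary.", 5))    # True
# A very high limit behaves like plain Whole words only
print(matches("benefit", "Employee benefits vary.", 100))  # False
```

Seen this way, the character limit works like a dial: a limit of 0 gives a plain substring check, and a very high limit gives the Whole words only behavior with all its word-form maintenance.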
The first benefit
The first benefit of the Whole words shorter than function in Verifika is that it reduces the glossary management burden. It lets me get the effect of the Whole words only option without entering all the different original word forms of the longer terms, which carry a low risk of false positives and can therefore be checked as parts of words rather than as whole words. Only for the shorter terms (those under the specified character limit) do I need to add all the different word forms. Even though Verifika doesn’t eliminate the problem fully, since the shorter terms may still account for a significant portion of a glossary, I think this option is a huge help.
The second benefit
The second benefit is the possibility, although a remote one, of checking against the two glossaries in a single session instead of two. Let’s imagine that I’m running a check against both glossaries with the Whole words shorter than 5 characters option. Because Verifika treats words below the 5-character limit as whole words, the false positive rate drops significantly, making checking for the high-risk general words bearable (not as easy as with the Whole words only option, of course, but still bearable). At the same time, I’m able to check against the glossary of specific terms with almost the same results as I normally get without the Whole words only option. I say almost because, and this is the catch, if I forget to add some of the original word forms of a term that falls below the character limit, Verifika will ignore those word forms. And the occurrences it ignores might contain errors. This means that to run QA in a single session, we would first need to add all those original word forms to the glossaries of specific terms. We don’t currently add word forms to those glossaries, and I believe they don’t belong there.
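The catch can be illustrated with a small Python sketch of a single session over both glossaries with a 5-character limit. The glossary contents, including the short term “tax,” are hypothetical, and the matching rule is my own approximation of the option, not Verifika’s implementation.

```python
import re

def is_hit(term: str, segment: str, limit: int) -> bool:
    """Approximate 'Whole words shorter than limit' matching (assumption)."""
    pattern = re.escape(term)
    if len(term) < limit:
        pattern = rf"\b{pattern}\b"  # short terms match only as whole words
    return re.search(pattern, segment, flags=re.IGNORECASE) is not None

def check_segment(segment: str, glossaries: dict[str, list[str]],
                  limit: int = 5) -> list[tuple[str, str]]:
    """Return (glossary, term) pairs found in the segment in a single pass."""
    return [
        (name, term)
        for name, terms in glossaries.items()
        for term in terms
        if is_hit(term, segment, limit)
    ]

segment = "Taxes on benefits, at least in the east region"
glossaries = {
    "specific": ["benefit", "tax"],  # hypothetical project terms
    "high-risk": ["east"],           # hypothetical high-risk general word
}
# "tax" is under the limit, so "Taxes" is silently skipped; "least" is not flagged
print(check_segment(segment, glossaries))
# [('specific', 'benefit'), ('high-risk', 'east')]

# Adding the missing word form to the glossary closes the gap
glossaries["specific"].append("taxes")
print(check_segment(segment, glossaries))
# [('specific', 'benefit'), ('specific', 'taxes'), ('high-risk', 'east')]
```

The second run shows the price of the single session: the project glossary now has to carry the word form “taxes,” which is exactly the kind of entry we would rather keep out of it.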
The ideal solution
The ideal solution would, of course, be to load two glossaries and tell Verifika that one requires the Whole words only option and the other one doesn’t. Even though the Whole words shorter than option does have some value in this respect, it doesn’t seem to be the right solution for us. It’s still a big step towards more effective checking, if only because it gives us more choices for optimizing the QA sessions. Bottom line: time savings and the potential to come up with a better way of checking against glossaries.
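For comparison, here is a minimal Python sketch of that ideal setup, where the Whole words only flag belongs to each glossary instead of to the whole session. This is a hypothetical design, not an existing Verifika feature; the `Glossary` class, its field names, and the sample terms are assumptions.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Glossary:
    """Hypothetical glossary that carries its own matching mode."""
    name: str
    whole_words_only: bool
    terms: list[str] = field(default_factory=list)

def find_hits(segment: str, glossaries: list[Glossary]) -> list[tuple[str, str]]:
    """Check every glossary in one pass, each with its own whole-words setting."""
    hits = []
    for glossary in glossaries:
        for term in glossary.terms:
            pattern = re.escape(term)
            if glossary.whole_words_only:
                pattern = rf"\b{pattern}\b"
            if re.search(pattern, segment, flags=re.IGNORECASE):
                hits.append((glossary.name, term))
    return hits

glossaries = [
    Glossary("project terms", whole_words_only=False, terms=["benefit", "tax"]),
    Glossary("high-risk general words", whole_words_only=True, terms=["east"]),
]
segment = "Taxes on benefits, at least in the east region"
print(find_hits(segment, glossaries))
# [('project terms', 'benefit'), ('project terms', 'tax'), ('high-risk general words', 'east')]
```

With this design, the project glossary catches “Taxes” and “benefits” without any extra word forms, while only the high-risk glossary keeps the whole-word restriction, all in one session.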
Thank you for reading this post, and stay tuned for the next article in this series comparing QAD with Verifika.
So how are you checking the translations against a glossary of specific terms? If you also check against a glossary of high-risk general words, how do you marry the two checks?