Another Improvement in the Translation Process Made Possible by OmegaT


How to Reduce Manual Translation Work in OmegaT
Recently, I wrote about how Kos Ivantsov’s scripts help make better use of OmegaT. Let’s now look at one specific example: a job that, in addition to new text, included some 40,000 full (100%) matches that we had agreed with the client not to review. The job also included DTP and a post-DTP review. During the post-DTP review, we noticed that the 100% matches we had not touched lacked non-breaking spaces after numbers. As a result, many lines ended with a number, while the next word, such as a unit of measurement, wrapped to the following line. This is incorrect. Example:

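За прошедшую неделю программу с нашего сайта загрузили 157

раз. Это стало для нас рекордом.

(Translation: “Over the past week, the program was downloaded from our website 157 [line break] times. This was a record for us.”)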

We spent several hours fixing this problem, and all of it was in unpaid 100% matches. We clearly needed a better way to deal with this in the future, and one of Kos’s scripts provided the solution.

Search_replace_batch.groovy

OmegaT does not yet have built-in search-and-replace functionality (a frequent and valid criticism), but you can use this script as a workaround. Unfortunately, like most other OmegaT scripts, it is not as straightforward as searching and replacing in a dialog box in Microsoft Word. However, it does let you run multiple replacements fully unattended. Automatic replacement calls for caution, of course, but the capability itself is terrific, especially because you can build a list of routine replacements to apply across projects and save the time it normally takes to make them by hand.
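Because unattended replacement calls for caution, it can help to preview what a rule would change before applying it. The Java sketch below is a hypothetical "dry run" helper (the class name DryRun and the method preview are illustrative, not part of Kos’s script): it prints a before/after pair for every segment a regular expression would change and returns the count, without modifying anything. Java is used because the script is a Groovy file, and Groovy relies on Java’s java.util.regex engine.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "dry run" helper: shows what an unattended regex replacement
// would change, without modifying any segment. Not taken from the script.
public class DryRun {

    // Prints "before -> after" for every segment the pattern would change
    // and returns how many segments would change.
    static int preview(List<String> segments, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        int changed = 0;
        for (String before : segments) {
            Matcher matcher = pattern.matcher(before);
            String after = matcher.replaceAll(replacement);
            if (!after.equals(before)) {
                changed++;
                // Show non-breaking spaces as "°" so they are visible in the console.
                System.out.println(before + "  ->  " + after.replace('\u00A0', '°'));
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("Загрузили 157 раз", "Без чисел", "Диск на 50 ТБ");
        int n = preview(segments, "(\\d)\\s([А-я])", "$1\u00A0$2");
        System.out.println(n + " of " + segments.size() + " segments would change");
    }
}
```

Running it lists the two segments that contain a digit followed by a space and a Cyrillic letter and reports that 2 of 3 segments would change; the segment without numbers is left out of the report.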

Search string

The first step is to work out what to search for and what to replace it with. For simple replacements, you can use literal strings. For example, I can use the script to change “Вашим” to “вашим” (“your”) if I do not want this word capitalized in an English-to-Russian translation. But for my task of replacing regular spaces with non-breaking ones, I have to use a regular expression to cover all the possibilities:

What to search for

157 раз (“157 times”)

50 ТБ (“50 TB”)

25 000 рублей (“25,000 rubles”)

Regular expression and replacement

(\d)\s([А-я]) $1 $2

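Where:

What to search for:

(\d) matches a single digit (the last digit of the number) and captures it as group 1.

\s matches a whitespace character, such as a regular space.

([А-я]) matches one letter in the Cyrillic range А–я, which covers the Russian alphabet except Ё and ё, and captures it as group 2.

What to replace it with:

$1 inserts the digit captured by (\d).

Next comes the non-breaking space itself (U+00A0).

$2 inserts the letter captured by ([А-я]).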
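To check the expression outside OmegaT, here is a minimal Java sketch of the same rule, assuming the script applies it with java.util.regex (which Groovy uses). The class name NbspAfterNumber and the method fix are hypothetical; the pattern and replacement are the ones described above, with the non-breaking space written as \u00A0.

```java
import java.util.regex.Pattern;

// Sketch of the rule described above, using Java's regex engine.
public class NbspAfterNumber {

    // (\d)      the last digit of a number, captured as group 1
    // \s        a whitespace character such as a regular space
    // ([А-я])   one Cyrillic letter in the range А..я, captured as group 2
    private static final Pattern DIGIT_SPACE_LETTER = Pattern.compile("(\\d)\\s([А-я])");

    // Group 1, then a non-breaking space (U+00A0), then group 2
    private static final String REPLACEMENT = "$1\u00A0$2";

    static String fix(String segment) {
        return DIGIT_SPACE_LETTER.matcher(segment).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        String[] samples = { "157 раз", "50 ТБ", "25 000 рублей" };
        for (String s : samples) {
            // Show non-breaking spaces as "°" so the change is visible.
            System.out.println(s + "  ->  " + fix(s).replace('\u00A0', '°'));
        }
    }
}
```

This prints 157°раз, 50°ТБ and 25 000°рублей. The space inside 25 000 is left alone because the character after it is a digit, not a letter. Running the rule twice changes nothing, since Java’s \s does not match U+00A0 by default.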

Using the script

  1. Download the script and save it in the scripts folder inside the OmegaT program folder. If you are not comfortable with that, simply download the OmegaT package that I have fully configured for you with all scripts and settings.
  2. Create a text file called search_replace.ini in the root folder of your OmegaT project. You use this file to tell the script which replacements to make.
  3. Put the search expression and replacement shown above on the first line: (\d)\s([А-я]) $1 $2
  4. Leave the second line empty. The last line of the file must always remain empty.
  5. In OmegaT, select Tools => Scripting, click the script in the left-hand panel, and click Run.
  6. The script makes the replacements and displays a results window so that you can check them.
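The steps above boil down to reading rules from a text file and applying them in order. The Java sketch below illustrates that idea; it is not the script’s actual code. In particular, the line format is an assumption inferred from the example line in step 3: each non-empty line is split at its first regular space into a pattern and a replacement (so a pattern must not contain a literal space), and blank lines, such as the required empty last line, are skipped. The names RuleFile, Rule, parse and apply are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading a search_replace.ini-style rule list.
// ASSUMPTION: each non-empty line is "<pattern> <replacement>", split at the
// first regular space; the real script's format may differ.
public class RuleFile {

    record Rule(String pattern, String replacement) {}

    static List<Rule> parse(String fileContents) {
        List<Rule> rules = new ArrayList<>();
        for (String line : fileContents.split("\\R")) { // \R matches \n, \r\n, etc.
            if (line.isBlank()) {
                continue; // e.g. the required empty last line
            }
            int sep = line.indexOf(' ');
            if (sep < 0) {
                throw new IllegalArgumentException("No separator in line: " + line);
            }
            rules.add(new Rule(line.substring(0, sep), line.substring(sep + 1)));
        }
        return rules;
    }

    // Applies the rules one after another, in file order.
    static String apply(String segment, List<Rule> rules) {
        String result = segment;
        for (Rule rule : rules) {
            result = result.replaceAll(rule.pattern(), rule.replacement());
        }
        return result;
    }

    public static void main(String[] args) {
        // First line: the rule from this article; second line: empty.
        String ini = "(\\d)\\s([А-я]) $1\u00A0$2\n\n";
        List<Rule> rules = parse(ini);
        System.out.println(rules.size() + " rule(s) loaded");
        System.out.println(apply("загрузили 157 раз", rules).replace('\u00A0', '°'));
    }
}
```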

Benefitting from the script across projects

Had we run this replacement before the unfortunate DTP job I mentioned, we could have saved a few hours. And this is what we are going to do in the future: build a list of replacements that reduces time-consuming manual work, such as adding non-breaking spaces or replacing quotation marks, and eliminates the risk of such problems. We will store the list in a central location, copy it into each project, and run it toward the end of the project. I think this is a good idea for every OmegaT user.
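A central list like this is essentially an ordered set of rules reused across projects. The Java sketch below shows one possible shape for it, with two rules for Russian targets: straight double quotes become guillemets «…», and a regular space between a digit and a Cyrillic letter becomes a non-breaking space. The class name SharedRules and the quotation-mark pattern are illustrative assumptions, not the contents of our actual list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shared rule list for Russian targets, kept in one place and
// reused across projects. LinkedHashMap keeps the rules in insertion order.
public class SharedRules {

    static final Map<String, String> RUSSIAN_RULES = new LinkedHashMap<>();

    static {
        // Straight double quotes around text -> Russian guillemets «...»
        RUSSIAN_RULES.put("\"([^\"]*)\"", "«$1»");
        // Regular space between a digit and a Cyrillic letter -> non-breaking space
        RUSSIAN_RULES.put("(\\d)\\s([А-я])", "$1\u00A0$2");
    }

    static String apply(String segment) {
        String result = segment;
        for (Map.Entry<String, String> rule : RUSSIAN_RULES.entrySet()) {
            result = result.replaceAll(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String in = "Отчет \"Итоги года\": 157 раз";
        // Prints: Отчет «Итоги года»: 157°раз  ("°" marks the non-breaking space)
        System.out.println(apply(in).replace('\u00A0', '°'));
    }
}
```

Keeping the rules in a fixed order matters once the list grows, because a later rule sees the output of the earlier ones.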

From now on, the OmegaT version we provide as a download will include the latest version of our script. The script is designed for Russian translation and includes explanations of the replacements, so you can remove those that you do not need in a particular project.

Kos, you are a genius. Make sure to read more about what this talented individual has been up to lately.

Now tell me: Is there a CAT tool, including the commercial ones, that can do this as effectively as OmegaT?

5 comments

  • Hi, Roman,

    Actually, there is indeed a tool that can perform this job much more efficiently. It is called Non-breaking Space Checker, and it is part of the free TransTools for Word add-in. If you are working with a native Word document rather than a bilingual file, it can add non-breaking spaces where appropriate.

    Here’s brief information about this tool from a post on the memoQ Yahoo group:

    Many modern style guides require that non-breaking spaces (also called hard spaces) are used instead of regular spaces in a number of cases to improve readability and avoid confusion. Here are examples of the most common contexts:
    1) Between numbers and the following units of measurement, e.g., “5 m2”, “220 V”, etc.
    2) In certain composite names that should occur on the same line of text, e.g., “Microsoft Word”, “The New York Times”, etc.
    3) Before numbers that define the preceding word, e.g., “ISO 9001”, “page 17”, “Table 1”, etc.
    4) Inside guillemets in French (e.g., « Quand on est journaliste, la vérité est un devoir »).
    5) Before two-part punctuation in French (e.g., « C’est la vie ! »).
    6) As a thousands separator (digit grouping symbol) in numbers in certain languages, e.g., “45 000 Euro”.
    A regular space used in the above examples can cause the text to wrap to a second line, which may not look very professional and, in some cases, can cause confusion.

    Non-breaking Space Checker is a special tool for Microsoft Word that automates insertion of non-breaking spaces in all of the above cases. It takes only 1 or 2 minutes to check and correct a medium-size document. You can freely modify the lists of phrases and words (see cases 1 to 3 above), which are maintained independently for each target language, and configure other options.

    You can use Non-breaking Space Checker on cleaned documents exported from your CAT tool, as part of the final step of your quality assurance workflow. It can also be used to polish source documents. Unfortunately, it does not work on bilingual documents, such as bilingual RTF or delimited Trados DOC files.

    Non-breaking Space Checker is part of TransTools for Word, a free Microsoft Word plug-in. For more information, go to http://www.translatortools.net/word-check-nbsp.html

    Unlike your solution, the tool works on the basis of lists of phrases (e.g., “руб*” would cover “рублей”, “рубля”, etc.). The lists take some time to set up, but once they are set up, they can be used routinely on all exported Word documents.

    You can download the tool from http://www.translatortools.net/download.html .

    By the way, TransTools also includes TransTools for Visio and TransTools for AutoCAD, both free, which can help your agency translate Visio drawings (both VSD and VDX) and AutoCAD drawings.

    Best regards,
    Stanislav Okhvat
    Translator Tools – Useful tools for every translator
    http://www.translatortools.net

    • Hello Stanislav,
      Thank you for the detailed information.
      Is Non-breaking Space Checker limited to Microsoft Word files? Or can I use it with other formats, such as IDML, HTML, or XLSX? The script I wrote about supports all formats because it does not even care about the format.
      Best regards,
      Roman

  • Hello, Roman,

    Yes, Non-breaking Space Checker can only work with Word documents since this command is part of a Microsoft Word add-in. Due to some Word limitations, I could not make it work with bilingual Word documents (e.g., Trados delimited documents or memoQ bilingual RTF files).

    I understand that your script does not care about the format as long as OmegaT can open the file. At the same time, Non-breaking Space Checker covers several other scenarios, not just “units of measurement”. I am also planning to enhance it to handle special words before title-case words, e.g., river Thames, Mr. Jones, etc.

    Integrating Non-breaking Space Checker with OmegaT is tempting, but it’s a lot of work because the tool is quite complex. The learning curve is a bit steep, too, because I currently use VBA (Office) and C#.

    Best regards,
    Stanislav

  • Damien Rembert says:

    Hi Roman,
    I think the regular expressions you quoted need to be put in a code block (or escaped), as the backslashes seem to be missing (“d” and “s” should follow backslashes unless I’m mistaken).
    Thanks for another very interesting article!
    Damien

    • Hi Damien,

      Thank you for reading and commenting. Yes, WordPress seems to remove these backslashes randomly. I have fixed this.

      Best wishes,
      Roman
