Three Indispensable Regular Expressions for OmegaT

One of the best things about OmegaT… Wait, do I say this too often? 🙂

Anyway, OmegaT supports regular expressions for many tasks, enabling us users to do extremely useful things. I already wrote about using regular expressions to build a list of common errors. Other areas where they come in handy include segmentation and text search. Read this article to learn about some of the most important REs that make it so much easier and more satisfying to work on the translation in OmegaT.

Searching for whole words

OmegaT comes with two major search options: exact search and keywords:

  • Exact search yields the exact match of what you’re searching for:

“open box” results in “open box,” “open boxes,” “re-open boxes,” and so on, but not “opened box.”

  • To find “opened box,” you need to search by keywords, and OmegaT will be looking for any number of individual search terms in any order.

“open box” results in “opened box,” “An opened box fell to the floor,” and “The mailbox was left opened.”

This second option is very useful, by the way. And it’s pretty much unique to OmegaT since it isn’t available in many commercial counterparts.

Now, what if you want to find just “open box,” without any inflections? You need the whole word option. While this option is standard in many applications, OmegaT doesn’t include it yet. No problem: just put \b, which denotes a word boundary, right before and after your search term. Enable the Regular expressions checkbox and go ahead.

Like this: \bopen box\b
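To see what the word boundary changes, here is a minimal sketch in Python. It is only an illustration: Python's re module stands in for OmegaT's own regex engine, and the sample segments are made up.

```python
import re

# Made-up sample segments (not taken from a real project).
segments = [
    "open box",
    "open boxes",
    "re-open boxes",
    "Please open box 3.",
]

plain = re.compile(r"open box")       # behaves like an exact search
whole = re.compile(r"\bopen box\b")   # same term wrapped in word boundaries

plain_hits = [s for s in segments if plain.search(s)]
whole_hits = [s for s in segments if whole.search(s)]

print(plain_hits)  # all four segments
print(whole_hits)  # ['open box', 'Please open box 3.']
```

Note that \b only checks for a switch between a word character and a non-word character, so a hyphenated form such as “re-open box” would still match.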

Finding the untranslated segments

You can open the next untranslated segment by jumping to it with Ctrl+U. But what if you want to see a bigger picture than just one segment at a time? You can do so by searching with this simple regular expression: ^$. It matches an empty string, that is, a segment with nothing in it yet. This is what you can use its results for:

  1. Double-checking whether every segment requiring translation has been translated.
  2. Filtering out the already translated segments to display the untranslated segments only in the Editor pane. This “uncluttered” view can be very conducive to concentrated and productive work.
  3. Extracting the untranslated segments into a separate file for further use. For example, you may want to create such a file to avoid processing the 100% matches that your client isn’t paying for. If you keep them in the project, they might distract you, and they will also appear in the quality assurance results when you run QA in a separate program such as Verifika.
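Why does ^$ find untranslated segments? Here is a rough sketch in Python. It assumes the pattern is tested against the translation text and that an untranslated segment has an empty translation. The project list and placeholder strings are hypothetical.

```python
import re

# Hypothetical (source, translation) pairs; "" means "not translated yet".
project = [
    ("Open the Settings window.", "<some translation>"),
    ("Open the Files tab.", ""),
    ("Click OK.", ""),
]

# ^ anchors the start and $ the end, with nothing allowed in between,
# so only an empty string can match.
empty = re.compile(r"^$")

untranslated = [source for source, target in project if empty.search(target)]

print(untranslated)  # ['Open the Files tab.', 'Click OK.']
```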

Matching any single character

I didn’t realize until recently how easy it was to create segmentation rules in OmegaT using such a simple regular expression as a period, which matches any character. Wherever you want a segment to break, you just add the item in question as the “pattern before,” with a period as the “pattern after,” or vice versa. Here is an example:

Open the Settings window.\n\nOpen the Files tab.

Segments like this can be a pain to manage in a translation project because while the two sentences are glued together in one segment here, they can also occur in the project as two separate sentences:

Open the Settings window.

Open the Files tab.

As a result, they might be translated inconsistently.

But this is where the mighty period comes into play. Just add these two segmentation rules:

(Break/Exception enabled)

Pattern Before: \\n\\n (you need to escape both backslashes because otherwise \n will be treated as the regular expression for a newline)

Pattern After: .

And the second one, mirroring the first one:

(Break/Exception enabled)

Pattern Before: .

Pattern After: \\n\\n

That’s it. You’ll produce two much cleaner segments, and sometimes, they’ll even turn out to be repetitions identical to some other sentences in your project:

Open the Settings window.


Open the Files tab.
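If you want to check how the two rules interact, here is a simplified model in Python. It is a sketch under my own assumptions, not OmegaT's actual segmentation code: a break is placed wherever a rule's “pattern before” ends and its “pattern after” begins. The helper split_segments is hypothetical, and the sample text contains a literal backslash followed by n, not real line breaks.

```python
import re

def split_segments(text, rules):
    """Cut text wherever a rule's 'before' pattern ends
    and its 'after' pattern starts at the same position."""
    breaks = set()
    for before, after in rules:
        before_re = re.compile(before)
        after_re = re.compile(after)
        for m in before_re.finditer(text):
            pos = m.end()
            # Ignore the very start and end; require 'after' to match at pos.
            if 0 < pos < len(text) and after_re.match(text, pos):
                breaks.add(pos)
    pieces, start = [], 0
    for pos in sorted(breaks):
        pieces.append(text[start:pos])
        start = pos
    pieces.append(text[start:])
    return pieces

# Raw string: the text holds the characters "\" and "n", not newlines.
text = r"Open the Settings window.\n\nOpen the Files tab."

rules = [
    (r"\\n\\n", r"."),   # break after the literal \n\n marker
    (r".", r"\\n\\n"),   # break before the literal \n\n marker
]

pieces = split_segments(text, rules)
for piece in pieces:
    print(piece)
```

In this model, the two sentences come out as separate pieces, with the \n\n marker left on its own between them.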

Most of the suggestions above originally came from the extremely helpful community on the OmegaT forum. Once again, kudos to the OmegaT “ecosystem”!

About the Author

Roman Mironov
CEO & Founder

As the founder of Velior, Roman has had the privilege of being able to turn his passion for languages into a business. He has over 15 years of experience in the translation industry. Roman has helped dozens of clients increase sales by making their products appealing for speakers of other languages.