The Basics of Segmentation in OmegaT

How to ensure correct segmentation in the translation program OmegaT

Since OmegaT doesn’t rely on bilingual files as an intermediary format the way other CAT programs do, it has no Split segment or Merge segments buttons for splitting or merging segments manually. All of this is done through the segmentation rules. Let’s face it: figuring out how to use these rules isn’t easy. In fact, when we began using OmegaT, we had a few problems with segmentation, too. I remember an urgent team project where a translator mistakenly added a segmentation rule that automatically merged every segment ending in “s” with the next segment. And, working under the pressure of a tight deadline, we couldn’t even figure out what was causing this! But as we practiced and got more comfortable with the rules, playing with them became easier. I bet you’ll enjoy them, too. I hope this post will help you get started.

Basic concepts

By default, segmentation occurs at the sentence level. I can switch to paragraph-level segmentation in the project Properties dialog. However, paragraph-level segmentation makes sense only with creative texts where you may need to move sentences around within a paragraph.

The segmentation rules are based on regular expressions. But nothing prevents me from entering actual text directly into the rules as long as nothing in this text is a regular-expression special character. For example, I can’t enter a period as is, since in a regular expression the period matches any character. I need to escape it by adding a backslash in front of it: \.
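To see why the period needs escaping, here is a small sketch in Python. OmegaT itself uses Java regular expressions, so treat this only as an illustration of the escaping behavior, which works the same way in both; the sample text and the helper name find_all are made up for the example.

```python
import re

# Made-up sample text: one literal "1.2" and one look-alike "1x2".
SAMPLE = "v1.2 and v1x2"


def find_all(pattern, text=SAMPLE):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


# Unescaped period: matches ANY single character, so both strings match.
print(find_all(r"1.2"))

# Escaped period: matches only a literal ".".
print(find_all(r"1\.2"))

# re.escape adds the backslashes for you when you want literal text.
print(re.escape("etc."))
```

The first call matches both "1.2" and "1x2", while the second matches only "1.2". That is exactly the kind of surprise an unescaped period can cause in a segmentation rule.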

Rule priority

The rules apply in the following order:

  1. Country-specific rules (e.g. EN-GB)
  2. Language-specific rules (e.g. EN)
  3. Default rules

The default rules have the lowest priority and apply only after all the other rules. I can add a rule either to the default rules or the language-specific ones depending on whether a rule might be useful for other languages as well. Generally, it makes sense to add more general things to the default rules and specific stuff to the language-specific rules.

The individual rules within a group apply from the top down.
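The ordering described above can be pictured with a short Python sketch. This is a simplified model and not OmegaT's actual implementation: the RULE_GROUPS structure, the rule names, and the rules_in_order function are all hypothetical.

```python
# Hypothetical rule groups, listed in priority order:
# country-specific first, then language-specific, then default.
RULE_GROUPS = [
    ("EN-GB", ["rule GB-1", "rule GB-2"]),
    ("EN", ["rule EN-1"]),
    ("DEFAULT", ["rule D-1", "rule D-2"]),
]


def rules_in_order(language_code):
    """Return the rules that apply to language_code, in application order."""
    country = language_code.upper()      # e.g. "EN-GB"
    language = country.split("-")[0]     # e.g. "EN"
    ordered = []
    for group_name, rules in RULE_GROUPS:
        # Groups are visited top to bottom; the default group always applies.
        if group_name in (country, language, "DEFAULT"):
            # Within a group, rules keep their top-down order.
            ordered.extend(rules)
    return ordered


print(rules_in_order("en-GB"))
print(rules_in_order("en-US"))
```

For "en-GB" all three groups contribute, with the EN-GB rules first. For "en-US" only the EN and default rules remain, still with the default rules last.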

General vs. project-specific rules

The segmentation rules can be general or project-specific. The general ones are those stored in the segmentation.conf file in your user settings folder. Under Windows 7 or 8, its default location is C:\Users\username\AppData\Roaming\OmegaT.

You can access these general rules by going to Options => Segmentation even without a project open.

The general rules apply by default unless you choose to make them project-specific. To do so, press Ctrl+E => Segmentation => Make the segmentation rules project-specific. From that point on, any changes you make to the segmentation rules while this project is open will be saved to the segmentation.conf file created in the omegat subfolder of this project. When you open another project, the general rules will apply to that project again unless you already made them project-specific as well.
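If it helps to picture which file is in effect, here is a minimal Python sketch of the behavior described above. It is a simplified model, not OmegaT's own code: the function name active_segmentation_file and the example folder paths are assumptions made for illustration.

```python
from pathlib import Path


def active_segmentation_file(project_dir, user_settings_dir):
    """Return the segmentation.conf that would apply to a project.

    Simplified model: project-specific rules live in the project's
    "omegat" subfolder; if that file is absent, the general rules in
    the user settings folder apply.
    """
    project_file = Path(project_dir) / "omegat" / "segmentation.conf"
    if project_file.is_file():
        return project_file
    return Path(user_settings_dir) / "segmentation.conf"


# Hypothetical paths, for illustration only.
print(active_segmentation_file("my_project", "user_settings"))
```

With this model, a quick way to check whether a project carries its own rules is to look for segmentation.conf inside its omegat subfolder.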

How to manage different segmentation rules

To illustrate, here’s how we manage our rules:

  1. A project manager opens a project during initial setup and looks for any segmentation problems.
  2. If she decides to make any changes to the rules, she makes them project-specific first. By doing so, she makes sure that everyone who opens this project after her gets exactly the same segmentation.
  3. If those rules can be of value for future projects, she saves them in a separate location so that we can add these rules to our shared segmentation.conf file that we use across all PCs.
  4. If a translator decides to add anything else, he follows the same process: adds a rule that’s automatically made project-specific (because the PM already made all the segmentation settings project-specific) and saves it for re-use if necessary.

Note: For collaborative translation projects, it’s a good idea to make the segmentation rules project-specific so that the segmentation.conf file is included in the project uploaded to an SVN/Git server. This ensures the entire team starts with the same segmentation.

Thank you for reading this article. Stay tuned for the next post, where I’ll provide a few examples of adding custom rules to merge or split segments in OmegaT. In the meantime, feel free to check out the post about getting started with OmegaT in no time.

2 comments

  • Надежда says:

    Roman, what should I do when OmegaT produces segments with no text? I press Ctrl+U and move on to the next segment that has text, but later, when I open the document and click “Go to next untranslated segment,” the program highlights one of these empty segments that contains only tags.

    • Hello, Надежда! Without seeing what is happening, it’s hard to tell exactly where the problem lies. If all segments are already translated, pressing Ctrl+U will keep opening (or staying on) the same segment, since no untranslated segments remain. The second possibility: your text contains segments that consist of tags only, and these also formally need to be “translated” once (if the option Options => Editing Behaviour => Enable translation to be equal to source is turned on). If this option is turned off, pressing Ctrl+U will keep taking you to such segments, because they remain untranslated from OmegaT’s point of view even though you have already passed through them. So I recommend turning this option on first; it may well be the cause. If that doesn’t help, send a screenshot: you can upload it to a service like radikal.ru or to cloud storage and paste the link in your reply. Best regards, Roman.


About the Author

Roman Mironov
CEO & Founder

As the founder of Velior, Roman has had the privilege of being able to turn his passion for languages into a business. He has over 15 years of experience in the translation industry. Roman has helped dozens of clients increase sales by making their products appealing for speakers of other languages.