Since OmegaT doesn’t rely on bilingual files as an intermediate format the way other CAT programs do, it has no Split segment or Merge segments buttons for splitting or merging segments manually. All of this is done through the segmentation rules. Let’s face it: figuring out how to use these rules isn’t easy. In fact, when we began using OmegaT, we had a few problems with segmentation, too. I remember an urgent team project where a translator mistakenly added a segmentation rule that automatically merged every segment ending in “s” with the next segment. And, working under the pressure of a tight deadline, we couldn’t even figure out what was causing this! But as we practiced and got more comfortable with the rules, playing with them became easier. I bet you’ll enjoy them, too. I hope this post will help you get started.
By default, OmegaT segments text at the sentence level. I can switch to paragraph-level segmentation in the Properties dialog, but that makes sense only with creative texts where you may need to move sentences around within a paragraph.
The segmentation rules are based on regular expressions. But nothing prevents me from entering literal text directly into the rules as long as nothing in that text has a special meaning in regular expressions. For example, I can’t enter a period as is, since in a regular expression the period matches any character. I need to escape it by adding a backslash in front of it: \.
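To see why the escape matters, here is a minimal sketch in Python. OmegaT itself uses Java’s regular expression syntax, but for a pattern this simple the two behave the same way. The sample sentence is made up for illustration.

```python
import re

# Made-up sample text for illustration only.
text = "Version 2.0 ships today. See p. 4"

# Unescaped: "." matches any character, so "p." also matches "ps" in "ships".
unescaped = re.findall(r"p.", text)

# Escaped: "\." matches only a literal period.
escaped = re.findall(r"p\.", text)

print(unescaped)  # ['ps', 'p.']
print(escaped)    # ['p.']
```

The unescaped pattern picks up an unintended match, which is exactly the kind of surprise that makes a segmentation rule fire in the wrong place.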
The rules apply in the following order:
- Country-specific rules (e.g. EN-GB)
- Language-specific rules (e.g. EN)
- Default rules
The default rules have the lowest priority and apply only after all the other rules. I can add a rule either to the default rules or the language-specific ones depending on whether a rule might be useful for other languages as well. Generally, it makes sense to add more general things to the default rules and specific stuff to the language-specific rules.
The individual rules within a group apply from the top down.
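The ordering described above can be sketched as a toy model. This is not OmegaT’s actual code: the `Rule` class, `RULE_GROUPS`, and `split_text` are hypothetical names, the two sample rules are made up, and the sketch assumes that the first rule matching at a given position decides whether the text is split there.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    is_break: bool  # True: split here; False: exception, do not split here
    before: str     # regex that must match the text right before the position
    after: str      # regex that must match the text right after the position


# Hypothetical rule groups, listed from highest to lowest priority.
RULE_GROUPS = [
    ("EN-GB", []),
    ("EN", [Rule(False, r"\b(?:Mr|Dr)\.", r"\s")]),
    ("Default", [Rule(True, r"[.?!]", r"\s")]),
]


def split_text(text, groups=RULE_GROUPS):
    # Flatten the groups so that higher-priority rules are checked first.
    rules = [rule for _, group in groups for rule in group]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search("(?:" + rule.before + r")\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides; lower ones are skipped
    segments.append(text[start:])
    # Drop the whitespace left between segments (a simplification).
    return [segment.strip() for segment in segments]


print(split_text("Dr. Smith arrived. He was late."))
# ['Dr. Smith arrived.', 'He was late.']
```

In this model, the exception for “Dr.” sits above the default break rule, so it wins at that position. It also shows how an over-broad exception near the top of the list, like the “s” rule from our urgent project, can quietly suppress every split below it.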
General vs. project-specific rules
The segmentation rules can be general or project-specific. The general ones are those stored in the segmentation.conf file in your user settings folder. Under Windows 7 or 8, its default location is C:\Users\username\AppData\Roaming\OmegaT.
You can access these general rules by going to Options => Segmentation even without a project open.
The general rules apply by default unless you choose to make them project-specific. To do so, select Ctrl+E => Segmentation => Make the segmentation rules project-specific. From that point on, any changes you make to the segmentation rules while this project is open will be saved to the segmentation.conf file created in the omegat subfolder of this project. When you open another project, the general rules will apply to that project again unless you already made them project-specific as well.
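As a rough mental model, the choice between the two files can be sketched like this. The function name `active_segmentation_conf` and the sample folder names are hypothetical, and the sketch assumes that the presence of segmentation.conf in the project’s omegat subfolder is what marks the rules as project-specific.

```python
from pathlib import Path


def active_segmentation_conf(project_dir, user_config_dir):
    """Return the segmentation.conf that would apply to a project (sketch)."""
    # Assumption: project-specific rules live in <project>/omegat/segmentation.conf.
    project_conf = Path(project_dir) / "omegat" / "segmentation.conf"
    if project_conf.is_file():
        return project_conf
    # Otherwise fall back to the general rules in the user settings folder.
    return Path(user_config_dir) / "segmentation.conf"


# Hypothetical folder names; the result depends on which files exist on disk.
print(active_segmentation_conf("my_project", "user_settings"))
```

This is also why copying a project folder to another PC carries its segmentation along, while the general rules stay behind on the original machine.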
How to manage different segmentation rules
To illustrate, here’s how we manage our rules:
- A project manager opens a project during initial setup and looks for any segmentation problems.
- If she decides to make any changes to the rules, she makes them project-specific first. By doing so, she makes sure that everyone who opens this project after her gets exactly the same segmentation.
- If those rules can be of value to future projects, she saves them in a separate location so that we can add them to the shared segmentation.conf file that we use across all PCs.
- If a translator decides to add anything else, he follows the same process: he adds a rule, which is automatically project-specific (because the PM already made all the segmentation settings project-specific), and saves it for re-use if necessary.
Note: For collaborative translation projects, it’s a good idea to make the segmentation rules project-specific so that the segmentation.conf file is included in the project uploaded to an SVN/Git server. This ensures the entire team starts with the same segmentation.
Thank you for reading this article. Stay tuned for the next post, where I’ll provide a few examples of adding custom rules to merge or split segments in OmegaT. In the meantime, feel free to check out the post about getting started with OmegaT in no time.