Video: Splitting Segments in OmegaT
June 11th, 2013, Roman Mironov
Feeling frustrated over a bunch of sentences packed into one segment? Want to break segments to get more repetitions?
Watch this video to learn how to produce clean segments in OmegaT:
Many OmegaT users admit that the current approach to segmentation in the program isn’t exactly straightforward. I agree, but it’s not rocket science either. To build smart and long-lasting rules, you do need to understand regular expressions, which might be a challenge. But if you just want a quick fix for your current segment, knowing regular expressions is often unnecessary.
- So let’s look at splitting segments with this kind of a “quick-fix” rule. Sometimes, a translator gets two sentences merged into one. They require splitting. In this example, the pattern joining the two segments is a tag between them. This tag is what I’m going to use as the basis for my rule. I will start by creating a “quick-fix” rule that is easy and belongs to the set of rules that takes the highest priority. After I open the Segmentation Setup window, I can select the English set of rules (since my source language is English) and add my “quick-fix” rule into this set. But I can also click Add and create a new set of rules that I will name “Quick-Fix Rules.” For the language pattern, I will enter “EN-US.” By doing so, I make sure that the rules in this new set take priority over all the rules both in the English set and the Default set. None of the rules in those other sets can interfere with my “quick-fix” rules.
- I am ready to add my new rule now. I click Add in the section below. After a blank rule appears, I enter the tag as the “Pattern Before” and the text that follows it as the “Pattern After.” Note how I am enabling the checkbox this time. This makes it a break rule: OmegaT will split the sentence at the boundary between the two patterns. As soon as I reload the project, I get the correct segmentation.
- Now, what I did isn’t the most efficient approach, but that isn’t the point with any “quick-fix” rule. The point is that it’s easy and it works. Its main drawback is that if I have a lot of similar segmentation issues in a project, I’ll have to make a rule for each instance.
- Let’s take this a little bit further by making a more universal rule. I have a very similar instance, where a tag prevents correct segmentation. But the rule that I added doesn’t work, of course, because it expects only “Replace” as the “Pattern After,” but not “Open.” To make sure that my rule applies in both cases, whatever the text following this tag is, I can use a powerful regular expression, the period, which matches any single character. I will enter it as the “Pattern After.” The segmentation is correct now. The period matches the first character of both “Replace” and “Open.”
- But wait, there is still more. To produce the cleanest possible segment in this case, I can also add a rule that will move the tag out of this segment entirely. This second rule will simply mirror the first one, because it will do the same thing, but from the other side of the tag. It will break a segment before the tag. The tag becomes a separate segment as a result. Please note that although this kind of rule is fine in this example, it is far too broad and might split other segments incorrectly. You will likely need to make it more specific.
- Now that I’m done fixing segmentation in my project, it’s a good idea to make sure the segmentation rules are project-specific. OmegaT offers two sources of segmentation rules. The first one is the general rules in the segmentation.conf file located in the OmegaT settings folder. They apply to every project I open by default. The second source is the rules saved within a specific project, the project-specific rules. If a project already has project-specific rules, OmegaT uses those instead of the general ones. Note how the omegat subfolder for this project doesn’t include the segmentation.conf file yet. This means that when I open this project, OmegaT will use my general segmentation rules located in the OmegaT settings folder. I’ll make the rules project-specific now. After I do this, the omegat subfolder includes the segmentation.conf file. By doing so, I make sure that whoever opens the project afterwards, myself included, will get the same optimized segmentation even if they have different rules in the general segmentation.conf file.
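For readers who prefer to see the logic spelled out, here is a rough Python sketch of how the three rules from the video behave. It is a simplified model, not OmegaT’s actual implementation: the `Rule` class, the `split_segments` function, the tag name `<br0/>`, and the sample sentences are all made up for illustration. The only ideas taken from the video are that a break rule fires where the “Pattern Before” ends and the “Pattern After” begins, and that rules higher in the list take priority.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    before: str            # regex that must match right before the split point
    after: str             # regex that must match right after the split point
    is_break: bool = True  # True = break rule (checkbox on), False = exception


def split_segments(text, rules):
    """Split text wherever the first matching rule is a break rule."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search(rf"(?:{rule.before})\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides; skip lower-priority rules
    segments.append(text[start:])
    return segments


# Hypothetical tag and sample text; the real ones depend on your file.
quick_fix = Rule(before="<br0/>", after="Replace")  # works for one instance only
universal = Rule(before="<br0/>", after=".")        # period = any character
mirror = Rule(before=".", after="<br0/>")           # breaks before the tag

first = "Close the lid.<br0/>Replace the cap."
second = "Close the lid.<br0/>Open the box."

print(split_segments(first, [quick_fix]))
print(split_segments(second, [quick_fix]))          # not split: "Open" != "Replace"
print(split_segments(second, [universal]))
print(split_segments(second, [universal, mirror]))  # tag ends up in its own segment
```

The last call also shows why the mirrored rule is so broad: with a bare period as the “Pattern Before,” it breaks before every occurrence of the tag, no matter what precedes it.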
That’s about it. Thank you for watching this video. You may also want to watch my video about merging segments in OmegaT.
What’s your opinion about OmegaT’s approach to segmentation? Do you find it easy to understand and use?