This is the second video from the OmegaT webinar. You can find the first one here.
This video explains the basics of using the “Remove leading and trailing tags” option. When you use OmegaT, you face the following dilemma:
- You can make OmegaT display all leading and trailing tags, including the ones that you need and the ones that you do not need, i.e. superfluous tags.
- Or you can make OmegaT hide all these tags, but you will not be able to move these tags around if you need to do so.
Because the second option is obviously unsustainable (you cannot work without the tags that you need to move around in the translation!), I advocate using the first option. To deal with its drawback, the superfluous tags, you can use segmentation rules that extract those tags out of segments. When you need to get a specific leading or trailing tag back into a segment, you simply add a segmentation counter-rule that takes priority over your “tag extraction” rules.
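One way to picture how a counter-rule “takes priority”: OmegaT's segmentation rules follow the SRX model, where each rule is either a break or an exception, has a “pattern before” and a “pattern after”, and the rules are checked in order, so the first rule that matches at a given position decides whether a break happens there. The Java sketch below is a simplified model of that idea, not OmegaT's own segmenter. Both rules in it are hypothetical examples: a “tag extraction” rule that breaks after leading tags, and an exception, listed first, that keeps a leading <b> tag inside the segment. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RulePrioritySketch {
    // A rule: break (true) or exception (false), plus its two patterns.
    record Rule(boolean isBreak, Pattern before, Pattern after) {
        static Rule of(boolean isBreak, String before, String after) {
            // "before" must match text that ends exactly at the candidate position.
            return new Rule(isBreak, Pattern.compile("(?:" + before + ")\\z"),
                    Pattern.compile(after));
        }
    }

    // HYPOTHETICAL counter-rule (exception): keep a leading <b> in the segment.
    static final Rule KEEP_BOLD = Rule.of(false, "^<b>", "[^<]");
    // HYPOTHETICAL "tag extraction" rule: break right after leading tags.
    static final Rule EXTRACT_LEADING = Rule.of(true, "^(<[^>]+>)+", "[^<]");
    // List order is priority: the counter-rule comes first.
    static final List<Rule> RULES = List.of(KEEP_BOLD, EXTRACT_LEADING);

    static boolean matchesAt(Rule r, String text, int pos) {
        // "after" must match text that starts exactly at the candidate position.
        return r.before().matcher(text.substring(0, pos)).find()
                && r.after().matcher(text.substring(pos)).lookingAt();
    }

    static List<String> segment(String text, List<Rule> rules) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int pos = 1; pos < text.length(); pos++) {
            for (Rule r : rules) {
                if (matchesAt(r, text, pos)) {
                    // The first matching rule decides; later rules are not consulted.
                    if (r.isBreak()) {
                        segments.add(text.substring(start, pos));
                        start = pos;
                    }
                    break;
                }
            }
        }
        segments.add(text.substring(start));
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(segment("<i>Tip:</i> Save often.", RULES));
        System.out.println(segment("<b>Warning:</b> Do not unplug.", RULES));
    }
}
```

Run as is, it should print [<i>, Tip:</i> Save often.] followed by [<b>Warning:</b> Do not unplug.]: the <i> tag is split off into a segment of its own, while the higher-priority exception keeps <b> where it is. Swapping the order of the two rules in RULES would split off <b> as well.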
If you like this video, add us to your RSS reader to make sure you don’t miss future videos about some of the advanced OmegaT functions.
Hi, this is a nice workaround, but I noticed that you may end up with unwanted orphan segments when you create the overriding segmentation rules, unless you make them very segment-specific, which is not always possible. Don’t you think this could be solved by letting OmegaT re-insert leading/trailing tags on a per-segment basis? I imagine an option in the menu that opens when you right-click a segment: “Insert leading and trailing tags for current segment only”.
Hi Hector. Yes, that would be ideal. Good idea.
Hi,
I am using an “improved” set of rules that I want to share with everyone here. I store them in a separate set named “Leading and trailing tags off”:
Rule 1
Pattern before:
(^|\n)(\s|\xA0)*((<[^>]+>)+(\s|\xA0)*)+
Pattern after:
[^]
Rule 2
Pattern before:
[^]
Pattern after:
(\s|\xA0)*((<[^>]+>)+(\s|\xA0)*)+($|\n)
This way you can make sure that any tags before and after your text end up in standalone segments. In fact, I was thinking that if this set of rules were the default, there would be no need for the “Remove leading and trailing tags” option. A “Hide tag-only segments” option would be better suited instead.
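To show what “standalone segments” means in practice, here is a rough Java sketch that applies the leading-tag pattern of Rule 1 and the trailing-tag pattern of Rule 2 to a single paragraph. It does not reproduce OmegaT's segmenter; it simply breaks after the leading-tag match and before the trailing-tag match. Assumptions to flag: the sketch reads a tag as <[^>]+> and the xA0 alternative as the Java escape \xA0 (the no-break space, which Java's \s does not match by default), and it reads the garbled [^] halves of both rules as [^<], a single character that does not open a tag. That last reading is a guess. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSplitSketch {
    // Rule 1, "Pattern before": tags (with optional spaces or no-break spaces)
    // at the start of the text or right after a line break.
    static final Pattern LEADING =
            Pattern.compile("(^|\\n)(\\s|\\xA0)*((<[^>]+>)+(\\s|\\xA0)*)+");
    // Rule 2, "Pattern after": tags running to the end of the text or a line break.
    static final Pattern TRAILING =
            Pattern.compile("(\\s|\\xA0)*((<[^>]+>)+(\\s|\\xA0)*)+($|\\n)");

    // ASSUMPTION: the garbled "[^]" halves are read as [^<],
    // i.e. one character that does not open a tag.
    static boolean isTextChar(char c) {
        return c != '<';
    }

    static List<String> split(String text) {
        TreeSet<Integer> breaks = new TreeSet<>();
        Matcher lead = LEADING.matcher(text);
        while (lead.find()) {
            int pos = lead.end();
            // Rule 1: break after the leading tags if text follows them.
            if (pos < text.length() && isTextChar(text.charAt(pos))) {
                breaks.add(pos);
            }
        }
        Matcher trail = TRAILING.matcher(text);
        while (trail.find()) {
            int pos = trail.start();
            // Rule 2: break before the trailing tags if text precedes them.
            if (pos > 0 && isTextChar(text.charAt(pos - 1))) {
                breaks.add(pos);
            }
        }
        // Cut the text at every break position, in ascending order.
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int b : breaks) {
            segments.add(text.substring(start, b));
            start = b;
        }
        segments.add(text.substring(start));
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("<p><b>Hello world</b></p>"));
    }
}
```

Running main should print [<p><b>, Hello world, </b></p>]: the opening and closing tags each land in a tag-only segment, which is exactly what a “Hide tag-only segments” option could then hide.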
Now, if you take this approach further, you could pre-process your source files so that all line breaks are replaced by a tag. In theory, that would make it possible to join segments that currently cannot be joined because of line breaks. I would like to test that approach, but I haven’t done so yet.
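Since the pre-processing idea is untested, the following is only a sketch of its mechanical first step, under stated assumptions: it rewrites a text file so that every line break becomes a placeholder tag, and it provides the reverse step for the translated file. The tag name <lb/> and the class name are hypothetical, and whether OmegaT then treats the placeholder as a tag at all depends on the file filter used for that format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineBreakToTag {
    // HYPOTHETICAL placeholder; use whatever tag your file filter recognizes.
    static final String BREAK_TAG = "<lb/>";

    // Replace each line break (\r\n, \r or \n) with the placeholder tag.
    static String replaceLineBreaks(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", BREAK_TAG);
    }

    // Reverse step for the translated file. It always restores "\n",
    // so original "\r\n" line endings are not preserved.
    static String restoreLineBreaks(String text) {
        return text.replace(BREAK_TAG, "\n");
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java LineBreakToTag <input> <output>");
            return;
        }
        String source = Files.readString(Path.of(args[0]));
        Files.writeString(Path.of(args[1]), replaceLineBreaks(source));
    }
}
```

For example, java LineBreakToTag source.txt prepared.txt would write a copy with no line breaks left, so the segmentation rules alone decide where segments start and end.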
Hope you like the improved regex!
Hi Hector,
This is great. It will be my pleasure to test your new rules.
Would you care to explain what exactly is improved compared to what I suggested?
Thank you so much for sharing.