Term Extraction with Rainbow

Managing terminology in translation projects with Rainbow
Saving time and increasing efficiency with glossary management

Creating glossaries manually is fine, but if you are looking to standardize and optimize terminology management, you might be better off using programs designed for this purpose. One of them is Okapi’s Rainbow, which I mentioned earlier this year as a tool allowing the user to extract terms in order to create a translation glossary. This article is a brief introduction to this process.

Why use a glossary?

An extensive glossary serves several purposes:

  • Simply having such a glossary has a psychological advantage, as it puts pressure on a translator to translate more consistently.
  • You can feed this glossary to a QA program to check terminology consistency automatically.
  • Adding a list of terminology to a CAT tool such as OmegaT makes it possible to insert terms quickly into the translation.

The end result of using a glossary is a higher quality of translation.
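
To make the automated check from the list above more concrete, here is a minimal sketch of what a terminology consistency check does. It is not how any particular QA program works: the function name `find_inconsistencies`, the sample glossary, and the sample segments are all made up for illustration, and the matching is a plain case-insensitive substring test that ignores inflected forms.

```python
def find_inconsistencies(segments, glossary):
    """Return (segment_index, source_term, expected_target) for each miss.

    segments: list of (source_text, target_text) pairs (hypothetical data layout).
    glossary: dict mapping a source term to its approved translation.
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        for src_term, tgt_term in glossary.items():
            term_in_source = src_term.lower() in source.lower()
            term_in_target = tgt_term.lower() in target.lower()
            # Flag segments that use the source term but not the approved translation.
            if term_in_source and not term_in_target:
                issues.append((index, src_term, tgt_term))
    return issues


# Illustrative data only.
glossary = {"synchronous motor": "Synchronmotor"}
segments = [
    ("Start the synchronous motor.", "Starten Sie den Synchronmotor."),
    ("Stop the synchronous motor.", "Stoppen Sie den Motor."),
]

issues = find_inconsistencies(segments, glossary)
for index, src_term, tgt_term in issues:
    print(f"Segment {index}: expected '{tgt_term}' for '{src_term}'")
```

Here the second segment is flagged because the approved translation is missing from the target text. A real QA tool would also need to cope with inflection and word order, which this sketch does not attempt.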

Why extract terminology automatically?

Rainbow is ideal for several scenarios:

  1. Translation has been completed without creating a project glossary. A professional translator creates a project glossary by adding terms to it in the course of translation. When the glossary is already created, extracting terms is unnecessary. But what if there is no glossary after translation? This is often the case with editing projects, where a client wants to have someone else’s translation revised. In this case, Rainbow comes in very handy for the editor.
  2. You have a big project and need to make sure that the glossary includes as many terms as possible. A translator who creates a glossary manually will likely miss some of the terms. Since Rainbow isn’t prone to this kind of human error, you can be much more confident that no frequent term has been overlooked.
  3. A translation agency may include a list of terminology directly in a translation project sent to a freelance translator. In this case, it is the very last file in the translation project, serving two purposes. First, by translating this list, a translator will create a glossary that the agency can then use for automated QA. Second, having to translate this list will put pressure on the translator to double-check consistency. The only drawback is that this process increases the management burden for a project manager.

How to use Rainbow

Creating a list of terminology with Rainbow is actually the easiest part of the tasks described above.

  1. Drag the source files into the input list. Almost all imaginable formats are supported.
  2. Go to Utilities -> Term Extraction.
  3. Change the output path and tweak the settings. For more information about the settings, refer to the Okapi wiki page.
  4. Click Execute.
  5. Open the resulting file. It is a tab-delimited file, with the number of occurrences in the first column and the actual term in the second column. You can copy and paste the contents of this file into a spreadsheet for further sorting and trimming.
  6. After you have the edited list, use it as intended. We mostly add it to a translation project so that a translator can work on it after translating the actual files of the project, as I described above.
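
The sorting and trimming mentioned in step 5 can also be scripted instead of done in a spreadsheet. The following is a rough sketch that assumes only what is described above: each line holds the number of occurrences, a tab, and the term. The function names, the stopword set, the `min_count` threshold, and the sample data are my own illustrative choices, not part of Rainbow.

```python
def parse_term_list(text):
    """Parse lines of the form '<count><TAB><term>' into (count, term) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t", 1)
        if len(parts) < 2 or not parts[0].strip().isdigit():
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), parts[1].strip()))
    return rows


def trim_terms(rows, stopwords=frozenset(), min_count=2):
    """Drop stopwords and rare entries; sort by frequency, longest term first on ties."""
    kept = [
        (count, term)
        for count, term in rows
        if count >= min_count and term.lower() not in stopwords
    ]
    return sorted(kept, key=lambda row: (-row[0], -len(row[1]), row[1]))


# Illustrative sample in the format described in step 5.
sample = "12\tthe\n5\tcage synchronous motor\n5\tmotor\n1\thousing\n"

terms = trim_terms(parse_term_list(sample), stopwords={"the"})
for count, term in terms:
    print(count, term, sep="\t")
```

To run this on a real file, read its contents into a string first and pass that to `parse_term_list`; the file name and encoding depend on your own settings, so check them before relying on the result.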

For more information on this topic, read my post about the glossary as a key tool in a translation process.

What’s your opinion about term extraction? Is it worth the extra effort? Or is adding terms to a glossary manually good enough?


  • Pablo Bouvier says:

    I do not use Rainbow. However, extracting terminology is not as easy as explained. This procedure will give a frequency-ranked list of terms. This means it will include a lot of highly ranked but useless stopwords that make no sense in a glossary.

    Technical jargon is full of groups of words that belong together. I do not know whether Rainbow extracts expressions too, but as you write about actual words, I suppose this is not the case. To give an example: the translation of “cage synchronous motor” in some languages will not be just the sum of the translations of the isolated terms “cage” + “synchronous” + “motor”.

    • Hello Pablo,
      Thank you for your comment. Of course, you will have to edit the list of frequent terms to prepare a meaningful glossary. I agree it is not easy, but my point is that it is worth it, especially in larger projects. You can also build your own list of stopwords in Rainbow over time, so that the lists it produces are cleaner.
      Yes, Rainbow can extract expressions, rather than separate words. The respective option is called “Remove entries that seem to be sub-strings of longer entries.” So, for example, if you have 5 occurrences of “cage synchronous motor,” the list will include just “cage synchronous motor.” Of course, if any of these words also occurs as a standalone word in a project, it will also be included in the list as a standalone word.
      Best regards,

  • Pablo Bouvier says:

    Hello Roman,
    Thanks a lot for your answer. If Rainbow handles stopwords and sub-strings, that will be fine.
    And if it can sort expressions in descending order of length, that would be wonderful. I’ll check Rainbow myself now, as it seems to be quite powerful once you know how to use it.
    Kind regards,

About the Author

Roman Mironov
CEO & Founder

As the founder of Velior, Roman has had the privilege of being able to turn his passion for languages into a business. He has over 15 years of experience in the translation industry. Roman has helped dozens of clients increase sales by making their products appealing for speakers of other languages.