Translating PDF documents is more challenging than translating other common file formats, since PDFs are not editable and as such are not well suited for translation. If you want to have such documents translated, you have several options, discussed in this article, each with different costs. My goal here is to help you do this economically by avoiding costly re-creation of formatting. Here is a list of terms I will use:
- PDF files: files provided for translation in an uneditable format.
- Editable source documents: files from which PDF documents were created.
- Re-creation of formatting: manual process of mirroring the formatting of PDF documents in empty files from scratch.
Find and translate editable source documents instead.
It is crucial to understand that with a PDF, there is always an editable original document from which it was created. PDFs are almost invariably files generated from other file formats by producing a printable version. Examples include PDFs produced from Adobe InDesign (INDD or IDML files), Microsoft Word (DOCX), Microsoft PowerPoint (PPTX), or web browsers (HTML).
When it comes to translating PDFs, you want to avoid wasteful work that ensues if your translator has to re-create the formatting from scratch in empty files. To avoid this, simply send for translation the editable source document that was used to create this PDF. The vendor will translate the editable source document directly, letting the CAT (computer-aided translation) tool mirror the formatting automatically. No wasteful re-creation of formatting for the vendor means zero additional costs and higher quality for you.
If you forget to send an editable source document for translation and then your translator reminds you to do so, do not dismiss this request automatically—try to remember where this PDF came from and find that source. It is worth it.
If you do have the editable source document: Inserting translations yourself.
With editable source documents, translation is done by replacing the original text with the translations directly in these documents (normally done by a CAT tool). Clients may want full control over this process. For instance, you may choose to have it done by your in-house graphic designer who created the source document in the first place, rather than let the translator do it. In this case, you would go about it like this:
- Extract all text to be translated from the editable source document into a separate DOCX file.
- Have the vendor translate this file in the form of a bilingual table that makes it clear what goes where in the source document, even for a non-speaker of the target language.
- Have your graphic designer insert the translations back into the source document.
This option is much better than having the translator re-create the formatting from scratch, say, in Microsoft Word files. However, it is still not as good as translating the editable source document directly, because instead of having the translations inserted automatically, you end up with a manual copy/paste process that is costly and prone to errors.
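To make the hand-off between vendor and designer less error-prone, the bilingual table can carry an identifier and a location for every piece of text. The following is only a minimal sketch, not a description of how any particular CAT tool works: the `Segment` fields, the function names, and the use of a tab-separated text table in place of a DOCX table are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str       # hypothetical identifier, e.g. "s1"
    location: str     # where the text sits in the source document
    source: str       # original text
    target: str = ""  # translation, filled in by the vendor


def build_table(segments):
    """Serialize segments as a tab-separated bilingual table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "location", "source", "target"])
    for s in segments:
        writer.writerow([s.seg_id, s.location, s.source, s.target])
    return buf.getvalue()


def read_table(text):
    """Parse a table produced by build_table back into segments."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        Segment(r["id"], r["location"], r["source"], r["target"] or "")
        for r in reader
    ]


def missing_translations(segments):
    """Return the ids of segments that still have no translation."""
    return [s.seg_id for s in segments if not s.target.strip()]
```

Before the designer starts pasting, running `missing_translations(read_table(returned_table))` on the table returned by the vendor (here `returned_table` is a hypothetical string holding its contents) would list any segment that came back empty. It does not remove the manual copy/paste step, only one class of mistakes in it.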
If you do not have the editable source document: Keeping the formatting to the bare minimum.
Sometimes, editable source documents are just unavailable. Imagine you want to translate a document that was faxed to you, or you distribute a product and want to have its user manual translated but have no access to the editable source document originally created by the manufacturer.
In this scenario, your translation vendor has no choice but to start with an empty Microsoft Word file and re-create the formatting. Note that a professional translator re-creates the original document first and only then proceeds with translating it, instead of typing the translations into an empty file and formatting it as they go. To keep costs low, you can exercise control over the amount of formatting that the vendor re-creates. Less formatting generally means less work and lower costs. One way to save is therefore to instruct your vendor to keep the formatting to the bare minimum.
Note that re-creation of the formatting has an inherent risk of introducing textual errors. This is how you might end up with the translation saying “5,” whereas the original says “3,” for example. No matter how great the OCR tools used by the vendor and how much the vendor checks the re-created document against the PDF, it is still largely a manual, unreliable process compared to translating the editable source document where such a risk is negligible. Which brings us back to the idea that nothing is better than translating the editable source document.
Another thing to keep in mind in this scenario is that your vendor normally returns the translation as a Microsoft Word file. If you want the translation in a different format, such as Microsoft Excel or PowerPoint, you need to ask about that upfront.
If you do not have the editable source document: Mirroring all the formatting.
Asking your vendor to limit the re-created formatting to what is absolutely necessary is not always practical. Often, you want to have all the formatting mirrored as closely as possible. Examples include:
- You expect a professional-looking document ready to be used for business purposes. An unformatted document is unacceptable.
- You need to submit the translated documents to a regulatory agency or court. Not only does the formatting have to be identical to the original, but often even the pages have to match, one to one.
This last option is the most expensive, because the vendor bills you for re-creation of all the formatting.
Remember that PDF files cannot be translated directly and need to be re-created in another application, triggering additional costs. As a translation buyer, you can do two things to cut costs:
- You can save money and reduce the risk of errors by finding the editable source documents and having them translated instead of the PDFs.
- At the very least, ask your vendor to keep the re-created formatting to the bare minimum, as long as this is acceptable to you.
Re-creation of formatting is not to be confused with DTP, which takes place after translation, whether the source files are PDFs or editable documents.