
What are “badly formatted” files?


Have you ever heard a translation agency talk about “badly formatted files” and wondered what that means? MEINRAD explains which elements in text design can cause problems in translation and increase what you pay.

There are lots of files that look absolutely fine at first glance, but on closer inspection they turn out not to be properly structured and formatted. Some small issues cause no problems at all in the source document and only become apparent during translation.

Translating in CAT tools

For context, today’s translation agencies work with CAT (computer-assisted translation) tools. That means translators produce their translations in this software rather than in the file itself, creating and using translation memories and term bases to help them. In simple terms, the project manager imports the file into the CAT tool, the translator translates it there, and the project manager then exports it again. So the CAT tool “spits out” the translation in the same format as the source file.

What’s the problem with “badly formatted” files?

So far, so good. The problem arises if the person producing the source text didn’t think about whether it was suitable for translation, and that can have a big impact. Everything might look good in the source text, as the author has designed it for that language and created a suitable layout (often going to some trouble to do so). But if the translated text is longer than the original, as is often the case, several common issues arise:

  • text abruptly cut off in text boxes
  • text extending beyond the edges of pages
  • unnatural spacing as a result of working with spaces rather than tabs

And there are other potential problems:

  • Files that weren’t produced with translation in mind may not be importable at all without laborious preparation.
  • The text may be split in the CAT tool (e.g. hard returns in sentences which should go together), increasing the risk of incorrect translations as the translator can’t tell that the text is supposed to be translated as one sentence.

More work needed before and after the translation

Badly formatted files mean more work before and after the translation to produce a usable target-language file. Either you’ll have to optimize the files yourself, or the translation agency can do it for you, but in the latter case you’ll probably end up paying more, depending on what’s in your framework agreement. This extra work can also delay the start of the translation, which can of course be a big issue if the project is urgent. So it’s best to avoid the problem altogether by keeping translation in mind from the moment you start producing the source documents.

Basic principles for producing files suitable for translation

One basic principle for source files is that they should be monolingual (unless they’re Excel files). If there are multiple languages in one file, the first job is to find out which bits need to be translated, which inevitably means more work. Another rule is that texts need to be in an editable format in order to be imported into the CAT tool. Any non-editable text, e.g. in images copied from other programs or files, will have to be typed out by hand.

Some common examples of badly formatted files:

 

  • Microsoft Word
      ◦ Hard returns
      ◦ Spaces and (multiple) tabs
      ◦ Columns created using spaces
      ◦ Size of text boxes
      ◦ Table design
      ◦ Graphics not grouped together
      ◦ Contents / page numbers produced manually
  • Microsoft Excel
      ◦ Badly formatted source texts
      ◦ Incorrect existing translations
      ◦ Poor structuring
  • Microsoft PowerPoint
      ◦ Factoring in longer texts
      ◦ Multilingual translations in tables
  • PDF files
      ◦ Scanned PDF documents
      ◦ Shifts and incorrect splitting of words/sentences

 

Microsoft Word

Hard returns

Hard returns in Word (and in many other programs) cause the CAT tool to start a new segment. So if hard returns are used for layout in the middle of a sentence that continues on the next line, the CAT tool will split that sentence. The project manager then has to join the parts manually so that the translator can translate the sentence correctly and it can be stored in the translation memory in a usable form. Text in text boxes is often formatted like this.

 

Example of text in Word with a hard return
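If you want to spot this problem before sending a file, a rough automated check is possible. The Python sketch below is a hypothetical helper, not a feature of any CAT tool: it assumes the document has been saved as plain text (so every hard return becomes a line break), treats each line as one segment, and flags lines that don’t end with sentence-ending punctuation while the next line starts with a lower-case letter. Headings are mostly ignored this way, but German nouns start with a capital letter, so some breaks will be missed; treat the result as a list of places to look at, not a verdict.

```python
# Hypothetical pre-check: treats each line of a plain-text export as one CAT segment.
SENTENCE_END = (".", "!", "?", ":", ";")


def flag_broken_sentences(text: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines that probably end in a hard return mid-sentence."""
    lines = text.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        current = line.strip()
        # Empty lines and lines that end like a sentence are fine.
        if not current or current.endswith(SENTENCE_END):
            continue
        # A lower-case start on the next line suggests the sentence carries on there.
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following and following[0].islower():
            flagged.append((i + 1, current))
    return flagged


sample = "Please read the safety\ninstructions before use.\nTechnical data\nVoltage: 230 V."
for number, line in flag_broken_sentences(sample):
    print(number, line)  # prints: 1 Please read the safety
```

Here “Technical data” is not flagged, because the next line starts with a capital letter, just as a heading normally would.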


Spaces and (multiple) tabs

If spaces and multiple tabs are used for layout, elements shift when the text length changes in the translation. The target-language file then looks different from the source text and has to be fixed by hand.

 

Example of text with numbering and lots of spaces in between.

Please don’t produce texts like this! Here, the indents were created using spaces, tabs and soft returns.
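Layout built from spaces and tabs is easy to find automatically. This sketch is a hypothetical helper that assumes a plain-text export of the document; it reports every run of two or more spaces and/or tabs by line and column. A single tab is left alone, since that is how tab stops are meant to be used, while double spaces after full stops will also be reported, which is harmless.

```python
import re

# Two or more spaces and/or tabs in a row: a typical sign of layout built by hand.
LAYOUT_WHITESPACE = re.compile(r"[ \t]{2,}")


def find_layout_whitespace(text: str) -> list[tuple[int, int]]:
    """Return (line, column) positions, both 1-based, of suspicious whitespace runs."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in LAYOUT_WHITESPACE.finditer(line):
            hits.append((line_no, match.start() + 1))
    return hits


sample = "1.\tIntroduction\n2.    Scope\t\tPage 4"
print(find_layout_whitespace(sample))  # prints: [(2, 3), (2, 12)]
```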


Columns created using spaces

If you’re working with columns, you should use the column feature in Word in order to avoid chaos in the translation.

 

Example of columns created using spaces in Word

 

Here, the bold text in row 7 WON’T be shown after row 6 in the CAT tool – it will be combined with the first non-bold text on the left, which would be completely wrong. This is what the translator will see in the CAT tool:

 

Example of the same text as displayed in the CAT tool

 

The source file will have to be corrected before it’s imported to enable a correct translation.

As a rule of thumb, anything that isn’t properly formatted continuous text, or that breaks the standard rules (for example by using spaces instead of tabs, or hard returns in the middle of sentences), will cause problems in translation.


Size of text boxes

Like several of the other issues, this one affects many programs, not just Word. Text boxes should be larger than the source text requires, so there is room for the translation if it turns out longer. That way, the text boxes won’t have to be enlarged by hand after translation to show the full translated text.


The text box here is bigger than it needs to be for the German text, which means if the translation is longer, the full text will still fit into the box.


Table design

If tables are drawn by hand and “designed” using tabs, spaces and hyphens, the result is an untranslatable mess in the CAT tool.

 

Example of a table drawn by hand using tabs, spaces and hyphens

 

This is what the translator sees in the CAT tool:

 

The same table as displayed in the CAT tool

 

You should use the table feature in Word and soft hyphens, so that the parts of words and sentences which need to go together aren’t split in the CAT tool and the translator can translate everything correctly.


Graphics not grouped together

If text labels aren’t grouped together with the graphics they describe, elements often shift when the translation is exported, and arrows pointing to a particular component suddenly point somewhere else altogether.


Contents / page numbers produced manually

If a table of contents is produced manually rather than with Word’s table of contents feature, all the page numbers will have to be checked once the translation is complete to make sure everything matches up. The same applies to manually added references to other pages: if the translation is longer than the original and the text has moved to a different page, the page number will be wrong.

It also isn’t a good idea to manually edit an automatically generated table of contents afterwards, for example to change page numbers or add sub-headings that don’t appear in the body text. The CAT tool won’t recognize these manual additions, and you usually won’t notice until after the translation that some parts haven’t been translated.


Microsoft Excel

Microsoft Excel gives you the option to manage and translate texts in multiple languages in one document (each in their own column). If there are existing translations of individual cells, you can either import them or ignore them (then they’ll be left as they are in the translated document). Like other files, Excel files need to be formatted properly in order to avoid additional work in the translation process.

Badly formatted source texts

Make sure the column containing the source text doesn’t have text in any other languages. Some Excel lists become endlessly long over the years, and they can often have a mix of terms (e.g. German words in an otherwise English list). Not only does this mess up the entries in the translation memory (a TM is only ever meant for one source language and one target language), it might also lead to questions from the translator and could even mean another quick translation will be needed first.

 


This Excel list has a mess of English and German in column A, and some of the existing translations in columns B and C clearly don’t match the source text.

 

In the above example: The text is being translated from English to Japanese, but German terms keep cropping up which weren’t immediately spotted by the project manager. It wasn’t until the translator saw this when working on the text and asked what they were supposed to do that it became clear there was a problem. The translator doesn’t speak German, which makes the situation even worse: all the German texts need to be translated into English first, which of course costs more money, and the software can’t filter the German text, so the project manager has to scroll through the Excel list and find the German text manually. As you can imagine, this will take a lot of time if it’s a long list.
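A rough script can narrow down that manual search. The sketch below assumes the list has been exported as a UTF-8 CSV file with a header row and the English source text in the first column; it flags cells containing characters that only occur in German (ä, ö, ü, ß) or a few very common German words. The function names and the word list are hypothetical choices for illustration, and German text without any of these markers will slip through, so the result only shortens the list to check; it doesn’t replace the check.

```python
import csv
import io
import re

# Hypothetical markers of German text in an English source column.
GERMAN_CHARS = set("äöüÄÖÜß")
GERMAN_HINT_WORDS = {"und", "oder", "nicht", "der"}


def looks_german(text: str) -> bool:
    """Rough guess: German-only characters or a common German function word."""
    if GERMAN_CHARS & set(text):
        return True
    words = re.findall(r"\w+", text.lower())
    return any(word in GERMAN_HINT_WORDS for word in words)


def flag_source_rows(csv_text: str, source_col: int = 0) -> list[int]:
    """Return spreadsheet row numbers (header = row 1) whose source cell looks German."""
    flagged = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if row_no == 1 or len(row) <= source_col:
            continue  # skip the header row and rows without a source cell
        if looks_german(row[source_col]):
            flagged.append(row_no)
    return flagged


export = (
    "EN,JA\n"
    "Start button,\n"
    "Stop button,\n"
    "Drehzahl prüfen,\n"
    "Open the cover and remove the filter,\n"
    "Filter reinigen und trocknen,\n"
)
print(flag_source_rows(export))  # prints: [4, 6]
```

To check a real file, you could read it with `open(path, encoding="utf-8-sig", newline="")` and pass the contents in; the `utf-8-sig` encoding skips the byte-order mark that some programs write at the start of UTF-8 files.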


Incorrect existing translations

Excel files often contain existing translations which should be “left as they are”, with only the rest needing to be translated. However, when project managers look closely, sometimes it becomes clear that the translation doesn’t match up with what’s in the source column. Again, they’ll have to carefully look through the list to see which translations don’t match up – or it may be better to get all the existing translations properly checked.


Poor structuring

Multilingual Excel tables should be structured as simply as possible, i.e. each row should contain one sentence or phrase, and text that belongs together shouldn’t run over multiple cells. Merging rows afterwards in only one of the language columns can also cause problems, as the following example shows:

 

Rows 4 and 5 have been combined in the German (DE) column, but not in the English (EN) column. That means the multilingual import into the CAT tool won’t work, as the software can’t handle this inconsistency. In this case, the EN column needs to be formatted in the same way by combining those two rows.

 

 


In this monolingual file, the text in column A, row 2 runs into row 3. A view of the CAT tool (right screenshot) shows the problem: the text isn’t imported as a coherent whole, as the text in columns B and C is in between, and this makes life difficult for the translator.


Microsoft PowerPoint

Microsoft PowerPoint and similar software give people the opportunity to use all the fancy design elements and graphics they like to create visually appealing presentations. As with other documents, these cause no issues in the original language; it’s only when the presentations are translated that problems arise and extensive work is needed to make the end product usable.

Factoring in longer texts

In PowerPoint, it’s particularly important to allow for the possibility that texts will be longer in other languages. Generally speaking, there isn’t much space for text on PowerPoint slides (and the font size needs to be large), so people tend to use all the space they have. If a presentation is then translated into French or Russian, for example, it’s safe to assume that a lot of text will run over the edges of the slides or won’t fit into the text boxes. So to avoid time-consuming reformatting when the translation is delivered, we recommend that you never “fill” PowerPoint slides to the edges with text and graphics, and that you make text boxes larger than necessary.

 


This example shows the importance of keeping the individual text elements in exactly the same position in the translation as in the original (German in this case). For instance, “Sommerurlaubszeit” (summer holiday) would be out of place if it ended up next to October/November. So when producing PowerPoint presentations, bear in mind that texts may get longer, and make the text boxes big enough for the longer texts to fit.


Multilingual translations in tables

Outside Microsoft Excel, too, tables always make life more complicated when translating into multiple languages. The translations can’t automatically be inserted into the correct cells, so the project manager has to copy them across manually. This is both time-consuming and error-prone.

 


Here, the translated text needs to be manually copied into the slides so that the client has just one single presentation rather than a separate one for German, English, Italian and French. Another problem here is that space is tight, so the translations of the last row probably won’t fit and the text will have to be made smaller, split up or shortened.


PDF files

PDF files always involve more work for the translation agency, as the text they contain can’t be edited. That’s a problem because, as mentioned at the start, text needs to be editable in order to be imported into the CAT tool. So the file first has to be converted, and more often than not the conversion doesn’t work very well, depending on how the document is designed and structured.

Normal continuous text previously created in Word can usually be converted easily enough, but things get harder when tables, charts, graphics and so on are involved. Section breaks are inserted during the conversion, and it’s often difficult to remove them without mangling the formatting. Some texts aren’t even recognized as texts and so aren’t converted (or at least not correctly), so they aren’t imported into the CAT tool and won’t be translated. That means they have to be typed out manually beforehand, and have to be manually inserted back into the document once they’ve been translated. It’s also important to be aware that automatic contents and in-document references won’t work once the document has been converted, so they’ll also have to be adapted by hand once the translation is complete.

Scanned PDF documents

Scanned files and hand-written notes cause real problems. You can expect major difficulties when converting them, mostly involving certain characters (often umlauts, numbers etc.) not being recognized correctly, and this increases the risk of incorrect translations. Before the translation begins, the whole file needs to be checked and compared with the original by a human being to ensure the source text is correct and can be translated properly.


Shifts and incorrect splitting of words/sentences

If the software can’t reliably tell that letters or words belong together and splits them instead, e.g. because of returns, section breaks or tabs, the text in the CAT tool won’t be displayed correctly. The first part of a sentence may be in one segment and the second part somewhere else, which can make it impossible for the translator to produce a coherent translation, as they don’t know that the sentence continues elsewhere. This can happen with sentences at the end of a page, as the software doesn’t know that they continue on the next page. So the translator working in the CAT tool might see sentences that stop abruptly, with the end of the sentence in a completely different place.

 


 

Here, the sentence “Perfect solutions for the medical device and diagnostics industries” should be in one segment in the CAT tool. Instead, it was split into two segments (and in a very unhelpful way too), with the word “Applications”, completely devoid of context, in between. This is the result of poor text recognition when the PDF file was converted.

 

In addition to the lengthy process of preparing the PDF files, once the translation is complete there will usually be lots of work required so that the files have a viable layout.

Depending on the quality of the PDF file and the result the client has in mind, this extra work can be considerable. So whenever possible, try to send files in their original format.


 

These are just a few of the countless examples which illustrate what we mean by “badly formatted” files. Our top tip: while producing the original documents, remember that they will need to be translated and design them with that in mind.

 

 


 

Main image: © Adobe Stock