MEINRAD's Blog

The DeepL glossary: at long last the machine translation output can be customized

Written by Meinrad Reiterer | 6 September 2022

Although the machine translation tool DeepL has given users the opportunity to create their own glossary for some time, this was not very useful for the everyday work done by translation agencies. Glossaries are now available for use with the DeepL Pro plug-in in memoQ. Terminology and machine translation manager Bianca Stadler gives her impressions after testing the new functionality at MEINRAD.

What was the problem with using DeepL for machine translations in the past?

At MEINRAD we reckon DeepL Pro produces very good translations in terms of style and phrasing. The problem was that terms were often translated inconsistently, even within the same text. It’s not uncommon for key and specialist terms to have two, three, four or even more possible translations, and checking the whole text to make sure terms are used consistently is tedious for the post-editor. That’s where the DeepL glossary comes in handy. For about two years the web version of DeepL has let users enter their own terms and apply them to translations in certain languages, but only online, so that didn’t really help us as a translation agency. Now we can use the glossary via the DeepL Pro plug-in in our CAT tool, memoQ.

What is the DeepL glossary, and what can it do?

The glossary lets you “tell” DeepL how you want particular terms to be translated. Specifying these translations makes the engine’s output more reliable and should put an end to inconsistent terminology in machine translations. I think that’s a big breakthrough.

Example 1: DeepL without glossary – source text (left), target text (right):

Example 2: DeepL with glossary:

In Example 1, DeepL was used without a glossary and translated the English term “imager” as “Imager” in German. In Example 2, DeepL was used with a glossary and translated “imager” as “Bildgenerator”, exactly as specified in the client’s glossary. DeepL also applied the term correctly in compounds such as “Bildgeneratormodul”.
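
For post-editors, a quick automated check can flag segments where a glossary term appears in the source but its prescribed translation is missing from the target. The following is a minimal Python sketch, assuming the glossary is available as a simple source-to-target mapping; the function name `missing_terms` and the sample segments are hypothetical and are not part of DeepL or memoQ.

```python
def missing_terms(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary source terms found in `source` whose prescribed
    translation does not appear anywhere in `target` (hypothetical helper)."""
    src = source.lower()
    tgt = target.lower()
    missing = []
    for src_term, tgt_term in glossary.items():
        # Case-insensitive substring matching also accepts compounds such as
        # "Bildgeneratormodul" and inflected forms such as "Bildgenerators".
        if src_term.lower() in src and tgt_term.lower() not in tgt:
            missing.append(src_term)
    return missing


glossary = {"imager": "Bildgenerator"}  # entry taken from the example above

# Hypothetical segments, for illustration only.
print(missing_terms("Replace the imager module.", "Ersetzen Sie das Imager-Modul.", glossary))
# → ['imager']
print(missing_terms("Replace the imager module.", "Ersetzen Sie das Bildgeneratormodul.", glossary))
# → []
```

Substring matching is deliberately crude: it tolerates compounds and inflections on the target side, but it will also find a source term inside a longer word (“imager” inside “imagery”), so its output is best treated as hints for the post-editor rather than a verdict.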

You and your colleague Martin Maritschnig have already tested the glossary extensively. How did you do that?

Yes, Martin and I have already carried out quite a few tests. First, the terminology needs to be provided in the correct format, which is currently a bit time-consuming as you can only use TXT and TSV files. And the terminology also needs to be cleaned up beforehand. If there are special characters such as asterisks or hyphens to indicate word stems in a term base or terminology list, they need to be removed manually before using the terms with DeepL. And the glossary is also very sensitive when it comes to synonyms: there really can only be one entry for each term. That’s how it should be, of course, but it often means there’s a lot of preparation to do first, as duplicate entries are widespread. It gets a bit easier after that, as once the file has been prepared, all you need to do is add it to the glossary. Then the next time you have a machine translation project for that client, DeepL will use the terms it contains.
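
Part of the preparation described above (stripping stem markers, keeping only one target per source term, saving as tab-separated text) can be automated. The sketch below assumes the terms are already available as (source, target) pairs and that asterisks and hyphens act as stem markers only at the start or end of a term; `clean_term`, `build_glossary` and `to_tsv` are hypothetical helper names, and the sample entries are invented for illustration.

```python
from collections.abc import Iterable

# Assumed stem markers, stripped only at the ends of a term.
STEM_MARKERS = "*-"


def clean_term(term: str) -> str:
    """Remove surrounding whitespace and leading/trailing stem markers."""
    return term.strip().strip(STEM_MARKERS).strip()


def build_glossary(
    pairs: Iterable[tuple[str, str]],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Clean the pairs and split them into unambiguous entries and
    source terms with more than one target (to be resolved manually)."""
    targets: dict[str, set[str]] = {}
    for src, tgt in pairs:
        src, tgt = clean_term(src), clean_term(tgt)
        if src and tgt:  # skip rows left empty after cleaning
            targets.setdefault(src, set()).add(tgt)
    entries = {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}
    conflicts = {s: sorted(t) for s, t in targets.items() if len(t) > 1}
    return entries, conflicts


def to_tsv(entries: dict[str, str]) -> str:
    """Serialise entries as one 'source<TAB>target' line each."""
    return "".join(f"{s}\t{t}\n" for s, t in entries.items())


# Hypothetical raw terminology list.
raw = [
    ("imager*", "Bildgenerator"),
    ("imager", "Bildgenerator"),   # exact repeat after cleaning: merged
    ("housing", "Gehäuse"),
    ("housing", "Gehäuseteil"),    # synonym: reported, not exported
]
entries, conflicts = build_glossary(raw)
print(repr(to_tsv(entries)))  # → 'imager\tBildgenerator\n'
print(conflicts)              # → {'housing': ['Gehäuse', 'Gehäuseteil']}
```

Conflicting entries are left out of the TSV rather than resolved automatically, because choosing the preferred translation is a terminology decision. Hyphens inside a term, as in “Plug-in”, are kept.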

What are your impressions – how well does the glossary work?

I absolutely love it. We can rely on DeepL to use the specified terms, and it handles most of the inflections and plurals correctly. So translations are much more fluent and consistent – even before the post-editors get to work.

Apart from improved consistency, what other benefits do you think it has?

For me, one benefit of the glossary is that it allows the machine translation output to be customized. DeepL’s output used to be the same for everyone, but now it can be tailored to each client by taking their own terminology into account. And that’s a real boon for everyone using our machine translation self-service portal, where they can have texts machine translated and see the results instantly without paying for a review by one of our post-editors (unrevised output is often good enough for in-house purposes).

Is the glossary ready for MEINRAD clients to use? And if so, what do they have to do in order to use it for their translations?

Yes, the glossary is now ready to use, though only for a limited number of language pairs at the moment. Clients who are interested can let us know; we’ll then need to establish whether they can give us the terminology or whether they’d like us to prepare existing term bases. As I said before, that can involve a fair amount of work at the start, depending on the size and quality of the term base. If there is no term base yet, it’s easiest for clients to send us a two-column Excel list of the specific terms, with the source-language terms in column A and the desired translations in column B. The fewer duplicate entries there are, the better, as that means fewer questions and less cleaning up. We also recommend checking the list carefully before sending it to us to make sure it only contains specialist terminology – glossaries shouldn’t contain any “everyday” words. And clients should be aware that the prepared file can’t be larger than 10 MB.
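
If a client sends the two-column list described above, saving it from Excel as CSV and converting it to tab-separated text takes only a few lines. This is a sketch under assumptions: the export is UTF-8 CSV without a header row, column A holds the source term and column B the target term, and the 10 MB limit is read as 10 × 1024 × 1024 bytes. `csv_to_tsv` and `fits_size_limit` are hypothetical names.

```python
import csv
import io

# Assumption: 10 MB interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def csv_to_tsv(csv_text: str, delimiter: str = ",") -> str:
    """Convert a two-column export (A = source, B = target) to TSV lines."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or one-column rows
        src, tgt = row[0].strip(), row[1].strip()
        if src and tgt:
            lines.append(f"{src}\t{tgt}\n")
    return "".join(lines)


def fits_size_limit(tsv_text: str, limit: int = MAX_BYTES) -> bool:
    """Check the UTF-8 encoded size of the prepared file against the limit."""
    return len(tsv_text.encode("utf-8")) <= limit


export = "imager,Bildgenerator\nhousing,Gehäuse\n\n"  # hypothetical export
tsv = csv_to_tsv(export)
print(repr(tsv))             # → 'imager\tBildgenerator\nhousing\tGehäuse\n'
print(fits_size_limit(tsv))  # → True
```

Excel installations with German regional settings often save CSV with semicolons, in which case `delimiter=";"` is needed; if the list has column headings, the first row should be dropped before conversion. The cleaning step for stem markers and duplicates still applies afterwards.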

Which languages can use the DeepL glossary?

The glossary is currently available for several language pairs: English into German (EN -> DE), German into English (DE -> EN), English into French (EN -> FR), French into English (FR -> EN), English into Spanish (EN -> ES), Spanish into English (ES -> EN), English into Japanese (EN -> JA), Japanese into English (JA -> EN), English into Italian (EN -> IT), Italian into English (IT -> EN), English into Polish (EN -> PL), Polish into English (PL -> EN), German into French (DE -> FR), French into German (FR -> DE), English into Dutch (EN -> NL) and Dutch into English (NL -> EN). But I think more language pairs will be added soon.
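
When preparing glossaries for several clients, it can help to check a language pair against the supported list before cleaning up any terminology. The snapshot below encodes exactly the 16 pairs listed above as of September 2022; `GLOSSARY_PAIRS` and `glossary_supported` are hypothetical names, and the set will need updating as DeepL adds pairs.

```python
# Pairs listed in this article (September 2022), each available in both directions.
_BIDIRECTIONAL = [
    ("EN", "DE"), ("EN", "FR"), ("EN", "ES"), ("EN", "JA"),
    ("EN", "IT"), ("EN", "PL"), ("EN", "NL"), ("DE", "FR"),
]

# Expand every pair into (a -> b) and (b -> a).
GLOSSARY_PAIRS = frozenset(
    pair for a, b in _BIDIRECTIONAL for pair in ((a, b), (b, a))
)


def glossary_supported(source: str, target: str) -> bool:
    """Return True if (source -> target) is in the article's snapshot."""
    return (source.upper(), target.upper()) in GLOSSARY_PAIRS


print(len(GLOSSARY_PAIRS))             # → 16
print(glossary_supported("de", "fr"))  # → True
print(glossary_supported("DE", "IT"))  # → False
```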

What can’t the glossary do very well at the moment, and what would you personally like DeepL to do with it?

As I mentioned above, the formats are restricted to TXT and TSV files, which makes preparing them less convenient for us. So my hope is that in future we’ll be able to use CSV and other file formats as well. Other issues are that you can only use one glossary for each client and language pair, and that the glossary can’t yet handle forbidden terms – it would be fantastic if it could take them into account as well. From our perspective as a translation agency, of course the ideal scenario would be to simply integrate the term bases in the CAT tool with the DeepL plug-in, though that’s a bit of a pipe dream. But I’m confident that DeepL will continue to enhance its glossary, as it really has a lot to offer.

Main image: © Shutterstock