The day Google Translate stopped to suck

After thinking a long while about my role online, I came to the conclusion that I am mostly an avid translator of what I consume on social media. Reading this blogpost on translation bias then triggered a trip down memory lane and I recalled my early days of using the translation service powered by Google.

Around 2008, I started venturing on the internet. At the time, and more than now, the main language encountered on the world wide web was English. Having no schooling in the language yet, I had to make do. At my disposal were no automatic page translators and I primarily relied on good old Google Translate – secondarily on my mom. After hours of retyping content as little me was alien to the concept of “ctrl+c” and “ctrl+v” I became frustrated at Google Translate’s poor performance at translating complex sentences – read sentences with subclauses. The limitations of the translator were well-known and even derided by many of my professors from primary school to university, until November 2016 that is. From that point onwards, Google Translate not only learned from its mistakes, but was able to translate whole paragraphs at a time. Why? How? Let’s find out.

Google’s Translate’s game-upping: from SMT to G-NMT

The answer comes from this announcement on the Google blog by Barak Turovsky. From this blog we know that from November 15, 2016 onwards Google started using a new system for translating languages. Google Translate departed from statistical machine translation to Neural Machine Translation.

Neat. But what does Statistical Machine Translation (SMT) mean and how does this differ with Neural Machine Translation (NMT)? The idea behind SMT (find it more in detail here) is to feed a computer with a large dataset of texts in two languages (think European Parliament documents for example). The computer will then analyze and compare all sentences – divided in word units at a first stage of the translation model – in both versions of the documents in order to find translation patterns. The computer will then use the patterns it has computed (seeing “arbre” translated as “tree” 3000 times for example) to translate the input in the target language with the most statistically significant data. This method of translating is useful, but knows limitations: as no humans are there to guide the computer in its calculations, word correlations were sometimes – a lot of times – dubious. Moreover, what happens with cases (nominative, genitive and so forth), word gender and homonyms? You guessed it, a lot of debugging.

What about NMT? The story starts with Google noticing this paper from 2014 in which the authors put forward a novel idea. Indeed, they trained two neural networks (more on this here and here) to act as a pair in order to encode a language and decode it in another language in order to improve SMT translations. One of the major take-aways here is that this type of machine translation does not rely on a central dictionary as a key source: in other words translation without intermediary is possible between languages that have no shared corpus. Flashforward to said announcement in November 2016: Google engineered its own NMT (creatively titled “Google Neural Machine Translation”) to train itself in a way that optimized translations will be generated over time. And indeed! Back when I started to study in September 2016, the translations from Chinese to English were not reliable even for a beginner’s understanding, to say the least. In 2018 when I had to peruse Chinese newspapers and make translations to Dutch, I relied on this translator to spare me long from blurting nonsense in class when a translation was needed (I apologize to my teacher).

A last tidbit: Google uses crowdsourcing to build its networks. Your feedback on translations is very much likely taken into account. This also means that what you type to translate can end up in the database to predict input, so watch out with sensible information!

Wonder why sometimes you get a wild suggestion? Thank the crowdsourced database!