Research at Google on Tuesday launched the Google Neural Machine Translation system, which is now in production for Chinese to English, “a notoriously difficult language pair,” according to Quoc V. Le and Mike Schuster, research scientists on the Google Brain Team.
GNMT already is powering the Google Translate mobile and Web apps for 18 million or so Chinese to English translations daily.
Over the coming months, Google will roll out GNMT to the rest of the 10,000 language pairs its Google Translate service supports.
For English to French or German translations, GNMT achieves “competitive results” relative to the state of the art, Le and Schuster noted in a blog post.
A human side-by-side evaluation on a set of isolated simple sentences found GNMT reduces translation errors by an average of 60 percent compared to Google’s phrase-based translation system.
GNMT “can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page,” Le and Schuster acknowledged.
GNMT’s Inner Workings
The GNMT model consists of a deep long short-term memory network with eight encoder and eight decoder layers using attention and residual connections.
Its attention mechanism connects the bottom layer of the decoder to the top layer of the encoder to reduce training time and improve parallelism. Neural networks are inherently parallel algorithms, which can be leveraged by multicore CPUs, graphics processing units and computer clusters with multiple CPUs and GPUs.
During inference computations, GNMT uses low-precision arithmetic to accelerate the final translation speed. That approach also helps in the design of very power-efficient hardware for deep learning.
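To illustrate what low-precision arithmetic means in practice, here is a minimal Python sketch of symmetric 8-bit quantization. It is a generic textbook scheme, not Google's published method: the function names, the single shared scale factor and the sample weights are all assumptions made for illustration.

```python
def quantize(values, bits=8):
    """Map floats to signed integers using one shared scale factor.

    Assumption: a simple symmetric linear scheme, chosen for illustration;
    the article does not say which scheme GNMT uses.
    """
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(q_values, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in q_values]


# Hypothetical weights, not taken from any real model.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half of `scale`, which is the accuracy given up in exchange for storing and multiplying small integers instead of full-precision floats.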
It divides words into a limited set of common sub-word units Google calls “wordpieces” for both input and output. That provides a good balance between the flexibility of character-delimited models and the efficiency of word-delimited models. It also naturally handles rare word translation and improves the system’s overall accuracy.
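A rough idea of how sub-word splitting works can be sketched in a few lines of Python. The greedy longest-match strategy, the `segment` function and the tiny `VOCAB` set below are assumptions for illustration only; they are not Google's wordpiece vocabulary or its actual segmentation procedure.

```python
def segment(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right.

    Assumption: greedy longest-match-first, with a single-character
    fallback when no vocabulary piece matches.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a known piece.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1        # unknown character: emit it on its own
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical sub-word vocabulary.
VOCAB = {"trans", "lat", "ion", "un", "believ", "able", "s"}

print(segment("translations", VOCAB))
print(segment("unbelievable", VOCAB))
```

Because any word can fall back to single characters, a rare word never lands outside the vocabulary, while common fragments are still handled as single units.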
The Mechanics of Translation
When translating a sentence from Chinese to English, GNMT treats the entire sentence as a single unit for translation, encoding the words in it as a list of vectors with contextual meaning. Each vector represents the meaning of all the words read so far, rather than a single word considered on its own.
The system then decodes the sentence, generating it in English, one word at a time in context. At each step, the decoder refers to a weighted distribution over the encoded Chinese vectors, giving the most weight to those most relevant to generating the appropriate English word.
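The weighted distribution used in decoding can be sketched as follows. This is a simplified illustration: it scores each encoded vector with a plain dot product, which is an assumption rather than GNMT's actual scoring function, and the vectors and the names `attend` and `softmax` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                         # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_vectors):
    """Weight each encoder vector by its relevance to the decoder state.

    Assumption: relevance is a dot product, used here for simplicity.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, vec))
              for vec in encoder_vectors]
    weights = softmax(scores)
    dim = len(encoder_vectors[0])
    # The context vector is the weighted average of the encoder vectors.
    context = [sum(w * vec[i] for w, vec in zip(weights, encoder_vectors))
               for i in range(dim)]
    return weights, context


# Hypothetical 2-dimensional vectors for a three-word source sentence.
encoder_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_vectors)
```

In this toy case the first source vector aligns best with the decoder state, so it receives the largest weight and dominates the context used to pick the next output word.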
“This is more scale than just accuracy, but it’s a very impressive showcase of applied artificial intelligence,” said Rob Enderle, principal analyst at the Enderle Group.
“While it’s a pale shadow of what’s coming, it’s a huge step forward in this area,” he told TechNewsWorld.
“The processing power to translate context between these vastly different languages at Internet scale has only existed for a very short time,” Enderle noted. GNMT “not only showcases the impressive amount of performance now available, but also how quickly we’re now able to apply it to both interesting and critical problems.”
Google researchers have not been the only ones tackling translation problems, noted Michael Jude, a program manager at Stratecast/Frost & Sullivan.
GNMT “is an advance in natural language translation, but not a real breakthrough,” he told TechNewsWorld. “IBM has been doing this for a while.”
On the Horizon
The goal is perfect translation at scale, and “we should be closer to that goal in the 5-to-10-year time frame,” Enderle said. Processing power is the main hitch, and “we’ll likely need a 10x improvement to get this system where it needs to be.”
Human translators occasionally make mistakes, and emotional freighting and sometimes nonverbal contexts may impact translation, he noted.
That said, “a system that looks deeply at context should be able to exceed the performance of two people talking natively, because it’ll always consider context while we often don’t hear it,” Enderle suggested. “To get there, however, there will likely need to be a visual element, so that nonverbal communications are captured as well.”