Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.
Facebook researchers say that turning words into numbers and exploiting mathematical similarities between languages is a promising avenue, even if a universal communicator à la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence experts on the job at one of its research labs in Paris.
Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.
Automatic translation is currently based on having large databases of identical texts in both languages to work from. But for many language pairs there just aren’t enough such parallel texts.
That’s why researchers have been looking for another method, such as the system developed by Facebook, which creates a mathematical representation for words.
Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog’, semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.
“If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
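To make the idea of "closeness" concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration only (real systems learn several hundred dimensions from text), and cosine similarity is assumed here as the measure of closeness; the article does not say which measure Facebook's system uses.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, made up for this example; they are NOT real learned embeddings.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.9],
}

def nearest(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )

if __name__ == "__main__":
    for word in toy_vectors:
        print(word, "->", nearest(word, toy_vectors))
```

With these made-up numbers, "cat" lands nearest to "dog" and "Paris" nearest to "London", which mirrors the grouping Lample describes.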
These language maps can then be linked to one another using algorithms, roughly at first and then with increasing refinement, until entire phrases can be matched without too many errors.
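The article does not spell out how the maps are linked. One simple way to picture it, sketched below under stated assumptions, is to treat each language as a cloud of points and look for the rotation that lays one cloud over the other. The sketch assumes a two-dimensional toy map, a couple of known word pairs to start from, and a least-squares rotation fit; the words, coordinates and function names are all hypothetical and are not taken from Facebook's system.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def fit_rotation(source_points, target_points):
    """Least-squares angle that best rotates source_points onto target_points."""
    pairs = list(zip(source_points, target_points))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)

# Toy English "map", invented for illustration.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.3), "house": (0.0, 1.0), "city": (-1.0, 0.2)}

# Toy French "map": built by rotating the English one, to mimic two languages
# whose word clouds have the same shape but a different orientation.
HIDDEN_ANGLE = math.radians(40)
english_to_french = {"cat": "chat", "dog": "chien", "house": "maison", "city": "ville"}
french = {english_to_french[w]: rotate(p, HIDDEN_ANGLE) for w, p in english.items()}

# Assumed starting point: two word pairs that are already known.
seed_pairs = [("cat", "chat"), ("house", "maison")]
angle = fit_rotation(
    [english[en] for en, _ in seed_pairs],
    [french[fr] for _, fr in seed_pairs],
)

def translate(word):
    """Map an English word into the French space and return the closest French word."""
    mapped = rotate(english[word], angle)
    return min(french, key=lambda fr: math.dist(mapped, french[fr]))

if __name__ == "__main__":
    for word in english:
        print(word, "->", translate(word))
```

Here the rotation learned from two known pairs is enough to match the remaining words, because the toy French map is an exact rotation of the English one. Real language maps only resemble each other approximately, which is why the linking has to be refined step by step.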