Alexis Paquin Drouin
Language Technologies Research Centre (CRTL)
Language technology has become part of Internet users’ daily lives. As a result, search engines, voice recognition software and machine translation tools increasingly shape our relationship with language. Machine translation tools in particular are gaining in popularity and are even being built into most web browsers.
In 1948, Andrew Donald Booth introduced the first electronic dictionary and thus became a pioneer in the field of machine translation (MT). MT was soon put to practical use: during the Cold War, for example, it was used to translate Russian communications. Today, MT tools are available free of charge on the Internet and are often integrated into computer-assisted translation software. However, despite its 68 years of existence, MT is still often poorly received by translators.
We usually associate MT with poor translations, since it rarely yields a perfect product. However, its comical or incomprehensible results are easily explained, and to use the tool well, we must bear its limitations in mind. The tool’s knowledge base is limited to what it has learned from a corpus during its training phase. Twelve major problems have been identified that can interfere with MT: sentence segmentation (dividing a sentence into individual words or groups of words); morphology (the types and forms of words); idiomatic expressions (fixed expressions unique to a language, such as “it’s raining cats and dogs”); collocations (words habitually used together, such as hungry and bear in “as hungry as a bear,” whereas “as hungry as a dinosaur” makes no sense); words the software does not recognize (new words, for example); lexical ambiguity (one word with several meanings); coreference ambiguity (the software cannot determine what a pronoun or other expression refers to in the source text); structural differences between languages (for example, between Japanese and French); varying degrees of explicitness between languages (too little or too much precision in the target language); insertion or deletion of words by the software; grammar problems in the target language; and, lastly, differences in word order.
With language traps like these, one might think that accurate MT is impossible. However, the field continues to advance, and in specialized fields with a small, restricted vocabulary, MT can be quite effective. The Government of Canada, for example, has been using MT to translate weather forecasts since the 1970s.
Results can be improved significantly with translation memories, which can be built into MT tools. A controlled language (one that is more restrictive or tied to a specific field) can also make the machine’s task easier. Quality is also a subjective concept: what counts as acceptable depends on how the text will be used. Simply being aware of the potential problems already reduces the number of errors that can arise from using MT software.
In short, despite technological improvements, MT tools rarely produce texts good enough to be published. MT does work well in specific cases, however. When imperfect output is acceptable, it can be useful for skimming a text to grasp its main points. It can also help with terminology research (translating a specific word or group of words), and it can improve bilingual communication in many ways. The most glaring errors occur primarily when the user has limited knowledge of the target language and ignores the major MT-related problems described above.
It is therefore still wise to have a translator post-edit texts produced through MT, that is, revise the pre-translated text.
Some of the information in this article was taken from TRA1353 – Traduction automatique et postédition (Machine translation and post-editing), a course given by Louise Brunette, professor of Language Studies at the Université du Québec en Outaouais.