Relaying the voices of the Libyan people

Source: Baidu Wenku; editor: Jiuxiang News Network; time: 2024/05/06 06:53:28

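New Scientist: The desert oasis town of Al Khufrah lies deep in the Sahara in the far south-east of Libya. It is not a place that normally draws foreign journalists. But on 23 February, protesters in the town left a voice message on a phone line operated by Google, and Google's software published it on Twitter. The message said: "The young people have taken control of the city and raised the flag of the former state that Gaddafi's coup overthrew." It drew attention because until 21 February there were no western reporters in Libya, and it let people around the world know what was happening there. The website Alive in Libya picked the message up from Twitter; behind the site is an army of volunteers who translate Arabic into English. Crowdsourced human translation of this kind is also being combined with machine translation, an approach that Google and researchers have pursued in recent years and that can deliver a marked improvement over machine translation alone.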

Crowdsourced translations get the word out from Libya

18:01 25 February 2011 by Jim Giles and Jacob Aron

The message is clear in any language (Image: Sven Torfinn/Panos)

The oasis town of Al Khufrah lies deep in the Sahara desert in the far south-east of Libya. Almost 1000 kilometres from its nearest sizeable neighbour, it is not somewhere foreign journalists tend to visit.

But on 23 February, news from the town reached the English-speaking world. "Greetings this is an urgent message from Kufra," said the anonymous source. "Young people have taken complete control of the city, they hoisted the flag of Libya and Gaddafi down the flag."

The message arrived by an ingenious route. It started with a voice message in Arabic left on a phone line operated by Google. Software managing the line published the message on Twitter, from where it was picked up by the website Alive in Libya. The tweet went out to Alive's army of volunteers, who provided an English translation for the site. It is just one of around 170 reports, from videos to tweets to audio recordings, that Alive in Libya has translated since it started on 19 February.

The site has been an important resource because until 21 February there were no western reporters in Libya, notes Andy Carvin, a social media strategist at National Public Radio in Washington DC. "Their translation work has helped give more credibility to a number of sources, as well as providing reporters and the public with more context on any given situation."

Fostering debate

Crowdsourced translation like this is finding a growing number of applications. It was used in Haiti to translate messages sent in the aftermath of last year's earthquake. "Alive in" projects are also active in Egypt and Bahrain. At Meedan, a social network aimed at fostering debate between Arabic and English speakers, 9000 registered users help translate 300,000 words every month, says Ed Bice, the site's founder. Facebook's efforts to crowdsource translation of its site were so successful that it has given website owners access to the translation tools it created.

Crowdsourcing is also being used to improve machine translation services like Google Translate. Software tools like this are useful for quick and dirty translation, but they can make embarrassing mistakes. Try using Google to translate the phrase "Machine translation is good, but not great" from English into Japanese and back again, and the result is "Machine translation is not good, great", which clearly means something different. Sometimes the result is simply nonsense: translating "I could murder a pint" into Spanish and back gets you "I could kill a liter".

For people who can speak both the source and the destination language, such mistakes are easy to spot, so why not let bilingual humans lend the machines a hand? Adam Lopez, a machine translation researcher at Johns Hopkins University in Baltimore, Maryland, sees such teamwork as a step on the road to a universal translation system that will ensure material posted online can be read in any language. "This is really speculative, but a few decades from now, who knows?" he says.

The building blocks for such a system are already in place. Posts in Arabic on Meedan are automatically translated into English, and vice versa, and then opened up to the site's users, who tweak the translations as necessary. Even Google lets its users suggest corrections to translations. "When you refine a translation, we'll take that and feed it back into the system," says Chewy Trewhella, new business development manager at Google. "Every time a user refines a search they're helping in some way."
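The feedback loop Trewhella describes can be illustrated with a toy sketch. This is not how Google Translate or Meedan actually work: the class `HybridTranslator`, its methods and its word-for-word "machine" are hypothetical stand-ins, assumed here only to show the general idea that a machine draft is served until a user refines it, after which the human correction is reused.

```python
from dataclasses import dataclass, field


@dataclass
class HybridTranslator:
    """Hypothetical human-in-the-loop translator (not Google's or Meedan's system).

    The "machine" is a word-for-word lookup table. Whole sentences that
    users have corrected are remembered and preferred over the machine draft.
    """
    lexicon: dict[str, str]
    corrections: dict[str, str] = field(default_factory=dict)

    def machine_draft(self, sentence: str) -> str:
        # Word-for-word lookup; unknown words pass through unchanged.
        return " ".join(self.lexicon.get(w, w) for w in sentence.lower().split())

    def translate(self, sentence: str) -> str:
        # A human-refined translation, if one exists, beats the machine draft.
        return self.corrections.get(sentence.lower(), self.machine_draft(sentence))

    def refine(self, sentence: str, better: str) -> None:
        # "Feed it back into the system": store the user's correction.
        self.corrections[sentence.lower()] = better


t = HybridTranslator({"buenos": "good", "días": "days"})
print(t.translate("Buenos días"))  # machine draft: "good days"
t.refine("Buenos días", "good morning")
print(t.translate("Buenos días"))  # human-refined: "good morning"
```

A real system would generalise from corrections rather than memorising whole sentences, but even this lookup shows why every refinement makes later output better.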

Translate and learn

To motivate people to provide translations, some researchers are trying to build an educational component into the process. Luis von Ahn, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania, is developing a service called Duolingo designed to help people learn a language while also generating useful translations. More details will be available when the service launches, around two months from now.

"The idea that monolingual speakers could improve machine translation started out as a crazy idea," says Philip Resnik, a computational linguist at the University of Maryland in College Park. His MonoTrans2 allows users reading the first draft of a computer translation to flag up phrases that appear incorrect. This information is then passed to people reading the text in the original language, inviting them to rephrase problem phrases in a way the machine might better understand. After the computer has had another try, the new phrase is flagged up, prompting readers to judge whether it is correct.

Resnik tested the system by translating children's books from Spanish into German. He found that the proportion of sentences that bilingual evaluators rated perfectly fluent and accurate rose from 10 per cent for Google Translate to 68 per cent for MonoTrans2. It's not perfect, but it shows what can be achieved when humans and machines work together. "People don't really care whether the translations they're getting are coming from a machine or not, as long as they get them when they want them," says Resnik.
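The MonoTrans2 workflow amounts to a loop between a machine translator and two groups of monolingual helpers. The sketch below is a hypothetical reconstruction from this description alone, not Resnik's code: `monolingual_loop`, the toy lexicon and the stand-in "reader" and "paraphraser" functions are all assumptions, with simple functions playing the part of real people. For readability it translates Spanish into English rather than German; the idiom "me tomas el pelo" (literally "you take the hair", meaning "you're pulling my leg") trips up the literal machine until a Spanish speaker rephrases it.

```python
from typing import Callable


def monolingual_loop(
    source: str,
    translate: Callable[[str], str],
    looks_wrong: Callable[[str], bool],
    paraphrase: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Hypothetical MonoTrans2-style loop (a sketch, not the real system).

    looks_wrong: a target-language reader flags a draft as incorrect.
    paraphrase:  a source-language reader rephrases the flagged sentence
                 so the machine might handle it better.
    Returns the final draft and the number of rephrasing rounds used.
    """
    draft = translate(source)
    rounds = 0
    # Target-language readers judge each new draft before another round starts.
    while looks_wrong(draft) and rounds < max_rounds:
        source = paraphrase(source)  # source-language reader rephrases
        draft = translate(source)    # the machine has another try
        rounds += 1
    return draft, rounds


# Toy stand-ins (all hypothetical): a word-for-word "machine",
# a reader who distrusts a literal "hair", and one known paraphrase.
LEXICON = {"me": "me", "tomas": "you take", "el": "the", "pelo": "hair",
           "bromeas": "you are joking"}


def toy_translate(sentence: str) -> str:
    return " ".join(LEXICON.get(w, w) for w in sentence.split())


def toy_reader_flags(draft: str) -> bool:
    return "hair" in draft.split()


def toy_paraphrase(sentence: str) -> str:
    return {"me tomas el pelo": "bromeas"}.get(sentence, sentence)


print(monolingual_loop("me tomas el pelo", toy_translate, toy_reader_flags, toy_paraphrase))
# ('you are joking', 1)
```

Neither helper needs to know the other language: one only judges the output, the other only rephrases the input.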

The common language of statistics

The machine translation revolution began 20 years ago, when a research group at IBM introduced a technique based on statistical analysis. Previous attempts at translation had focused on encoding linguistic knowledge as rules for the computer to follow. But for this new approach, dubbed statistical machine translation (SMT), almost no linguistic knowledge was needed.

Instead, the SMT system searched a large collection of parallel texts, each available in two languages, for statistical patterns indicating words and phrases with the same meaning. Google and other translation companies use similar SMT systems today.
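To make "statistical patterns" concrete, here is a minimal sketch of the simplest of IBM's word-alignment models, IBM Model 1. Given only sentence pairs, it uses expectation-maximisation to estimate t(f | e), the probability that a foreign word f translates an English word e. This is a deliberate simplification (no NULL word, no phrase extraction, a three-sentence toy corpus) and not the code Google or IBM use; the function name `ibm_model1` is hypothetical.

```python
from collections import defaultdict


def ibm_model1(pairs: list[tuple[str, str]], iterations: int = 10) -> dict[tuple[str, str], float]:
    """Estimate word-translation probabilities t(f | e) with EM (IBM Model 1 sketch).

    pairs: (foreign, english) sentence pairs from a parallel corpus.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    corpus = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {w for fs, _ in corpus for w in fs}
    # Uniform start: every foreign word equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                # E-step: share f's alignment among the English words in its sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


pairs = [("das Haus", "the house"), ("das Buch", "the book"), ("ein Buch", "a book")]
table = ibm_model1(pairs)
for e in ["the", "house", "book", "a"]:
    best = max((f for f, e2 in table if e2 == e), key=lambda f: table[(f, e)])
    print(f"{e!r} -> {best!r} (t = {table[(best, e)]:.2f})")
```

Nothing in the code knows any German: words that keep turning up together across sentence pairs, such as "das" and "the", simply end up with the highest probabilities.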