This is an archive article published on July 15, 2003

Troops may not go to Baghdad but GI Joe is learning Hindi

For a while now, the US Department of Defense and its research wings have been in a tearing hurry to read Hindi. Keeping tabs on a faraway border that doesn’t speak English is tiresome work, and translation is a painfully slow exercise.

So from June 1, commanded by a message — the surprise language is Hindi…Good luck! — from a research funding branch of the US military, around 100 computational linguists at 11 sites in the US and UK invented a new set of information tools to translate Hindi text into English. And query Hindi databases with English questions.

The technology can identify documents with ‘‘highest promise’’ of holding information the researcher is interested in, saving translators hours of drudgery.

‘‘A major national language too long neglected in the West with important similarities to other languages in northern India and to Urdu in Pakistan… it can benefit information retrieval in India and foster broader international co-operation and understanding,’’ flashed the Surprise Language Exercise command from the sponsor, the Defense Advanced Research Projects Agency (DARPA).

‘‘There are no comparable systems in existence in the world,’’ Eduard Hovy, head of the Natural Language Group at the University of Southern California’s Information Sciences Institute (ISI), told The Indian Express from Sapporo, Japan.

One reason to choose Hindi was ‘‘undoubtedly its strategic importance in world politics and that relatively little language technology has been built around Hindi to date,’’ he said.

The deadline was 30 days. They hit a successful Escape on Day 29, but only after a splitting headache, because web-Hindi follows no common system of information retrieval, machine translation or encoding.

‘‘I don’t know a word of Hindi,’’ confesses Douglas Oard, associate professor at the University of Maryland. ‘‘But my systems do, and using these I have become pretty good at reading English translations of Hindi documents.’’

Asia’s unresearched languages hold hot news today. So DARPA’s Translingual Information Detection, Extraction and Summarisation programme gives priority to perfecting machine translation for interpreting information, which helps to detect, classify and identify foreign terrorists and to decipher their ideas. Quickly.

‘‘These technologies will help analysts squeeze information out of material in languages for which the US government has no sufficient human resources. We were aware that Hindi is spoken by millions. Another reason was that Hindi doesn’t follow a Roman script,’’ linguist Ulrich Germann at ISI said in an e-mail interview.

They dipped into Hindi news websites, even those run by the government, only to discover that no large Internet search engine indexes Hindi and that not all the sites had ‘‘translation equivalent’’ text.

Using statistical models, they left it to the computer to find the most likely translations for given foreign inputs. Chin-Yew Lin and Anton Leuski came up with a ‘‘super cross-lingual google-like multi-document search, summarisation, translation system’’ that lets users enter search terms in English and returns results grouped by similarity.
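To give a rough sense of what ‘‘finding the most likely translation’’ means, here is a minimal Python sketch. It is not the researchers’ system: the probability table, the three Hindi words and the function names are all invented for illustration, and real statistical translation models learn such probabilities from large collections of paired text and weigh whole sentences, not single words.

```python
# Toy table of P(english | hindi). Every entry is invented for illustration;
# a real system would estimate these values from paired Hindi-English text.
TRANSLATION_PROBS = {
    "पानी": {"water": 0.9, "rain": 0.1},
    "सीमा": {"border": 0.7, "limit": 0.3},
    "समाचार": {"news": 0.8, "report": 0.2},
}


def most_likely_translation(hindi_word):
    """Return the English candidate with the highest probability, or None if unseen."""
    candidates = TRANSLATION_PROBS.get(hindi_word)
    if not candidates:
        return None
    # max() over the keys, ranked by their probability values
    return max(candidates, key=candidates.get)


def gist(hindi_words):
    """Word-by-word gloss of a list of words; unknown words pass through unchanged."""
    return [most_likely_translation(word) or word for word in hindi_words]


if __name__ == "__main__":
    print(gist(["सीमा", "समाचार"]))
```

Picking the single best candidate per word ignores word order and context, which is one reason such output is good for getting the gist of a document and not much more.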

One linguist figured how to get English headlines for groups of Hindi text.

Led by the ISI, the crack team included the Space and Naval Warfare Systems Command, New York University, Carnegie Mellon University, University of Pennsylvania, University of California, Berkeley, University of Massachusetts, Johns Hopkins University, University of Maryland, University of Sheffield (UK) and IBM Thomas J Watson Research Laboratory.

Machine translation is handy for gisting: getting an idea of what a document is about. Trials and evaluations are scheduled for the next several months. Such a project had been attempted only once before, in a limited two-week test three months ago, when some of the participants worked with Cebuano, a language spoken by six million people in the Philippines.
