This is an archive article published on October 30, 2011

Cracking the code

Applying statistics-based translation techniques, linguists have decoded the Copiale Cipher, a hand-lettered 105-page manuscript that appears to date from the late 18th century

JOHN MARKOFF

It has been more than six decades since Warren Weaver, a pioneer in automated language translation, suggested applying code-breaking techniques to the challenge of interpreting a foreign language.

In an oft-cited letter in 1947 to the mathematician Norbert Wiener, he wrote: “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is written in English, but has been coded in some strange symbols. I will now proceed to decode.’”


That insight led to a generation of statistics-based language programs like Google Translate and, not so incidentally, to new tools for breaking codes that go back to the Middle Ages. Now a team of Swedish and American linguists has applied statistics-based translation techniques to crack one of the most stubborn of codes: the Copiale Cipher, a hand-lettered 105-page manuscript that appears to date from the late 18th century. They described their work at a meeting of the Association for Computational Linguistics in Portland, Oregon, US.

Discovered in an academic archive in the former East Germany, the elaborately bound volume of gold and green brocade paper holds 75,000 characters, a perplexing mix of mysterious symbols and Roman letters. The name comes from one of only two noncoded inscriptions in the document. Kevin Knight, a computer scientist at the Information Sciences Institute at the University of Southern California, collaborated with Beata Megyesi and Christiane Schaefer of Uppsala University in Sweden to decipher the first 16 pages. They turn out to be a detailed description of a ritual from a secret society that had a fascination with eye surgery and ophthalmology.

It began as a weekend project this year, Knight said, adding: “I don’t have much experience in cryptography. My background is primarily in computational linguistics and machine translation.” Uncertain of the original language, the researchers went down several blind alleys before following their hunches. First, they assumed that the Roman characters, and not the abstract symbols, contained all the information.

But when that approach failed, they figured the code was what cryptographers call a homophonic cipher: a substitution code in which a single letter of the original can be represented by several different symbols, so there is no straightforward one-to-one correspondence between the original and encoded information. And they decided the original language was probably German. Eventually they concluded that the Roman letters were so-called nulls, meant to mislead the code breaker, and that the letters represented spaces between words made up of elaborate symbols. Another crucial discovery was that a colon indicated the doubling of the previous consonant. The researchers used language-translation techniques like expected word frequency to guess what a symbol might equal in German.
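The decoding rules described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the researchers' method or the real Copiale key: the symbols, the mapping in TOY_KEY and the function names are invented for illustration, and for simplicity the colon doubles whatever letter came before it.

```python
from collections import Counter

# Hypothetical toy key, invented for illustration (NOT the real Copiale key).
# Two symbols map to "e", which is what makes the cipher homophonic.
TOY_KEY = {
    "△": "e", "□": "e", "○": "n",
    "◇": "m", "☆": "a", "♀": "i", "♂": "t",
}

def decode(symbols, key):
    """Decode a list of cipher symbols using the rules from the article."""
    out = []
    for s in symbols:
        if s == ":":
            # A colon doubles the previously decoded letter.
            if out and out[-1] != " ":
                out.append(out[-1])
        elif s.isascii() and s.isalpha():
            # Roman letters carry no content; they only mark word breaks.
            out.append(" ")
        else:
            # Unknown symbols are shown as "?" rather than guessed.
            out.append(key.get(s, "?"))
    return "".join(out).strip()

def symbol_frequencies(symbols):
    """Count how often each abstract symbol occurs, ignoring Roman letters and colons."""
    return Counter(
        s for s in symbols
        if s != ":" and not (s.isascii() and s.isalpha())
    )

# "mann mit" written with the toy key; "x" stands in for a Roman-letter null.
message = ["◇", "☆", "○", ":", "x", "◇", "♀", "♂"]
print(decode(message, TOY_KEY))
print(symbol_frequencies(message).most_common(3))
```

In practice the key is the unknown. Counts like those from symbol_frequencies are the kind of evidence that can be compared with expected letter and word frequencies in the suspected language to propose a mapping.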

But while the cipher was a notable success, Knight and his colleagues have been frustrated by other, more impenetrable ciphers. “There are these books and ancient languages of real historical value that contain historical information that we just can’t get out yet, and that’s of interest to a lot of people,” he said in a filmed interview describing the Copiale project.


The work has value to historians who are trying to understand the spread of political ideas. Secret societies were all the rage in the 18th century, Knight said, and they had an influence on both the American and French Revolutions. Modern examples of challenging ciphers include the communications the Zodiac killer sent to the police in California in the 1960s and ’70s, and the “Kryptos” sculpture, commissioned for the CIA headquarters, which has been only partly decoded. But the white whale of the code-breaking world is the Voynich manuscript. Comprising 240 lavishly illustrated vellum pages, it has defied the world’s best code breakers. Though cryptographers have long wondered if it is a hoax, it was recently dated to the early 1400s. With a University of Chicago computer scientist, Knight this year published a detailed analysis of the manuscript that falls short of answering the hoax question, but does find some evidence that it contains patterns that match the structure of natural language.

“It’s been called the most mysterious manuscript in the world,” he said. “It’s super full of patterns, and so for somebody to have created something like that would have been a lot of work. So I feel it’s probably a code.”
