
Saving lost civilizations, AI identifies lost ancient languages

2020-12-07 14:47:39 Mango fruit


Language helps people understand a culture and learn how that culture sees the world. Every culture “has something to say”, and when a language disappears, it is a loss for all of humanity.

But what if there were a way to automatically recover these lost languages?

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have found a way to use machine learning to help decipher lost languages.


Luo Jiaming, co-author of the study, said: “Our work aims to automatically decipher lost languages written in text that is not segmented, or only partly segmented, into words. Apparently, for some ancient languages, word dividers had not been invented or were not applied consistently. The significance of our work is that it is the first attempt to decipher such scripts automatically using machine learning.”
This could finally let us understand the grammar, vocabulary and syntax behind the written records of lost languages.

The team focuses in particular on texts written with few or no spaces between words, a practice known as scriptio continua.


Looking for language cousins

Usually, deciphering an unknown language is easier when at least one related language is known. For example, experts were able to decipher Gothic, an extinct East Germanic language, many years ago thanks to its connections with known languages such as Proto-Germanic, Old Norse and Old English. Inspired by this idea, the team followed a similar line of thought in developing their decipherment algorithm.

Luo Jiaming explained: “Our machine learning model works by matching as many pairs of words as possible between the ancient language and the known language, while dealing with the uncertainty in word segmentation. What counts as a real match depends on the sound correspondences at the character level, and on how regular those correspondences are.”

“For example, if you find that many pairs show the same consistent change (such as p to b), then those pairs are likely to be real matches, because historical linguistics tells us that sound changes are regular and consistent. If two languages are truly related (like Spanish and Italian), you will see these patterns come up again and again.”
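The idea of a "consistent change" can be illustrated with a small Python sketch. This is not the researchers' model: it simply counts position-wise character mappings over word pairs and reports how regular a mapping is. The word pairs in `TOY_PAIRS` are invented strings, and the function names are hypothetical, chosen only for this illustration.

```python
from collections import Counter

def count_correspondences(pairs):
    """Count position-wise character mappings over equal-length word pairs."""
    counts = Counter()
    for lost, known in pairs:
        if len(lost) != len(known):
            continue  # this toy sketch ignores pairs that need insertions or deletions
        for a, b in zip(lost, known):
            counts[(a, b)] += 1
    return counts

def regularity(counts, src):
    """Return the most frequent target of `src` and the share of cases it covers."""
    targets = {b: n for (a, b), n in counts.items() if a == src}
    total = sum(targets.values())
    if total == 0:
        return None, 0.0
    best = max(targets, key=targets.get)
    return best, targets[best] / total

# Invented pairs (assumption): every "p" in the first word appears as "b" in the second.
TOY_PAIRS = [("pata", "bata"), ("lupo", "lubo"), ("capa", "caba"), ("tipo", "tibo")]

counts = count_correspondences(TOY_PAIRS)
best_target, share = regularity(counts, "p")
print(best_target, share)
```

A share close to 1.0 means the mapping recurs across many pairs, which is the kind of regularity the quote describes. A real system would also have to handle insertions, deletions and uncertain word boundaries.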

Besides incorporating these linguistic tendencies, the model also uses “embeddings” of language sounds in a notional multidimensional space to cope with the uncertainty introduced by unsegmented text.

Using this framework, the model can detect patterns in how related languages evolve, which allows it to segment and isolate words in the undeciphered language and map them to words in known related languages.
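To give a rough intuition for placing sounds in a multidimensional space, the Python sketch below represents a few consonants as hand-built feature vectors and compares them with cosine similarity. This is only a stand-in: the feature set, the `TOY_FEATURES` table and the function names are assumptions made for illustration, and no claim is made here about how the actual model learns its embeddings.

```python
import math

# Hypothetical, simplified features: [voiced, labial, alveolar, velar, stop, nasal]
TOY_FEATURES = {
    "p": [0, 1, 0, 0, 1, 0],
    "b": [1, 1, 0, 0, 1, 0],
    "t": [0, 0, 1, 0, 1, 0],
    "k": [0, 0, 0, 1, 1, 0],
    "m": [1, 1, 0, 0, 0, 1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def similarity(a, b):
    """Similarity of two sounds in the toy feature space."""
    return cosine(TOY_FEATURES[a], TOY_FEATURES[b])

def nearest(sound):
    """Return the other sound whose vector is closest to `sound`."""
    others = [s for s in TOY_FEATURES if s != sound]
    return max(others, key=lambda s: similarity(sound, s))

print(nearest("p"))
```

In this toy space "p" sits closest to "b", since the two differ only in voicing. That makes a p-to-b correspondence look more plausible than, say, p-to-k.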
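Segmenting text that has no spaces can be sketched with a classic dynamic-programming word-break routine. This is a deliberate simplification, not the paper's method: it requires exact matches against a known word list, whereas the model described above matches words in the lost language to related words in another language under uncertainty. The lexicon and the `segment` function are hypothetical.

```python
def segment(text, lexicon):
    """Split `text` into the fewest lexicon words; return None if impossible."""
    n = len(text)
    best = [None] * (n + 1)  # best[i] holds the shortest segmentation of text[:i]
    best[0] = []
    max_len = max(map(len, lexicon), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is not None and word in lexicon:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]

# Toy lexicon (assumption) with overlapping entries, to show the ambiguity.
TOY_LEXICON = {"lost", "lang", "u", "age", "s", "language", "languages"}

result = segment("lostlanguages", TOY_LEXICON)
print(result)
```

Even this toy input has several valid splits (for example "lost" + "language" + "s"), and the routine prefers the one with the fewest words. Choosing among competing segmentations is the kind of uncertainty that scriptio continua creates.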

Already applied to undeciphered languages

As the research group outlines in its paper, known connections between already deciphered languages and their relatives can serve as a benchmark, a kind of “ground truth”, to help determine whether this AI-powered decipherment model really works.

In this study, the team used the known relationships of Gothic and Ugaritic to their relatives to validate the model, and then applied it to an undeciphered language, Iberian.

Through this process, the team used its machine learning model to confirm that Iberian is in fact not related to Basque. Other candidates considered include Germanic, Turkic and Uralic languages. This conclusion is supported by other recent findings.

Luo Jiaming said: “Our work may help linguists quickly analyze the relationship between two languages, especially when one of them is undeciphered. It is not as comprehensive and thorough as human analysis, but it is much faster and requires far less human effort.”

Although the model appears to be good at assessing how closely two languages are related, the team now aims to extend it beyond its current capabilities so that it can handle multiple languages that may not be related.

For now, the research team hopes its model will help automate the usually tedious process of decipherment and take some of the guesswork out of it.

Paper link: http://people.csail.mit.edu/j_luo/assets/publications/DecipherUnsegmented.pdf?utm_source=thenewstack&utm_medium=website&utm_campaign=platform


Copyright notice
This article was written by [Mango fruit]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2020/12/20201207144658505q.html