Sun Maosong of Tsinghua University: beyond the clamor, sitting and watching the clouds rise: puzzles and reflections on NLP
Image source: The Paper (Pengpai News)
In 2010, deep neural networks achieved a milestone in speech recognition research. With that event as a new starting point and a new source of momentum, the whole field of artificial intelligence quickly moved into the era of deep learning, and key areas such as natural language processing (NLP) made great progress as well.
Over the past ten years, deep learning has delivered significant performance improvements on the vast majority of NLP tasks. In recent years, large-scale pre-trained language models such as BERT and GPT-3 have emerged. They have become a strategic focus and a hotspot of technological competition in artificial intelligence worldwide, and have even set the trend for a time.
Natural language processing based on deep learning is racing along the track of "huge data, huge models, huge computing power", pushing each of them as far as it will go. But once this road has been followed to its extreme, what will the future look like?
Looking back, the field is buzzing with excitement, and "the riot of flowers increasingly dazzles the eye". Yet in terms of how deeply research has engaged with the real problems, it seems to remain at the stage where "the shallow grass barely covers the horse's hooves".
Professor Sun Maosong of Tsinghua University explored this puzzle in his keynote speech at the Sixth Language and Intelligence Summit Forum. Based on his talk, the Zhiyuan community has summarized the core content below for readers' reference.
Speaker: Sun Maosong, professor at Tsinghua University and chief scientist of the NLP major research direction at the Zhiyuan Research Institute
Compiled by: Zhang Hu, Niu Menglin
Proofread by: Dai Yiming
Overall assessment: deep learning has lifted NLP to a new level
In 2010, deep neural networks achieved a milestone in speech recognition research. With that event as a new starting point and a new source of momentum, deep learning has lifted NLP to a new level.
Deep learning freed natural language processing from the rationalist methods of the ivory tower, so that it can now be put to real practical use. Machine translation is a typical application scenario, and the industry has developed rapidly. Compared with the previous generation of statistical machine translation, which was based on Shannon's information theory, methods based on deep neural networks have brought a qualitative leap in translation quality.
This talk starts from machine translation to explain the progress of natural language processing in the era of deep learning, the existing problems and challenges, and some ideas for addressing them.
1. Machine translation based on deep learning
Compared with the previous generation of statistical machine translation methods based on Shannon's information theory, machine translation based on deep learning produces significantly better results.
At present, quite a few companies that provide human translation services first run a round of machine translation and then have human translators revise the output. This workflow significantly improves both the efficiency and the quality of translation. From the standpoint of translation experts, however, the picture is different. Douglas Hofstadter, the well-known contemporary American scholar and cognitive scientist and author of the Pulitzer Prize-winning nonfiction book "Gödel, Escher, Bach: An Eternal Golden Braid", said after testing Google Translate: "Machine translation reflects the goals of business, not the goals of philosophy."
Because machine translation based on deep learning does not deeply understand semantic information, its translation quality can so far reach only a barely acceptable level. More than 100 years ago, Yan Fu wrote in the "Notes on Translation" prefacing his Tianyan Lun: "Translation has three difficulties: faithfulness, expressiveness, and elegance." Of these three levels, machine translation today aims only at faithfulness, and it is still very far from elegance.
Below, the machine translation services of three major companies are used for a case study of current machine translation technology based on deep learning:
First, a paragraph was selected at random from a news report on Olympic athlete Su Bingtian and submitted for open Chinese-to-English testing on three platforms: Google Translate, Baidu Translate, and Sogou Translate. Although the three systems differ, all of them translate the whole paragraph basically correctly, and the conjunctions in long, difficult sentences are also rendered fairly accurately. They basically achieve the "faithfulness" in "faithfulness, expressiveness, elegance", which shows the power of deep learning. For this passage, Sogou's translation is relatively the best and gives a sense of the current level of machine translation.
The fly in the ointment is that some harder problems are still handled poorly. For example, all three platforms mistranslate the expression meaning "the only two" as "only". Presumably the translation models never saw a translation of this expression in their training corpus, so they fell back on the closest word they knew, the one meaning "the only one", and translated it as "only".
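Another difficult example is the sentence that the platforms render as "The river in front of my house is very sad". The ambiguous word here describes a river that is hard to cross, but it can also mean "sad", and all three platforms mistranslate it as "sad".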
Finally, consider the seemingly simple classic sentence that Yehoshua Bar-Hillel, a pioneer of machine translation research, gave in his famous 1960 article assessing the prospects of machine translation: "The box was in the pen". All three platforms translate it incorrectly, taking "pen" to be a writing instrument. In fact, "pen" has two meanings: a writing instrument and an enclosure. To translate the word correctly, the machine needs to know the relative sizes of a box and a pen, as well as deep semantic information such as the meaning of the preposition "in". This involves all-encompassing world knowledge.
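To make the role of world knowledge in the Bar-Hillel example concrete, here is a minimal Python sketch. It is only an illustration, not a method described in the talk: the `Sense` class, the size table, the size values, and the function `sense_for_container` are all hypothetical, and the "smallest sense that still fits" rule is an assumed heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str
    typical_size_cm: float  # hypothetical rough size of the thing this sense names


# Hypothetical, hand-written "world knowledge"; the numbers are illustrative only.
PEN_SENSES = [
    Sense("writing instrument", 15.0),
    Sense("enclosure", 200.0),
]
OBJECT_SIZE_CM = {"box": 40.0, "ink": 0.1}


def sense_for_container(contained: str, senses: list[Sense]) -> Sense:
    """For "the X was in the Y", pick the smallest sense of Y that can hold X."""
    size = OBJECT_SIZE_CM[contained]
    fitting = [s for s in senses if s.typical_size_cm > size]
    if not fitting:
        raise ValueError(f"no sense is large enough to contain {contained!r}")
    return min(fitting, key=lambda s: s.typical_size_cm)


print(sense_for_container("box", PEN_SENSES).gloss)
print(sense_for_container("ink", PEN_SENSES).gloss)
```

Even this toy needs a hand-built size table and a rule for the preposition "in", which hints at why covering such knowledge systematically for open text is so hard.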
These case studies show that machine translation needs the systematic involvement of semantic knowledge, and even world knowledge, to handle difficult translations. Fully automatic, high-quality machine translation is not yet achievable. NLP based on deep learning relies mainly on large-scale corpora, and there is currently no better way to solve the problem of deep semantic understanding in natural language processing.
The rationalist approach of the previous generation tried to solve translation by constructing sets of grammar rules by hand, under a severe lack of semantic formalization; practice has shown that this basically does not work. Existing deep neural networks rely mainly on "raw" bilingual corpora and try to find correspondences or association patterns in them, without any deep semantic analysis. This is also the biggest advantage of deep learning.
However, as the saying goes, "success owes to Xiao He, and so does failure": the same factor cuts both ways. Machine translation based on deep learning does not, in essence, understand a sentence at the level of deep semantics. This is its inborn Achilles' heel: it does not consciously use semantic information. For a word it has never encountered, it usually picks a similar-looking word that it has seen and guesses from that; when it meets a semantic phenomenon more complex than anything it has seen, it can only guess and hope.
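The fallback behaviour described above can be mimicked with a toy Python sketch. This is an assumption-laden illustration, not how a neural translation model works internally: the three-word English-to-French table `SEEN` and the function `translate` are hypothetical, and plain string similarity from the standard library's `difflib` stands in for whatever notion of similarity a trained network has learned.

```python
from difflib import get_close_matches

# Hypothetical toy "training vocabulary": source word -> translation seen in training.
SEEN = {"pen": "stylo", "box": "boîte", "river": "rivière"}


def translate(word: str) -> str:
    """Translate a seen word directly; otherwise guess from the closest-looking seen word."""
    if word in SEEN:
        return SEEN[word]
    # cutoff=0.0 forces a guess even when nothing is really similar.
    closest = get_close_matches(word, list(SEEN), n=1, cutoff=0.0)
    return SEEN[closest[0]]


print(translate("pen"))  # seen word: direct lookup
print(translate("pan"))  # unseen word: falls back to the look-alike "pen"
```

The unseen word "pan" is silently rendered with the translation of "pen": a fluent but wrong output, produced with no signal that a guess was made.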
The crux of the problem for current machine translation is a serious shortage of available, systematic world knowledge, together with a lack of effective means of semantic analysis.
2. Large-scale pre-trained language models
From early machine translation to today's large-scale pre-trained language models such as BERT and GPT-3, natural language processing based on deep learning has become a strategic focus and a hotspot of technological competition in artificial intelligence worldwide. It, too, is racing along the track of "huge data, huge models, huge computing power", pushing each of them as far as it will go.
Without doubt, a large-scale pre-trained language model is a very important public basic resource for language information. With the development of deep learning, both academia and industry now need such a resource. Its biggest advantage is that it can take in all the language information on the Internet, so that when we tackle a specific task we start not from "wasteland" but from ground that has already been given a preliminary, all-round cultivation. This work is undoubtedly very important, and its role is universal and indispensable.
At the same time, we should note that the all-encompassing nature of a large-scale pre-trained language model is, in essence, a kind of "extensive reading". Like a jack of all trades, it is likely to be broad but not deep: it helps on every specific language processing task, but the actual effect may be lukewarm and not necessarily ideal.
Although many papers claim that model transfer can be achieved through few-shot learning, it is reasonable to believe that fine-tuning a large-scale pre-trained language model on a reasonably sized training set built specifically for the task should work better in practice.
Some questions also remain unclear and need to be settled through research. For example, corpora that have nothing to do with a specific task (conceivably many times larger than the relevant corpora) are also used to train large-scale pre-trained language models. Is this cost-effective, given the large amount of computing resources it consumes or occupies? Could it introduce so much noise that system performance on the specific task degrades significantly?
The biggest question facing large-scale pre-trained language models is how far pushing scale (data, model size, computing power) to the extreme can go. Many well-known institutions, such as Baidu and the Beijing Zhiyuan Artificial Intelligence Research Institute, are trying to push scale to the limit. From an engineering point of view, this has practical value. In fact, only one such model is needed: if everyone can use it, that is enough, and there is no need for each party to build its own.
At the same time, many scholars have questioned the scientific significance of pushing scale to the extreme. From a research perspective, how far this can go remains an open question. People may expect quantitative change to bring about qualitative change, but that requires the model to contain a suitable structure or mechanism in the first place. Otherwise it is like playing the lute to a cow: however long you play, the cow will not understand the music. Large-scale pre-trained language models are likely to hit this bottleneck: once scale has grown to a considerable extent, the performance gains will tend to flatten out.
Current large-scale pre-trained language models such as GPT-3 have taken in almost all human-written text, yet their command of semantics is still inadequate. Take a typical example of text generated by such a model: given the input "Walking all the way along the overcrowded mountain path, I did not see", the model continues with "anyone", which contradicts "overcrowded". This reflects an essential defect of large-scale pre-trained language models.
Insufficient semantic control makes the generated text wordy, especially when long texts are generated; the logical relations in the language are only superficially plausible and do not stand up to even a little scrutiny. Text generation models based on GPT-3 are still dubbed "statistical parrots".
The main challenge that large-scale pre-trained language models need to overcome is exactly the same as the crux of the problem in machine translation: a serious shortage of available, systematic world knowledge, together with a lack of effective means of semantic analysis.
Looking across the development of natural language processing, the scene around us seems noisy: new technologies keep emerging, and "the riot of flowers increasingly dazzles the eye". Yet in terms of the depth of scientific research, the field remains where "the shallow grass barely covers the horse's hooves". The Achilles' heel of deep neural networks in natural language processing, namely the construction and application of large-scale semantic and world knowledge, remains to be solved.
Natural language processing is now at, or is about to reach, the stage of "walking to where the stream ends". This is also a historic juncture for developing the next generation of deep learning. What is needed at this point is the attitude of "sitting down to watch the clouds rise": we should actively look for ways to break the deadlock by raising the theoretical level and depth of our research, so that the field can go further.