
UIUC | Curriculum Learning for Language Modeling

2021-08-08 15:44:42 Author: Ke

【Paper title】Curriculum Learning for Language Modeling

【Author team】Daniel Campos

【Publication date】2021/08/04

【Institution】UIUC

【Paper link】https://arxiv.org/pdf/2108.02170v1.pdf

【Code link】https://github.com/spacemanidol/CurriculumLearningForLanguageModels

【Reason for recommendation】A negative result for curriculum learning in language model pre-training

Language models such as ELMo and BERT provide powerful natural language representations that serve as the language-understanding component for a wide range of downstream tasks. Curriculum learning is a method that employs a structured training regime, and it has been used in computer vision and machine translation to improve training speed and model performance. Although language models have proven to be transformative for natural language processing, these models are expensive, energy-intensive, and challenging to train. In this work, we explore the effect of curriculum learning on language model pre-training, using a variety of linguistically motivated curricula, and evaluate transfer performance on the GLUE benchmark. Despite a broad range of training regimes and experiments, we find no compelling evidence that curriculum learning methods improve language model training.

The figure above illustrates the competence-based curriculum (CBC) algorithm. A corpus X is a set of samples S, where each sample s_i is a sequence of words; each sample is assigned a difficulty using a heuristic such as sentence length or word rarity, and the samples are sorted by this difficulty. The model is assigned an initial competence λ0 and a competence increment λ_increment; the competence score represents how far the model has progressed through training. At each training step, the model draws samples whose difficulty lies below its current competence λt, updates its weights, and then increases its competence λt.
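As a rough illustration of this procedure, the sketch below implements a CBC-style training loop in Python. The names `difficulty_fn` and `train_step_fn` are hypothetical placeholders (they do not come from the paper's released code), and the linear competence increment simply follows the λ0 / λ_increment description above.

```python
import random

def cbc_train(samples, difficulty_fn, train_step_fn,
              lambda_0=0.1, lambda_increment=1e-4, num_steps=10_000):
    """Minimal sketch of competence-based curriculum (CBC) training.

    samples:        list of training samples (e.g. token sequences)
    difficulty_fn:  heuristic mapping a sample to a difficulty score
    train_step_fn:  performs one weight update on a single sample
    """
    # Sort the corpus by the chosen difficulty heuristic; a sample's rank
    # (as a fraction of the corpus) then plays the role of its difficulty.
    ordered = sorted(samples, key=difficulty_fn)
    n = len(ordered)

    competence = lambda_0
    for _ in range(num_steps):
        # The model may only sample data whose difficulty is at or below
        # its current competence lambda_t.
        cutoff = max(1, int(competence * n))
        sample = random.choice(ordered[:cutoff])

        train_step_fn(sample)  # one optimizer/weight update

        # Increase competence until the full corpus becomes available.
        competence = min(1.0, competence + lambda_increment)
```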

The paper explores eight proxy measures of sample difficulty: no curriculum, random, sample length, unigram sample probability, bigram sample probability, trigram sample probability, part-of-speech diversity (POS), and sample dependency parse complexity (DEP).
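To make these heuristics concrete, here is a minimal sketch of two of them, sample length and unigram sample probability. The function names and the smoothing constant for unseen words are illustrative assumptions rather than details taken from the released code.

```python
import math
from collections import Counter

def length_difficulty(sample):
    """Difficulty proxy: longer samples are treated as harder."""
    return len(sample.split())

def make_unigram_difficulty(corpus):
    """Difficulty proxy: samples built from rare words are treated as harder.

    Scores a sample by its negative log-probability under a unigram model
    estimated from `corpus` (a list of whitespace-tokenized strings).
    Bigram/trigram variants would condition each word on its predecessors.
    """
    counts = Counter(word for text in corpus for word in text.split())
    total = sum(counts.values())

    def difficulty(sample):
        # Unseen words receive a small smoothed count so log() stays defined.
        return -sum(math.log(counts.get(w, 0.5) / total) for w in sample.split())

    return difficulty
```

Either function could then be passed as the `difficulty_fn` of the CBC loop sketched above.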

The figure above shows the results on wikitext-2 (small). We find no strong evidence that the structure of the curriculum matters, since the no-curriculum setting (λ0 = 1) performs better than the other four curricula and the baseline. Perhaps most surprisingly, although the random curriculum imposes no formal structure on the training regime, it outperformed the baseline as measured by overall GLUE score. Looking at the variability of individual tasks, we find that only CoLA, STS-B, and SST show a wide range of performance variability. We believe this is because these tasks are small and more linguistically challenging.

The figure above shows the results on wikitext-103 (large). We find that the trend observed on wikitext-2 does not hold, since the highest performance is achieved by the baseline model. We also note that the ranking of the systems does not carry over across datasets, and that as the pre-training dataset grows, the differences between models shrink. As with the smaller corpus, we find CoLA to be the most sensitive task, while the variability of SST and STS-B becomes milder.

Conclusion:

  • In our work, we find no strong evidence that curriculum learning improves language model pre-training. Our CBC-based training regimes do not learn better representations of the training corpus, but their representations still transfer well to downstream NLP tasks. We find that when the pre-training corpus is small, CBC methods can outperform random sampling, but this advantage disappears as the corpus grows. Moreover, we find no evidence that any particular difficulty heuristic is better suited to CBC than the others.
