UIUC | Curriculum Learning for Language Models
2021-08-08 15:44:42 【Author: Ke】
【Paper title】Curriculum Learning for Language Modeling
【Authors】Daniel Campos
【Published】2021/08/04
【Institution】UIUC
【Paper link】https://arxiv.org/pdf/2108.02170v1.pdf
【Code link】https://github.com/spacemanidol/CurriculumLearningForLanguageModels
【Why recommended】A negative result for curriculum learning in language model pre-training
Language models such as ELMo and BERT provide powerful natural language representations that serve as the language-understanding component for a variety of downstream tasks. Curriculum learning is a method that employs a structured training regime; it has been used in computer vision and machine translation to improve training speed and model performance. While language models have proven transformative for natural language processing, they have also proven expensive, energy-intensive, and challenging to train. In this work, the authors explore the effect of curriculum learning on language model pre-training using a variety of linguistically motivated curricula, and evaluate transfer performance on the GLUE benchmark. Despite a broad range of training methods and experiments, they find no compelling evidence that curriculum learning methods improve language model training.
The figure above illustrates the competence-based curriculum (CBC) algorithm. A corpus X is a set of samples S, where each sample si is a sequence of words. Samples are sorted by difficulty, assigned using a heuristic such as sentence length or word rarity. The model is given an initial competence λ0 and a competence increment λincrement; its competence score represents its progress through training. At each training step, the model samples from the data whose difficulty falls below its current competence λt, updates its weights, and increases λt.
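To make the loop concrete, here is a minimal Python sketch of a CBC-style training loop, following the description above. The `model` interface, batch size of 32, and the values of λ0 and λincrement are illustrative assumptions, not values taken from the paper.

```python
import random

def train_with_cbc(samples, difficulty, model, steps,
                   lambda_0=0.1, lambda_inc=1e-4):
    """Competence-based curriculum (CBC) training loop (sketch).

    samples    -- list of training sequences
    difficulty -- callable mapping a sample to a difficulty score
    model      -- any object exposing a train_step(batch) method
    """
    # Sort the corpus once by difficulty; a competence of lambda_t
    # then exposes (roughly) the easiest lambda_t fraction of samples.
    ranked = sorted(samples, key=difficulty)
    n = len(ranked)
    competence = lambda_0
    for _ in range(steps):
        # The model may only sample from data below its current competence.
        cutoff = max(1, int(competence * n))
        batch = random.sample(ranked[:cutoff], k=min(32, cutoff))
        model.train_step(batch)
        # Increase competence until the full corpus is available.
        competence = min(1.0, competence + lambda_inc)
```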
The paper explores eight proxies for sample difficulty: no curriculum, random, sample length, unigram sample probability, bigram sample probability, trigram sample probability, part-of-speech diversity (POS), and dependency-parse complexity (DEP). Two of these heuristics are sketched below.
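As an illustration, below are sketches of the sample-length and unigram-probability heuristics; the function names and the unsmoothed unigram model are our own assumptions. The n-gram, POS, and dependency heuristics follow the same pattern with different scoring functions.

```python
import math
from collections import Counter

def length_difficulty(tokens):
    """Sample length heuristic: longer sentences are considered harder."""
    return len(tokens)

def make_unigram_difficulty(corpus):
    """Unigram sample probability heuristic: samples built from rarer
    words are considered harder. Difficulty is the negative
    log-probability of the sample under a unigram model of the corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    def difficulty(tokens):
        return -sum(math.log(counts[t] / total)
                    for t in tokens if t in counts)
    return difficulty
```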
The figure above shows results on wikitext-2 (small). We find no strong evidence that curriculum structure matters, since the no-curriculum setting (λ0 = 1) performs better than the other four curricula and the baselines. Perhaps most surprisingly, the random curriculum, despite having no formal structure in its training regime, outperforms the baseline as measured by overall GLUE score. Looking at per-task variability, only CoLA, STS-B, and SST show wide swings in performance; we attribute this to these tasks being small and linguistically more challenging.
The figure above shows results on wikitext-103 (large). The trends observed on wikitext-2 do not hold, as the highest performance is achieved by the baseline model. We also note that the ranking of systems does not carry over across datasets, and that as the pre-training corpus grows, the differences between models shrink. As with the smaller corpus, CoLA shows the highest sensitivity, while the variability of SST and STS-B softens.
Conclusion:
- In this work, we find no strong evidence that curriculum learning improves language model pre-training. CBC-based training regimes do not learn better representations of the training corpus, but the representations they learn do transfer well to downstream NLP tasks. When the pre-training corpus is small, CBC methods can outperform random sampling, but this advantage disappears as the corpus grows. Moreover, we find no evidence that any particular difficulty heuristic is better suited to CBC than another.
Copyright notice
This article was written by [Author: Ke]. Please include a link to the original when reposting.
https://chowdera.com/2021/08/20210808154024848U.html