
How to Make Deep Learning Models Generalize Better

2021-06-21 19:09:37 InfoQ

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"size","attrs":{"size":10}},{"type":"strong"}],"text":" This article was originally published in towards data science Website , Authorized by the original author InfoQ Translate and share ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Constant risk minimization (Invariant Risk Minimization,IRM) It's an exciting new learning paradigm , It can help the generalization level of prediction model surpass the limitation of training data . It consists of Facebook Researchers at the University of California developed , And in 2020 An article in "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/1907.02893.pdf","title":"","type":null},"content":[{"type":"text","text":" The paper "}]},{"type":"text","text":" It is introduced in . This approach can be added to almost any modeling framework , But it's best suited to black box models that use a lot of data ( Various neural networks and their variants )."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" In this paper , Let's learn more about it ."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":" Technology Overview "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" At a high level ,IRM It's a learning paradigm , It's trying to learn causality rather than correlation . Through the development of training environment and structured data samples and other means , We can improve the accuracy as much as possible , At the same time, the invariance of prediction variables is guaranteed . It's suitable for our data , Predictive variables that remain unchanged in various environments are used as the output of the final model ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/b4\/48\/b40bdbaccea73c22693c3fda0fbe8548.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"strong"}],"text":" chart 1:4-foldCV( Top ) With constant risk minimization (IRM)( Bottom ) The theoretical performance comparison of . 
## What's going on?

Let's pause to understand how invariant risk minimization actually works.

### What do predictive models do?

First, the purpose of a predictive model is to generalize, that is, to perform well on data it has never seen before. Data we haven't seen is called out-of-distribution (OOD).

To simulate new data, the industry has introduced a variety of methods (such as [cross-validation](https://towardsdatascience.com/cross-validation-430d9a5fee22)). Although these are better than fitting on a bare training set, we are still limited to the data we have observed. So, can you be sure the model will generalize?

Well, generally speaking, you can't.

For some well-defined problems (where you understand the data-generating mechanism well), we can be confident that our data sample represents the population. But for most applications we cannot be so sure.

Take an example cited in the paper: we want to determine whether the animal in a picture is a cow or a camel.

![](https://static001.geekbang.org/resource/image/91/98/91e363cdd27dd8fd1d0fca23bb5b3c98.png)

So we train a binary classifier using cross-validation and observe that the model achieves high accuracy on our test data. Great!

However, after more digging, we discover that the classifier is simply using the color of the background to decide whether the image shows a cow or a camel; whenever a cow is placed on a sandy background, the model labels it a camel, and vice versa.

Now, can we assume cows will only ever be observed on pasture, and camels only in the desert?

Obviously not. Although this is a toy example, it's easy to see how the same situation can affect more complex and important models.
### Why aren't current methods enough?

Before diving into the solution, let's take one more step back and understand why the popular train/test learning paradigm falls short.

The classic train/test paradigm is called Empirical Risk Minimization (ERM) in the paper. In ERM, we split the data into train/test sets, train a model on all features, validate it on the test set, and return the fitted model with the best test (out-of-sample) accuracy. One example is a 50/50 train/test split.
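As a point of reference, here is what that ERM baseline looks like in code. This is a generic scikit-learn sketch on synthetic data, not anything from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The classic ERM recipe: one shuffled 50/50 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("out-of-sample accuracy:", model.score(X_test, y_test))
```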
Now, to understand why ERM fails to generalize well, let's look at its three main assumptions:

1. Our data are independent and identically distributed (IID).
2. As we gather more data, the ratio of the number of significant features to the sample size n should shrink.
3. Perfect test accuracy is achievable only when there exists a (buildable) model that achieves perfect training accuracy.

At first glance, all three assumptions seem to hold. But reality is often the opposite.

Looking at the first assumption, our data are almost never truly IID. In practice, data collection nearly always introduces relationships between data points. For example, every image of a camel in the desert had to be taken somewhere in the world.

There are plenty of cases where data are "nearly" IID, but the important thing is to think critically about whether, and how, your data collection introduces bias.

> Assumption #1: If our data are not IID, the first assumption fails and we cannot randomly shuffle our data. It is important to consider whether your data-generating mechanism introduces bias.
For the second assumption: if we are modeling causal relationships, we expect the number of significant features to stabilize after a certain number of observations. In other words, as we collect more high-quality data, we should be able to uncover the true causal relationships and map them perfectly, so additional data would not improve our accuracy.

But with ERM this rarely happens. Because we cannot tell whether a relationship is causal, more data usually means fitting more spurious correlations. This phenomenon is known as the [bias-variance tradeoff](https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229).

> Assumption #2: When fitting with ERM, the number of significant features can grow with our sample size, which breaks the second assumption.

Finally, our third assumption is that we are able to build a "perfect" model. If we lack the data or sufficiently powerful modeling techniques, this assumption fails. However, unless we know it is impossible, we generally assume it holds.

> Assumption #3: We assume a large enough dataset can achieve an optimal model, so assumption #3 holds.

The paper also discusses some non-ERM methods, but for various reasons they fall short as well.

## The solution: Invariant Risk Minimization

The solution proposed in the paper is called Invariant Risk Minimization (IRM), and it overcomes all of the problems listed above. IRM is a learning paradigm that estimates causal predictors from multiple training environments. And because we learn from different data environments, we are more likely to generalize to new, OOD data.

How? We take advantage of the notion that causation is rooted in invariance.
Back to our example: in 95% of the images we see, the cow's background is grass and the camel's background is desert, so if we fit on background color we achieve 95% accuracy. On the face of it, that looks like a strong predictor.

However, randomized controlled trials have a core concept called the counterfactual: if we observe a counterexample to a hypothesis, we can reject that hypothesis. So as soon as we see a single cow in the desert, we can conclude that a desert background does not necessarily indicate a camel.

> Although strict counterfactuals are a bit harsh, we can build this concept into our loss function by heavily penalizing the model for mispredicting instances in any given environment.

For example, consider a set of environments, each corresponding to a country. Suppose that in 9 out of 10 of them cows live on pasture and camels live in the desert, but in the 10th environment the pattern is reversed. When we train on the 10th environment and observe many counterexamples, the model learns that background alone is not enough to label a cow or a camel, so it downgrades the importance of that predictor.
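A quick simulation makes this concrete. The sketch below, loosely in the spirit of the paper's Colored MNIST experiment, builds ten environments in which the background matches the "usual" habitat 95% of the time in nine environments and only 5% of the time in the tenth; the correlation probabilities are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, p_match):
    """Label: cow=0, camel=1. Background: grass=0, sand=1.
    The background matches the label's usual habitat with probability p_match."""
    label = rng.integers(0, 2, size=n)
    match = rng.random(n) < p_match
    background = np.where(match, label, 1 - label)
    return background, label

# Nine countries where the correlation holds, one where it reverses.
envs = [make_env(1_000, 0.95) for _ in range(9)] + [make_env(1_000, 0.05)]

for i, (bg, y) in enumerate(envs):
    print(f"env {i}: background predicts the label {np.mean(bg == y):.0%} of the time")
```

A background-only classifier looks great on the first nine environments but collapses on the tenth, which is exactly the invariance violation IRM is designed to punish.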
### The method

We've seen what IRM means; now let's enter the world of mathematics and see how it's done.

![](https://static001.geekbang.org/resource/image/a8/d2/a87275abc5a93c5a81357e487c6f29d2.png)

**Figure 2: The [minimization expression](https://arxiv.org/pdf/1907.02893.pdf).**

Figure 2 shows our optimization expression. As the summation indicates, we minimize the sum over all training environments.
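For readers who prefer it written out, this is the paper's IRMv1 objective in LaTeX; the A-D underbraces are my annotation, keyed to the labels in Figure 2:

```latex
\min_{\Phi} \sum_{e \in \mathcal{E}_{tr}}
  \underbrace{R^{e}(\Phi)}_{A}
  + \underbrace{\lambda}_{B} \cdot
  \Bigl\lVert \underbrace{\nabla_{w \mid w = 1.0}}_{C}\,
  \underbrace{R^{e}(w \cdot \Phi)}_{D} \Bigr\rVert^{2}
```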
Breaking it down, the "A" term represents our prediction accuracy in a given training environment, where phi (Φ) represents a data transformation, such as a log transform or a kernel mapping to a higher dimension. R represents our model's risk function in a given environment e. Note that a risk function is just the expectation of a loss function; a classic example is mean squared error (MSE).

The "B" term is simply a positive number used to scale our invariance penalty. Remember we said strict counterfactuals might be too harsh? This is where we tune that harshness. If lambda (λ) is 0, we don't care about invariance at all and only optimize accuracy. If λ is large, we care a great deal about invariance and penalize violations accordingly.

Finally, the "C" and "D" terms represent the invariance of our model across training environments. We don't need to dig deep here, but in short, the "C" term is the gradient with respect to a linear classifier w, whose value is fixed at 1.0. "D" is the risk of the linear classifier w multiplied by our data transformation (Φ). The whole term is the squared norm of that gradient vector.

The [paper](https://arxiv.org/pdf/1907.02893.pdf) covers these terms in detail; if you are curious, see Section 3.

All in all: "A" is our model's accuracy, "B" is a positive number expressing how much we care about invariance, and "C" times "D" is our model's invariance. If we minimize this expression, we should end up with a model that fits only the causal effects found across our training environments.
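Here is a minimal PyTorch sketch of that objective, written in the same spirit as the authors' released code (the function and variable names here are mine). The trick is the "dummy" classifier w pinned at 1.0: differentiating the per-environment risk with respect to it yields the C and D terms.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    """Squared gradient norm of the risk w.r.t. a dummy classifier w = 1.0 (terms C and D)."""
    w = torch.tensor(1.0, requires_grad=True)                  # linear classifier, pinned at 1.0
    risk = F.binary_cross_entropy_with_logits(logits * w, y)   # R^e(w * Phi)
    grad = torch.autograd.grad(risk, [w], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_loss(model, environments, lam):
    """Sum over environments of the risk (term A) plus lam (term B) times the penalty."""
    total = torch.tensor(0.0)
    for x_e, y_e in environments:                # one (x, y) batch per environment
        logits = model(x_e).squeeze(-1)          # logits play the role of Phi(x)
        risk = F.binary_cross_entropy_with_logits(logits, y_e)
        total = total + risk + lam * irm_penalty(logits, y_e)
    return total

# Toy usage with random data; in practice lam is often kept small for a warm-up
# period so Phi first reaches reasonable accuracy, then raised to a large value.
model = torch.nn.Linear(10, 1)
envs = [(torch.randn(32, 10), torch.randint(0, 2, (32,)).float()) for _ in range(3)]
loss = irm_loss(model, envs, lam=100.0)
loss.backward()
```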
### Follow-up work on IRM

Unfortunately, the IRM paradigm introduced in the paper applies only to the linear case. Transforming the data into a higher-dimensional space can make a linear model workable, but some relationships are fundamentally nonlinear. The authors leave the nonlinear case to future research.

If you want to follow this line of work, keep an eye on the authors: [Martin Arjovsky](https://scholar.google.com/citations?user=A6qfFPkAAAAJ&hl=en), [Léon Bottou](https://leon.bottou.org/papers), [Ishaan Gulrajani](https://ishaan.io/), and [David Lopez-Paz](https://scholar.google.com/citations?hl=en&user=SiCHxTkAAAAJ&view_op=list_works&sortby=pubdate).

That's the method. Not bad, right?

# Implementation notes

- There is a PyTorch [implementation](https://github.com/facebookresearch/InvariantRiskMinimization) from the authors.
- IRM is best suited to unknown causal relationships. If relationships are known, you should build them into the model structure; a famous example is convolution in convolutional neural networks (CNNs).
- IRM holds great potential for unsupervised models and reinforcement learning. Model fairness is another interesting application.
- The optimization is quite involved because there are two minimization terms. The paper outlines a transformation that makes the optimization convex, but only in the linear case.
- IRM is robust to mild model misspecification because it is differentiable with respect to the covariance of the training environments. So although a "perfect" model is ideal, the minimization expression is resilient to small human error.

**Original link:**

https://towardsdatascience.com/how-to-make-deep-learning-models-to-generalize-better-3341a2c5400c

