
Random forest

2021-01-23 19:54:58 Light as breeze

The main idea of random forest is as follows. First, N training subsets are randomly drawn with replacement from the original sample (bootstrap sampling, the same as in Bagging; thanks to the bootstrap, no cross-validation is needed), and each subset is used to grow one of N decision trees. When constructing each tree, the best split is chosen from m randomly selected attributes rather than from all attributes. These decision trees then form a forest, and the individual trees of the random forest are uncorrelated with one another. Once the forest is built, each tree casts a vote on the class of a new input sample, and the class that receives the most votes is taken as the forest's prediction for that sample.
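As a concrete illustration of this procedure, here is a minimal from-scratch sketch in Python, assuming scikit-learn's DecisionTreeClassifier as the base learner and the iris dataset as example data; the names N_TREES and M_FEATURES are illustrative choices, not from the original text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

N_TREES = 25          # N: number of bootstrap subsets / decision trees
M_FEATURES = "sqrt"   # m: attributes considered at each split

trees = []
for _ in range(N_TREES):
    # Bootstrap: sample with replacement from the original training set
    idx = rng.integers(0, len(X), len(X))
    tree = DecisionTreeClassifier(max_features=M_FEATURES,
                                  random_state=int(rng.integers(1 << 31)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def forest_predict(sample):
    # Each tree votes on the class; the class with the most votes wins
    votes = np.array([t.predict(sample.reshape(1, -1))[0] for t in trees])
    return np.bincount(votes).argmax()

print(forest_predict(X[0]), y[0])  # predicted class vs. true class
```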

A random forest is composed of several random decision trees. It is a bagging-style ensemble algorithm whose base learner is the classification decision tree; the trees are trained in parallel and do not interfere with one another. Random forest involves two kinds of randomness: (1) the training samples are drawn at random, with N training subsets randomly generated from the original training set to grow N decision trees; (2) the attributes are selected at random, with m attributes randomly chosen while constructing the decision tree for each sample subset.
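These two sources of randomness map directly onto scikit-learn's RandomForestClassifier, assuming that library is used: n_estimators plays the role of N and max_features the role of m, while n_jobs exploits the fact that the independent trees can be trained in parallel. A hedged sketch:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(
    n_estimators=100,     # N decision trees, each grown on a bootstrap sample
    max_features="sqrt",  # m randomly chosen attributes tried at each split
    n_jobs=-1,            # trees are independent, so train them in parallel
    random_state=0,
)
clf.fit(X, y)
print(clf.predict(X[:3]))  # majority vote over the 100 trees
```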

Advantages of random forest:

*   Because the samples are drawn at random (the model generalizes well and is insensitive to missing values and outliers) and the attributes are selected at random (so high-dimensional data can be handled), overfitting is avoided;

*   The trees are independent of one another, so training can be parallelized and the model trains quickly;

*   The model can handle imbalanced data and balance the error;

*   From the trained result, features can be ranked by importance so that the more important ones can be selected (illustrated in the sketch after this list);

*   Sampling in the random forest algorithm is bootstrap sampling, which leaves an out-of-bag (OOB) set (some data may never be drawn), so no cross-validation or separate test set is needed for evaluation (also illustrated in the sketch after this list).
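The following is a hedged sketch of the last two points, again assuming scikit-learn and the iris dataset: feature_importances_ ranks the attributes, and oob_score_ estimates accuracy on the out-of-bag samples without a separate test set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

print("OOB accuracy:", clf.oob_score_)      # evaluated only on unselected samples
for name, imp in zip(load_iris().feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")             # higher value = more important feature
```

Disadvantages of random forest: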

*   When the data is very noisy, overfitting will occur;

*   Because of the two kinds of randomness, it is almost impossible to control what happens inside the model; it behaves like a black box, and one can only experiment with different parameters and random seeds (see the sketch below).
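Since the internals are hard to steer directly, tuning in practice usually amounts to a search over hyperparameters (and, if desired, random seeds), for example with scikit-learn's GridSearchCV; the grid below is purely illustrative, not prescriptive.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),  # seeds could be varied similarly
    param_grid={"n_estimators": [50, 100, 200],
                "max_features": ["sqrt", "log2", None]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```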

Copyright notice
This article was created by [Light as breeze]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/01/20210123195436769u.html
