GBDT is the gradient boosting (GB) algorithm with decision trees (CART) as the base learners. XGBoost extends and improves on GBDT: the XGBoost algorithm is faster, and its accuracy is also relatively high.
Steps of the GBDT algorithm: based on the residuals from the previous iteration (residual = actual value of the sample − predicted value of the sample; initially, the residual is the sample value itself), GBDT trains each base learner on the residuals (XGBoost instead uses the first derivatives G and second derivatives H), computes the weight of that base learner, then updates the sample residuals and the corresponding weights, repeating until the cost function converges. When a prediction sample comes in, the predictions of all base learners are summed in turn to give the overall prediction result.
1. Initialize the model with a constant value:

   F_0(x) = argmin_γ Σ_i L(y_i, γ)

2. Iteratively generate M base learners; for m = 1, …, M:

   1. Compute the pseudo-residuals, which serve as this iteration's sample targets:

      r_im = −[∂L(y_i, F(x_i)) / ∂F(x_i)] evaluated at F = F_{m−1}

   2. Fit a base learner h_m(x) to the pairs (x_i, r_im).

   3. Compute the optimal multiplier γ_m, i.e. the weight of this iteration's base learner:

      γ_m = argmin_γ Σ_i L(y_i, F_{m−1}(x_i) + γ·h_m(x_i))

   4. Update the model:

      F_m(x) = F_{m−1}(x) + γ_m·h_m(x)
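The four steps above can be sketched in a minimal numpy-only example. It assumes squared-error loss (so the pseudo-residuals are simply y − F(x)), uses depth-1 regression stumps as base learners, and adds a shrinkage factor `lr` as is common in practice; the function names (`fit_stump`, `gbdt_fit`, `gbdt_predict`) are made up for illustration:

```python
import numpy as np

def fit_stump(x, r):
    # Find the threshold stump (t, left_value, right_value) that best
    # fits the residuals r in the squared-error sense.
    best = (np.inf, None, 0.0, 0.0)
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lv, rv = left.mean(), right.mean()
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def gbdt_fit(x, y, n_rounds=50, lr=0.1):
    f = np.full_like(y, y.mean(), dtype=float)  # step 1: constant init
    stumps = []
    for _ in range(n_rounds):                   # step 2: M iterations
        r = y - f                               # 2.1: pseudo-residuals (squared loss)
        t, lv, rv = fit_stump(x, r)             # 2.2: fit base learner to residuals
        f += lr * np.where(x <= t, lv, rv)      # 2.3/2.4: weight step and update model
        stumps.append((t, lv, rv))
    return y.mean(), stumps

def gbdt_predict(init, stumps, x, lr=0.1):
    # Sum the (shrunken) predictions of all base learners.
    f = np.full(len(x), init)
    for t, lv, rv in stumps:
        f += lr * np.where(x <= t, lv, rv)
    return f
```

For squared loss, the optimal per-leaf step γ_m is already the mean residual in the leaf, so the explicit line-search step collapses into `fit_stump`.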
The big difference between XGBoost and GBDT is the definition of the objective function. XGBoost expands the loss with a second-order Taylor approximation (keeping three terms: the constant, the first-order term, and the second-order term), so the final objective depends only on the first derivative g_i and the second derivative h_i of the error function at each data point.
Setting the derivative of the objective with respect to each leaf weight w_j to zero gives

   w_j* = −G_j / (H_j + λ)

where G_j and H_j are the sums of the first and second derivatives over the samples falling in leaf j. Substituting this optimal solution back into the objective function gives the objective value used for optimization:

   Obj = −(1/2) Σ_j G_j² / (H_j + λ) + γT

where T is the number of leaves.
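A quick numeric check of these two formulas, using made-up G_j, H_j values for a hypothetical two-leaf tree and arbitrarily chosen λ and γ:

```python
# Illustrative per-leaf derivative sums for a two-leaf tree (made-up values).
lam, gamma = 1.0, 0.1
G = [4.0, -2.0]   # G_j: sum of first derivatives in leaf j
H = [3.0, 5.0]    # H_j: sum of second derivatives in leaf j

# Optimal leaf weights: w_j* = -G_j / (H_j + lambda)
w = [-g / (h + lam) for g, h in zip(G, H)]

# Objective at the optimum: -(1/2) * sum_j G_j^2 / (H_j + lambda) + gamma * T
obj = -0.5 * sum(g * g / (h + lam) for g, h in zip(G, H)) + gamma * len(G)
```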
The next step is how to select the feature and split point when constructing the tree.

1) The greedy method: enumerate all candidate splits exactly
Starting from tree depth 0, each node traverses all the features. For each feature, first sort the samples by that feature's value, then scan the values in turn, assuming the node is split into left and right subtrees at that value, and compute the objective values of the left and right subtrees. Subtracting the objective before the split gives the gain of the split:

   Gain = (1/2) [ G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ) ] − γ

(This is similar to information gain, where the weighted entropy of the children after the split is subtracted from the entropy before the split.) Choose the split with the largest gain as the best split point; this search is carried out over all features.
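The exact greedy enumeration described above can be sketched as follows. This is a simplified single-feature version; `g` and `h` are the per-sample first and second derivatives, and the function name `best_split` is made up for illustration:

```python
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split search on one feature.

    x: feature values; g, h: per-sample first/second derivatives.
    Returns (best_gain, best_threshold).
    """
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()            # totals for the unsplit node
    GL = HL = 0.0
    best_gain, best_t = 0.0, None
    for i in range(len(x) - 1):
        GL += g[i]; HL += h[i]         # running left-side sums
        if x[i] == x[i + 1]:           # can only split between distinct values
            continue
        GR, HR = G - GL, H - HL
        # gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma
        gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)
                      - G * G / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_t = gain, (x[i] + x[i + 1]) / 2
    return best_gain, best_t
```

One pass over the sorted values suffices because the left-side sums G_L, H_L grow incrementally, so each candidate split costs O(1) once the feature is sorted.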
2) The approximate algorithm
This mainly targets the case where the data is too large to be processed directly. When looking for the best split point, the traditional greedy method that considers every possible split of every feature is inefficient, so XGBoost also implements an approximate algorithm. The general idea is to list a few candidate split points for each feature according to percentiles of the feature's distribution, and then evaluate the gain formula above only at those candidates to find the best split point.
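A minimal sketch of this idea, with candidates taken from plain `np.percentile` (XGBoost's actual weighted quantile sketch additionally weights samples by h; the function name and parameters here are made up):

```python
import numpy as np

def approx_split(x, g, h, n_candidates=4, lam=1.0, gamma=0.0):
    # Candidate thresholds from interior percentiles of the feature values.
    qs = np.linspace(0, 100, n_candidates + 2)[1:-1]
    candidates = np.unique(np.percentile(x, qs))
    G, H = g.sum(), h.sum()
    best_gain, best_t = 0.0, None
    for t in candidates:
        left = x <= t
        if not left.any() or left.all():     # skip degenerate splits
            continue
        GL, HL = g[left].sum(), h[left].sum()
        GR, HR = G - GL, H - HL
        # Same gain formula as the exact method, but over few candidates.
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t
```

Only `n_candidates` splits are scored instead of one per distinct feature value, trading a small loss in split quality for a large reduction in computation on big data.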
Similarities and differences between XGBoost and GBDT
XGBoost is an efficient improvement of the GBDT algorithm. Both are boosting algorithms, trained serially with a strong dependency between successive rounds. The main improvements in XGBoost:

1. The base learner can be a CART regression tree or a linear classifier.
2. A regularization term (depending on the number of leaf nodes and the leaf values) is added to the objective function to avoid overfitting.
3. The objective is computed with a second-order Taylor approximation (three terms), using both first and second derivatives.
4. Split finding for nodes at the same tree level can be parallelized.
5. An approximate split-finding algorithm is provided to reduce computation: candidate split points are listed by feature percentiles, and the best split is then chosen among these candidates.