The difference between GBDT and XGBoost, and the mathematical derivation of the gradient descent method and Newton's method
2020-11-06 01:28:18, by Elementary school students in IT field
Why introduce the gradient descent method and Newton's method?
Two boosting models are mentioned here: GBDT and XGBoost. Their objective functions differ, and they also differ in how they fit the error function L(θ) inside the objective:
 GBDT uses a first-order Taylor expansion (keeping two terms) as the approximation
 XGBoost uses a second-order Taylor expansion (keeping three terms) as the approximation
In other words:
 GBDT optimizes in function space using the gradient descent method
 XGBoost optimizes in function space using Newton's method
The final objective function depends only on the first and second derivatives of the error function at each data point.
The error function can be customized, for example the squared loss $L(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$, or the logistic loss.
For a more detailed introduction, see: https://study.163.com/course/courseMain.htm?courseId=1006401020&share=2&shareId=400000000645014
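To make the first- versus second-order fitting concrete, here is a minimal sketch of one boosting step under the squared loss $L = \frac{1}{2}(\hat{y} - y)^2$. The data, the regularisation value `lam`, and the single-leaf "tree" are all assumptions made for the example, not part of the original article:

```python
# One boosting step under squared loss L = 0.5 * (pred - y)^2,
# contrasting GBDT (first order: fit the negative gradient) with
# XGBoost (second order: leaf weight computed from g and h).
y    = [3.0, 5.0, 8.0]   # targets (illustrative data)
pred = [4.0, 4.0, 4.0]   # current ensemble prediction
lam  = 1.0               # XGBoost L2 regularisation on leaf weights (assumed)

# First and second derivatives of the loss w.r.t. the prediction
g = [p - t for p, t in zip(pred, y)]   # gradient:  dL/dpred = pred - y
h = [1.0 for _ in y]                   # hessian: d2L/dpred2 = 1 for squared loss

# GBDT: the next tree is trained to fit the negative gradient (the residuals)
gbdt_targets = [-gi for gi in g]

# XGBoost: for a leaf containing all samples, the optimal weight is
#   w* = -sum(g) / (sum(h) + lambda)
xgb_leaf_weight = -sum(g) / (sum(h) + lam)

print(gbdt_targets)     # [-1.0, 1.0, 4.0]
print(xgb_leaf_weight)  # 1.0
```

For the squared loss the two views almost coincide (the hessian is constant), which is why the difference only becomes visible with losses such as the logistic loss, where $h$ varies per sample.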
1. The derivation of the gradient descent method
The gradient descent method is widely used in machine learning and deep learning. Courses and textbooks usually explain it visually (a quadratic curve, a convex bowl, and so on), and that intuitive picture is well known, so I will not repeat it here. Instead, I will prove the effectiveness of gradient descent with a mathematical derivation.
We all know the Taylor expansion of a function of one variable:
$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots$
Keeping only the first two terms on the right gives an "approximately equal" result:
$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$
Writing $\Delta x = x - x_0$, the formula becomes:
$f(x) \approx f(x_0) + f'(x_0)\,\Delta x$
Our goal is to make the function value decrease gradually during the iteration. In mathematical language: $f(x_{n+1}) \le f(x_n)$.
It is easy to see that we should construct:
$\Delta x = -f'(x_0)$
Then:
$f(x) \approx f(x_0) - f'(x_0)^2$
Written in iterative form:
$f(x_{n+1}) \approx f(x_n) - f'(x_n)^2$
Since $f'(x_n)^2 \ge 0$, the function value cannot increase, which completes the proof of the effectiveness of gradient descent. In practice a small step size (learning rate) $\eta > 0$ is attached to the step so that it stays within the region where the first-order approximation holds. The parameter update formula summarized from the steps above is:
$x_{n+1} = x_n - \eta\, f'(x_n)$
The steps above prove the effectiveness of gradient descent for a function of one variable; the proof generalizes easily to multivariate functions. For multivariate functions, one can further show that the negative gradient direction is the direction of steepest descent.
See the Zhihu question: Why does gradient descent find a minimum?
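The update rule derived above can be sketched in a few lines. The function $f(x) = (x - 3)^2$ and the step size are illustrative choices, not from the article:

```python
# A minimal sketch of the update x_{n+1} = x_n - eta * f'(x_n)
# on the illustrative function f(x) = (x - 3)^2, minimiser x* = 3.
def f_prime(x):
    return 2.0 * (x - 3.0)  # derivative of f(x) = (x - 3)^2

x, eta = 0.0, 0.1
for _ in range(100):
    x -= eta * f_prime(x)   # move against the gradient

print(round(x, 6))  # 3.0 -- converges to the minimiser
```

With $\eta = 0.1$ each step shrinks the error $x - 3$ by a constant factor $1 - 2\eta = 0.8$, which is exactly the linear convergence the first-order derivation predicts.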
2. Newton's method
Having covered the gradient descent method, let us also derive Newton's method, since it too comes from the Taylor expansion.
Look at the Taylor expansion again:
$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots$
Again, take the leading terms on the right, this time keeping the first three (up to second order):
$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2$
Take the derivative of both sides with respect to $x$:
$f'(x) \approx f'(x_0) + f''(x_0)(x - x_0)$
$f'(x) \approx f'(x_0) + f''(x_0)\,\Delta x$
By the first-order optimality condition of calculus, at a minimum of $f(x)$ we have $f'(x) = 0$. Substituting this into the equation above gives:
$0 = f'(x_0) + f''(x_0)\,\Delta x$
$\Delta x = -\frac{f'(x_0)}{f''(x_0)}$
This gives the parameter update formula of Newton's method:
$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$
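The update above can be sketched directly. The quadratic $f(x) = (x - 3)^2$ is an illustrative choice, picked because it makes Newton's behaviour easy to see:

```python
# A minimal sketch of one Newton step x_{n+1} = x_n - f'(x_n) / f''(x_n)
# on the illustrative function f(x) = (x - 3)^2.
def newton_step(x, fp, fpp):
    """One Newton update using first and second derivatives."""
    return x - fp(x) / fpp(x)

fp  = lambda x: 2.0 * (x - 3.0)   # f'(x)
fpp = lambda x: 2.0               # f''(x), constant for a quadratic

x = 0.0
x = newton_step(x, fp, fpp)
print(x)  # 3.0
```

Because the second-order Taylor expansion of a quadratic is exact, a single Newton step lands exactly on the minimiser, whereas gradient descent needs many steps on the same function.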
3. The difference between the gradient descent method and Newton's method
From the derivations above we can see that although both the gradient descent method and Newton's method come from the Taylor expansion, they differ slightly in what the reasoning relies on: gradient descent uses only the first-order term, while Newton's method also uses the second-order term.
In practice, both Newton's method and gradient descent are widely used in machine learning. Their differences have been written up in many blog posts, for example: "Gradient descent or quasi-Newton method?"
4. Quasi-Newton methods
In Newton's parameter update formula above, $f''(x_n)$ appears in the denominator. Remember, the derivation above was for a function of one variable; for multivariate functions, this denominator corresponds to computing the inverse of the Hessian matrix, which is difficult and time-consuming. Many variants of Newton's method have therefore appeared, collectively called quasi-Newton methods. BFGS approximates the Hessian iteratively from gradient information. Since BFGS must store the approximate Hessian matrix, there is an improved, limited-memory version, L-BFGS.
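In one dimension the quasi-Newton idea reduces to the secant method: the exact second derivative in Newton's update is replaced by a finite-difference estimate built from two successive gradients, much as BFGS builds its Hessian approximation from successive gradient differences. A sketch, again on the illustrative function $f(x) = x - \ln x$:

```python
# 1-D quasi-Newton sketch (the secant method): approximate f''(x)
# from two successive gradients instead of computing it exactly.
# Illustrative function f(x) = x - ln(x), minimiser x* = 1.
fp = lambda x: 1.0 - 1.0 / x   # f'(x); f'' is never evaluated below

x_prev, x = 0.5, 0.6           # two starting points (assumed)
for _ in range(50):
    if abs(fp(x)) < 1e-12:     # converged
        break
    # secant estimate of f''(x) from the last two gradients
    h_approx = (fp(x) - fp(x_prev)) / (x - x_prev)
    x_prev, x = x, x - fp(x) / h_approx

print(round(x, 8))  # 1.0
```

The trade-off mirrors the multivariate case: each step is cheaper than true Newton (no second derivative), at the cost of slightly slower, superlinear rather than quadratic, convergence.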
Copyright notice: This article was created by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.