2020-11-06 01:14:27

Author: PHANI8 | Compiled by: VK | Source: Analytics Vidhya

### Introduction

In this article, we'll see what gradient descent really is, why it became popular, and why most algorithms in AI and ML follow this technique.

Before we start: what does a gradient actually mean? That sounds strange, right?

Cauchy was the first person to propose gradient descent, back in 1847.

The word gradient refers to the increase or decrease of a property, and descent means moving down. So, in general, the act of descending to a point, observing, and continuing to descend is called gradient descent. As the figure shows, the slope at the top of a mountain is very steep; as you keep moving down, the slope at the foot of the mountain becomes smallest, close to or equal to zero. The same idea applies mathematically.

Let's see how this works. Notice that the shape here is the same as the mountain above. Let's assume it is a curve of the form y = f(x).

We know that the slope at any point is the derivative of y with respect to x. If you trace the curve, you'll find that as you move down, the slope decreases until it equals zero at the tip, the minimum; as you move up again, the slope increases.
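To make this concrete, here is a small sketch (my own illustration, assuming the simple curve y = x², which is not from the article) that prints the slope at a few points along the curve:

```
# The slope of the curve y = x**2 at any x is the derivative dy/dx = 2*x.
def slope(x):
    return 2 * x

# Moving down toward the minimum at x = 0, the slope shrinks to zero,
# then grows again as we move up the other side.
for x in [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]:
    print(f"x = {x:5.2f}, y = {x**2:5.2f}, slope = {slope(x):5.2f}")
```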

Keep in mind that we want to see what happens to the values of x and y at the minimum point.

Look at the figure below: we have five points at different positions!

![](http://qiniu.aihubs.net/61300Screenshot (123).png)

As we move down, the value of y decreases, so among all the points here, we get the minimum value at the bottom of the graph. Our conclusion: we always find the minimum (x, y) at the bottom of the graph. Now let's look at how ML and DL use this idea, and how to reach the minimum point without traversing the whole graph.

In any algorithm, our main purpose is to minimize the loss, which shows that our model works well. To analyze this, we'll use linear regression, because linear regression uses a straight line to predict continuous output.

Let's take a straight line y = w*x + c.

We need to find w and c such that we get the best-fit line, the one that minimizes the error. So our goal is to find the best values of w and c.

We start with some random values of w and c and update them based on the loss; in other words, we keep updating these weights until the slope is equal to or close to zero.

We plot the loss function on the y-axis and w (or c) on the x-axis. Look at the figures below:

![](http://qiniu.aihubs.net/47460Screenshot (124).png)
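To see this bowl shape numerically, here is a minimal sketch (with made-up sample points, not data from the article, and assuming a squared-error loss, which the article only introduces later) that evaluates the loss for several values of w while c is held fixed:

```
import numpy as np

# Made-up sample points lying near the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

c = 1.0  # hold the intercept fixed so the loss depends on w alone
for w in [0.0, 1.0, 2.0, 3.0, 4.0]:
    loss = np.mean((y - (w * x + c)) ** 2)  # mean squared error
    print(f"w = {w:.1f}, loss = {loss:.3f}")
# The losses fall and then rise again: a bowl whose bottom is near w = 2
```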

To reach the w value that gives the minimum in the first graph, follow these steps:

1. Start with w and c, and calculate the loss for the given set of x values.

2. Plot the point, then update the weight as follows:

w_new = w_old – learning_rate * slope at (w_old, loss)

Repeat these steps until the minimum is reached!
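As a sketch of what this loop looks like in practice (assuming a toy quadratic loss L(w) = (w - 3)^2, which is not from the article):

```
# Gradient descent on the toy loss L(w) = (w - 3)**2, minimum at w = 3.
# Its slope at any w is 2 * (w - 3).
w_old = 0.0
learning_rate = 0.1

for step in range(25):
    slope = 2 * (w_old - 3)                # slope at (w_old, loss)
    w_new = w_old - learning_rate * slope  # the update rule above
    w_old = w_new

print(w_old)  # close to 3.0, where the slope is nearly zero
```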

• We subtract the gradient here because we want to move toward the foot of the mountain, that is, in the direction of steepest descent.

• Each time we subtract, we get a smaller slope than before; this is how we move toward a point where the slope is equal to or close to zero.

• We'll talk about the learning rate in a moment.

The same applies to graph 2, where the loss is a function of c. Now the question is: why do we put a learning rate in the equation? Because we can't traverse all the points between the starting point and the minimum; we need to skip a few points.

• We can take big steps at the beginning.

• However, when we're close to the minimum, we need to take small steps, because otherwise we'd cross the minimum and move to where the slope increases again. The learning rate is introduced to control the step size as we move along the graph. Even without a learning rate we would still reach the minimum, but what we care about is making our algorithm faster!

![](http://qiniu.aihubs.net/59180Screenshot (125).png)
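To see why the step size matters, here is a small comparison using the same toy quadratic loss as above (the learning-rate values are my own illustrative choices):

```
# Effect of the learning rate on the toy loss L(w) = (w - 3)**2.
def run(lr, steps=20):
    w = 0.0
    for _ in range(steps):
        w -= lr * 2 * (w - 3)  # subtract lr times the slope
    return w

print(run(0.01))  # too small: after 20 steps w is still far from 3
print(run(0.1))   # moderate: w converges close to 3
print(run(1.1))   # too large: w overshoots and diverges away from 3
```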

Here is an example algorithm for linear regression using gradient descent. We use the mean squared error as the loss function:

1. Initialize the model parameters with zeros:

w = 0, c = 0

2. Initialize the learning rate with any value in the range (0, 1):

lr = 0.01

The error equation:

![](http://qiniu.aihubs.net/43480Screenshot (128).png)

Now substitute (w*x + c) for Ypred and calculate the partial derivative with respect to w:

![](http://qiniu.aihubs.net/12675Screenshot (129).png)

3. The partial derivative with respect to c can be calculated in the same way:

![](http://qiniu.aihubs.net/38784Screenshot (130).png)

4. Apply this to the dataset for all epochs:

```
for i in range(epochs):
    y_pred = w * x + c
    D_M = (-2/n) * sum(x * (y_original - y_pred))
    D_C = (-2/n) * sum(y_original - y_pred)
```

Here the sum function adds up the gradients of all the points at once!

Update the parameters in every iteration (these updates go inside the loop above):

w = w – lr * D_M

c = c – lr * D_C
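Putting steps 1 through 4 together, here is a minimal runnable sketch of the whole procedure; the sample data below is made up for illustration:

```
import numpy as np

# Made-up sample data lying near the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_original = np.array([3.2, 4.9, 7.1, 9.0, 10.8])
n = len(x)

w, c = 0.0, 0.0   # step 1: initialize the parameters with zero
lr = 0.01         # step 2: learning rate in the range (0, 1)
epochs = 5000     # enough iterations for this tiny dataset

for i in range(epochs):                              # step 4
    y_pred = w * x + c
    D_M = (-2 / n) * sum(x * (y_original - y_pred))  # dLoss/dw
    D_C = (-2 / n) * sum(y_original - y_pred)        # dLoss/dc
    w = w - lr * D_M                                 # update inside the loop
    c = c - lr * D_C

print(w, c)  # close to 2 and 1, the slope and intercept behind the data
```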

Gradient descent is also used in deep learning to train neural networks. There, we update the weights of each neuron in order to get the best classification with minimum error. We use gradient descent to update all the weights of every layer:

Wi = Wi – learning_rate * derivative(loss function w.r.t. Wi)
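As a rough, hedged illustration (a single sigmoid neuron on made-up data, not the article's network), the same per-weight update rule looks like this:

```
import numpy as np

# Made-up binary data: 2 inputs, 4 samples, AND-like labels
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

W = np.zeros(2)  # the neuron's weights
b = 0.0
lr = 0.5

for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ W + b)))  # sigmoid activation
    dW = X.T @ (p - y) / len(y)         # dLoss/dW for cross-entropy loss
    db = np.mean(p - y)
    W = W - lr * dW                     # Wi = Wi - learning_rate * dLoss/dWi
    b = b - lr * db

print((p > 0.5).astype(int))  # predictions approach [0 0 0 1]
```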

### Why is it popular?

Gradient descent is the most commonly used optimization strategy in machine learning and deep learning.

It's used to train models, can be combined with various algorithms, and is easy to understand and implement.

Many statistical techniques and methods use GD to minimize and optimize their processes.