## Preface

• Use MinMaxScaler to normalize feature data
• Use StandardScaler to standardize feature data

## Feature preprocessing

### Definition

Feature preprocessing is the process of applying conversion functions to feature data so that it takes a form better suited to the algorithm model.

### Feature preprocessing API

``````python
sklearn.preprocessing
``````

### Why normalize or standardize?

When features differ greatly in unit or magnitude, or when the variance of one feature is several orders of magnitude larger than that of the others, that feature tends to dominate the target result, and some algorithms are then unable to learn from the other features.
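To make the scale problem concrete, here is a small sketch using two rows of the sample data shown later in this article. It assumes a distance-based algorithm (Euclidean distance is chosen here for illustration; the article does not name a specific algorithm) and measures how much each raw feature contributes to the squared distance.

```python
import numpy as np

# Two samples from the article's data set: milage, Liters, Consumtime.
a = np.array([40920.0, 8.326976, 0.953952])
b = np.array([14488.0, 7.153469, 1.673904])

# Squared difference contributed by each feature to the Euclidean distance.
squared_diff = (a - b) ** 2

# Fraction of the total squared distance that comes from each feature.
share = squared_diff / squared_diff.sum()

print("Squared differences per feature:", squared_diff)
print("Share of squared distance per feature:", share)
```

On the raw values, almost all of the distance comes from the milage column, simply because its numbers are thousands of times larger than the other two.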

### Normalization

#### Definition

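Normalization transforms the original data so that it is mapped into a given interval (by default [0, 1]).

The transformation is applied to each column separately:

$$X' = \frac{x - \min}{\max - \min}, \qquad X'' = X' \cdot (mx - mi) + mi$$

Here max is the maximum value of the column, min is the minimum value of the column, and X'' is the final result. mx and mi are the upper and lower bounds of the target interval; by default mx is 1 and mi is 0.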
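As a rough check, the per-column mapping can be computed by hand with NumPy and compared with MinMaxScaler. This is only a sketch: it assumes the standard min-max formula (subtract the column minimum, divide by the column range, then rescale to [mi, mx]), and the helper name `minmax_manual` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Feature columns (milage, Liters, Consumtime) from the article's sample data.
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])

def minmax_manual(X, mi=0.0, mx=1.0):
    """Hypothetical helper: column-wise min-max scaling into [mi, mx]."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_unit = (X - col_min) / (col_max - col_min)  # each column now in [0, 1]
    return X_unit * (mx - mi) + mi                # rescale to [mi, mx]

manual = minmax_manual(X)
from_sklearn = MinMaxScaler().fit_transform(X)

print("Manual result matches MinMaxScaler:", np.allclose(manual, from_sklearn))
```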

#### API

• sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1)… )

• MinMaxScaler.fit_transform(X)

• X: data in numpy array format, with shape [n_samples, n_features]
• Return value: the transformed array, with the same shape as the input
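The API points above can be exercised in a few lines. The sketch below uses made-up values (not from the article) and a non-default `feature_range` to show that the returned array keeps the input shape, and that the fitted scaler records the per-column minimum and maximum in its `data_min_` and `data_max_` attributes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data for illustration: 3 samples, 2 features.
X = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 800.0],
])

# Map each column into [-1, 1] instead of the default [0, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

print("Same shape as input:", X_scaled.shape == X.shape)
print("Per-column minimum seen during fit:", scaler.data_min_)
print("Per-column maximum seen during fit:", scaler.data_max_)
print("Scaled column ranges:", X_scaled.min(axis=0), X_scaled.max(axis=0))
```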

#### Data

``````
milage,Liters,Consumtime,target
40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
38344,1.669788,0.134296,1
``````

#### Code

``````python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def minmax_demo(path):
    # Load the data shown above; `path` is the location of that CSV file
    # (the original snippet omitted the loading step).
    data = pd.read_csv(path)
    print(data)
    # 1. Instantiate a transformer
    transfer = MinMaxScaler(feature_range=(2, 3))
    # 2. Call fit_transform on the three feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of min-max normalization:\n", data)

    return None
``````

### Standardization

#### Definition

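Standardization transforms the original data so that each feature has a mean of 0 and a standard deviation of 1.

The transformation is applied to each column separately:

$$X' = \frac{x - \text{mean}}{\sigma}$$

where mean is the average of the column and σ is its standard deviation.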
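The same kind of hand check works for standardization. The sketch below assumes the usual z-score formula (subtract the column mean, divide by the column standard deviation) and uses the population standard deviation (`ddof=0`, NumPy's default), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Feature columns (milage, Liters, Consumtime) from the article's sample data.
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])

mean = X.mean(axis=0)    # per-column mean
sigma = X.std(axis=0)    # per-column population standard deviation (ddof=0)
manual = (X - mean) / sigma

from_sklearn = StandardScaler().fit_transform(X)

print("Manual result matches StandardScaler:", np.allclose(manual, from_sklearn))
print("Column means after scaling:", manual.mean(axis=0))
print("Column standard deviations after scaling:", manual.std(axis=0))
```

The column means come out as zero up to floating-point rounding, and the standard deviations as one.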

#### API

• sklearn.preprocessing.StandardScaler()

• After processing, the values in each column are centered on a mean of 0 with a standard deviation of 1
• StandardScaler.fit_transform(X)
• X: data in numpy array format, with shape [n_samples, n_features]
• Return value: the transformed array, with the same shape as the input

#### Data

The same data used in the normalization section above.

#### Code

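``````python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def stand_demo(path):
    # Load the same CSV data; `path` is the location of that file
    # (the original snippet omitted the loading step).
    data = pd.read_csv(path)
    print(data)
    # 1. Instantiate a transformer
    transfer = StandardScaler()
    # 2. Call fit_transform on the three feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of standardization:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None
``````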

