Normalization and standardization of feature preprocessing

2021-01-23 23:02:45 noor9

Preface

What this post covers

  • Using MinMaxScaler to normalize feature data
  • Using StandardScaler to standardize feature data

Feature preprocessing

Definition

Feature preprocessing is the process of applying conversion functions to feature data so that it is better suited to the algorithm or model.

Feature preprocessing API

sklearn.preprocessing

Why normalize or standardize?

When features differ greatly in unit or magnitude, or when the variance of one feature is several orders of magnitude larger than that of the others, that feature tends to dominate the result, and some algorithms are then unable to learn from the other features.
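To make the effect concrete, here is a small sketch that takes two rows from the sample data used later in this post and compares the Euclidean distance over all three raw features with the distance over milage alone. The choice of Euclidean distance is an assumption made for illustration; the post does not name a specific algorithm.

```python
import math

# Two rows from the sample data: (milage, Liters, Consumtime)
a = (40920, 8.326976, 0.953952)
b = (14488, 7.153469, 1.673904)

# Euclidean distance over all three raw, unscaled features
full_distance = math.dist(a, b)

# Distance using only the first feature (milage)
milage_only = abs(a[0] - b[0])

print("distance on all features:", full_distance)
print("distance on milage only: ", milage_only)
# The two values are almost identical: Liters and Consumtime barely
# contribute, because their magnitudes are tiny compared with milage.
```

Any distance-based method fed these raw values would in effect be looking at milage only, which is the problem that scaling addresses.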

Normalization

Definition

Normalization transforms the original data so that it is mapped into a given interval (by default [0, 1]).

Normalization formula

$$X' = \frac{x - \min}{\max - \min}, \qquad X'' = X' \cdot (mx - mi) + mi$$

The formula is applied to each column separately: max is the column's maximum and min is its minimum, X'' is the final result, and mx and mi are the upper and lower bounds of the target interval (by default mx = 1 and mi = 0).
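The per-column computation described above can be sketched in plain Python. The helper name min_max_scale and the handling of a constant column are assumptions made for this illustration; they are not part of sklearn.

```python
def min_max_scale(column, feature_range=(0, 1)):
    """Scale one column of numbers into feature_range (hypothetical helper)."""
    mi, mx = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Constant column: the formula would divide by zero.
        # Mapping every value to mi is an arbitrary choice for this sketch.
        return [float(mi)] * len(column)
    scaled = []
    for x in column:
        x_prime = (x - col_min) / span           # X'  lies in [0, 1]
        scaled.append(x_prime * (mx - mi) + mi)  # X'' lies in [mi, mx]
    return scaled


# The milage column of the sample data, scaled into [2, 3]
milage = [40920, 14488, 26052, 75136, 38344]
result = min_max_scale(milage, feature_range=(2, 3))
print(result)
```

The smallest value in the column lands on mi and the largest on mx; everything else falls proportionally in between.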

API

  • sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), …)
    • MinMaxScaler.fit_transform(X)
      • X: data as a numpy array of shape [n_samples, n_features]
    • Return value: the transformed array, with the same shape as X
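As a quick check of the API described above, the sketch below builds the three feature columns of the sample data in memory (an assumption made so that the example runs without dating.txt) and applies MinMaxScaler with its default feature_range of (0, 1).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The three feature columns of the sample data, shape (5, 3)
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])

scaler = MinMaxScaler()             # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)  # learn per-column min/max, then scale

print(X_scaled.shape)               # same shape as the input
print(X_scaled.min(axis=0))         # per-column minimum after scaling
print(X_scaled.max(axis=0))         # per-column maximum after scaling
print(scaler.data_min_, scaler.data_max_)  # per-column min/max learned from X
```

After fitting, data_min_ and data_max_ hold the per-column minimum and maximum that the scaler learned from X.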

Data

milage,Liters,Consumtime,target
40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
38344,1.669788,0.134296,1

Code

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def minmax_demo():
    """Scale the three feature columns of dating.txt into the range [2, 3]."""
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate the transformer
    transfer = MinMaxScaler(feature_range=(2, 3))
    # 2. Call fit_transform on the feature columns (the target column is left out)
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of min-max normalization:\n", data)

    return None


if __name__ == "__main__":
    minmax_demo()

Result

[Figure: output of the normalization run]

Standardization

Definition

Standardization transforms the original data so that each column has a mean of 0 and a standard deviation of 1.

Standardization formula

$$X' = \frac{x - \text{mean}}{\sigma}$$

The formula is applied to each column separately: mean is the column's average and σ is its standard deviation.
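The same per-column computation can be sketched with the standard library. The helper name standardize is hypothetical, and the use of the population standard deviation (statistics.pstdev) is an assumption of this sketch.

```python
import statistics


def standardize(column):
    """Return (x - mean) / sigma for each x in column (hypothetical helper)."""
    mean = statistics.fmean(column)
    sigma = statistics.pstdev(column)  # population standard deviation
    if sigma == 0:
        # Constant column: every value equals the mean, so return zeros
        return [0.0] * len(column)
    return [(x - mean) / sigma for x in column]


# The Liters column of the sample data
liters = [8.326976, 7.153469, 1.441871, 13.147394, 1.669788]
z = standardize(liters)

print(z)
print("mean after standardization:", statistics.fmean(z))
print("std after standardization: ", statistics.pstdev(z))
```

Up to floating-point rounding, the standardized column has mean 0 and standard deviation 1.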

API

  • sklearn.preprocessing.StandardScaler()
    • After processing, each column is centered around a mean of 0 with a standard deviation of 1
    • StandardScaler.fit_transform(X)
      • X: data as a numpy array of shape [n_samples, n_features]
    • Return value: the transformed array, with the same shape as X
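The following sketch compares the output of StandardScaler with a manual (x - mean) / σ computation in numpy. The array is built in memory from the sample data so that the example runs without dating.txt; that setup is an assumption of this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The three feature columns of the sample data, shape (5, 3)
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Manual version: np.std uses the population standard deviation (ddof=0) by default
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_std, manual))                        # sklearn matches the manual formula
print(np.allclose(scaler.scale_, np.sqrt(scaler.var_)))  # scale_ is the square root of var_
print(X_std.mean(axis=0), X_std.std(axis=0))             # per-column mean and std after scaling
```

Note that var_ holds the per-column variance, so the standard deviation used for scaling is its square root, available as scale_.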

Data

The same data as in the normalization example above.

Code

import pandas as pd
from sklearn.preprocessing import StandardScaler


def stand_demo():
    """Standardize the three feature columns of dating.txt."""
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate the transformer
    transfer = StandardScaler()
    # 2. Call fit_transform on the feature columns (the target column is left out)
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of standardization:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None


if __name__ == "__main__":
    stand_demo()

Result

[Figure: output of the standardization run]

Copyright notice
This article was written by [noor9]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/01/20210123230205352y.html