
Normalization and standardization of feature preprocessing

2021-01-24 02:13:37 itread01

## Main content of this post

- Apply MinMaxScaler to normalize feature data
- Apply StandardScaler to standardize feature data

## Feature preprocessing

### Definition

Feature preprocessing is the process of using **conversion functions** to turn feature data into **feature data that is better suited to the algorithm model**.

### Feature preprocessing API

```python
sklearn.preprocessing
```

### Why normalize / standardize?

When features **differ greatly in unit or magnitude, or the variance of one feature is several orders of magnitude larger than that of the others**, that feature **tends to dominate the target result**, and some algorithms cannot learn from the other features.

### Normalization

#### Definition

Transform the original data so that it is mapped into a given interval (by default [0, 1]).

![Normalization formula](https://gitee.com/xp-thebest/blog_img/raw/master/img/%E5%BD%92%E4%B8%80%E5%8C%96%E5%85%AC%E5%BC%8F.png)

> The formula is applied to each column: max is the column's maximum and min is its minimum. X'' is the final result; mx and mi are the upper and lower bounds of the target interval, defaulting to mx = 1 and mi = 0.

#### API

- `sklearn.preprocessing.MinMaxScaler(feature_range=(0,1)…)`
  - `MinMaxScaler.fit_transform(X)`
    - X: data in numpy array format, shape `[n_samples, n_features]`
    - Return value: the transformed array, with the same shape

#### Data

```python
milage,Liters,Consumtime,target
40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
38344,1.669788,0.134296,1
```

#### Code

```python
from sklearn.preprocessing import MinMaxScaler
import pandas as pd


def minmax_demo():
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate a transformer
    transfer = MinMaxScaler(feature_range=(2, 3))
    # 2. Call fit_transform on the three feature columns (the target column is left out)
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Min-max normalization result:\n", data)
    return None
```

#### Result

![Normalized execution results](https://gitee.com/xp-thebest/blog_img/raw/master/img/image-20210123224728916.png)

### Standardization

#### Definition

Transform the original data so that each column has a mean of 0 and a standard deviation of 1.

![Standardized formula](https://gitee.com/xp-thebest/blog_img/raw/master/img/%E6%A0%87%E5%87%86%E5%8C%96%E5%85%AC%E5%BC%8F.png)

> The formula is applied to each column: mean is the column's average and σ is its standard deviation.

#### API

- `sklearn.preprocessing.StandardScaler()`
  - After processing, the data in each column is centered on a mean of 0 with a standard deviation of 1
  - `StandardScaler.fit_transform(X)`
    - X: data in numpy array format, shape `[n_samples, n_features]`
    - Return value: the transformed array, with the same shape

#### Data

The same data as used above.

#### Code

```python
from sklearn.preprocessing import StandardScaler
import pandas as pd


def stand_demo():
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate a transformer
    transfer = StandardScaler()
    # 2. Call fit_transform on the three feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Standardization result:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None
```

#### Execution results

![Standardization execution results](https://gitee.com/xp-thebest/blog_img/raw/master/img/image-20210123224804238.p
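
To make the normalization step concrete, here is a minimal sketch that applies the min-max formula by hand and compares it with `MinMaxScaler`. It assumes the formula image shows the usual form, `X' = (x - min) / (max - min)` followed by `X'' = X' * (mx - mi) + mi`. The helper name `minmax_manual` is made up for this illustration, and the five sample rows shown earlier are typed in directly so the snippet does not need `dating.txt`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The five sample rows shown earlier (feature columns only, target dropped).
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])


def minmax_manual(X, mi=0.0, mx=1.0):
    """Hypothetical helper: column-wise min-max scaling into [mi, mx]."""
    col_min = X.min(axis=0)  # minimum of each column
    col_max = X.max(axis=0)  # maximum of each column
    X_prime = (X - col_min) / (col_max - col_min)  # X'  lies in [0, 1]
    return X_prime * (mx - mi) + mi                # X'' lies in [mi, mx]


manual = minmax_manual(X, mi=2.0, mx=3.0)
sk = MinMaxScaler(feature_range=(2, 3)).fit_transform(X)

# Check that the hand-written formula and the library agree on this sample.
print(np.allclose(manual, sk))
```

With `feature_range=(2, 3)`, as in the demo code, the smallest value in every column maps to 2 and the largest to 3.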

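Standardization can be checked the same way. The sketch below assumes the formula image shows `X' = (x - mean) / σ`, with σ taken as the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The helper name `standardize_manual` is made up for this illustration, and the sample rows are again typed in directly instead of being read from `dating.txt`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The five sample rows shown earlier (feature columns only, target dropped).
X = np.array([
    [40920, 8.326976, 0.953952],
    [14488, 7.153469, 1.673904],
    [26052, 1.441871, 0.805124],
    [75136, 13.147394, 0.428964],
    [38344, 1.669788, 0.134296],
])


def standardize_manual(X):
    """Hypothetical helper: column-wise (x - mean) / sigma."""
    mean = X.mean(axis=0)   # mean of each column
    sigma = X.std(axis=0)   # population standard deviation (ddof=0)
    return (X - mean) / sigma


manual = standardize_manual(X)
transfer = StandardScaler()
sk = transfer.fit_transform(X)

# The manual result should match the library, and the fitted attributes
# should match the column statistics computed with NumPy.
print(np.allclose(manual, sk))
print(np.allclose(transfer.mean_, X.mean(axis=0)))
print(np.allclose(transfer.var_, X.var(axis=0)))
```

Note that `transfer.var_` holds the variance of each column; its square root is the σ used in the formula.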
Copyright notice
This article was written by [itread01]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/01/20210124021047860X.html
