# Normalization and standardization of feature preprocessing
2021-01-24 02:13:37 【itread01】
## Main content of this post

- Apply `MinMaxScaler` to normalize feature data
- Apply `StandardScaler` to standardize feature data

## Feature preprocessing

### Definition

Feature preprocessing is the process of using **conversion functions** to **transform feature data into feature data that is better suited to the algorithm model**.

### Feature preprocessing API

```python
sklearn.preprocessing
```

### Why normalize / standardize?

When features have **very different units or scales, or the variance of one feature is several orders of magnitude larger than that of the others**, those features can **dominate (control) the target result**, and some algorithms are then unable to learn from the other features.

### Normalization

#### Definition

Normalization transforms the original data so that it is mapped into a given interval (by default [0, 1]):

X' = (x - min) / (max - min)

X'' = X' * (mx - mi) + mi

> The transform acts on each column: max is the maximum of the column, min is the minimum of the column, and X'' is the final result. mx and mi are the bounds of the target interval, with defaults mx = 1 and mi = 0.

#### API

- `sklearn.preprocessing.MinMaxScaler(feature_range=(0,1), …)`
  - `MinMaxScaler.fit_transform(X)`
    - `X`: numpy array of shape `[n_samples, n_features]`
  - Return value: the transformed array, with the same shape as the input

#### Data

```
milage,Liters,Consumtime,target
40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
38344,1.669788,0.134296,1
```

#### Code

```python
from sklearn.preprocessing import MinMaxScaler
import pandas as pd


def minmax_demo():
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate a transformer class (here mapping into [2, 3] instead of the default [0, 1])
    transfer = MinMaxScaler(feature_range=(2, 3))
    # 2. Call fit_transform on the three feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of min-max normalization:\n", data)
    return None
```

#### Result

### Standardization

#### Definition

Standardization transforms the original data so that each feature has a mean of 0 and a standard deviation of 1:

X' = (x - mean) / σ

> The transform acts on each column: mean is the column average and σ is the column standard deviation.

#### API

- `sklearn.preprocessing.StandardScaler()`
  - After processing, the data in each column is centered around mean 0 with standard deviation 1
  - `StandardScaler.fit_transform(X)`
    - `X`: numpy array of shape `[n_samples, n_features]`
  - Return value: the transformed array, with the same shape as the input

#### Data

The same data as in the normalization example above.

#### Code

```python
from sklearn.preprocessing import StandardScaler
import pandas as pd


def stand_demo():
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate a transformer class
    transfer = StandardScaler()
    # 2. Call fit_transform on the three feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of standardization:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None
```

#### Result
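### A self-contained comparison sketch

Both demo functions above read from an external `dating.txt` file. The following is a minimal sketch, under the assumption that this file is not at hand: it embeds the five sample rows from the data section directly and runs both scalers on the same feature columns so the two results can be compared side by side. The names used here (`SAMPLE_CSV`, `features`, etc.) are illustrative and not part of the original tutorial.

```python
from io import StringIO

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# The five sample rows shown above, embedded so the sketch runs without dating.txt.
SAMPLE_CSV = """milage,Liters,Consumtime,target
40920,8.326976,0.953952,3
14488,7.153469,1.673904,2
26052,1.441871,0.805124,1
75136,13.147394,0.428964,1
38344,1.669788,0.134296,1
"""

if __name__ == "__main__":
    data = pd.read_csv(StringIO(SAMPLE_CSV))
    features = data[['milage', 'Liters', 'Consumtime']]

    # Min-max normalization: each column is rescaled into the default [0, 1] interval.
    minmax = MinMaxScaler()  # feature_range=(0, 1) by default
    print("Normalized:\n", minmax.fit_transform(features))

    # Standardization: each column is shifted to mean 0 and scaled to standard deviation 1.
    standard = StandardScaler()
    print("Standardized:\n", standard.fit_transform(features))
    print("Column means:\n", standard.mean_)
    print("Column variances:\n", standard.var_)
```

One practical difference worth noting: min-max normalization depends only on the column maximum and minimum, so a single outlier can distort the whole rescaled column, whereas standardization uses the mean and standard deviation of all samples and is therefore less sensitive to a few extreme values.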