# Machine Learning Prerequisites (3): Master Common NumPy Usage in 30 Minutes

2020-12-06 07:40:10

NumPy is a Python library built for array computing. It supports a wide range of operations on multidimensional arrays and matrices.

## I. Python Basics

### 1. List [square brackets]

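``my_list = [1, 2, 3, 4, 5, 6]  # named my_list to avoid shadowing the built-in list``

``my_list[1]  # 2``

``my_list[2:]  # [3, 4, 5, 6]``

``my_list[0] = 9  # lists are mutable``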

### 2. Tuple (parentheses)

``my_tuple = ('a', 'a', 'c', 1, 2, 3.0)``

``my_tuple[-1]  # 3.0``

``my_tuple[2] = 'caiyongji'  # raises TypeError: tuples are immutable``

### 3. Set {curly braces}

``````set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}``````

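`set1` evaluates to `{'a', 'b', 'c'}`. Note: a set drops duplicate elements.

`set2` evaluates to `{'b', 'c', 'd', 'e'}`.

``set1[0]  # raises TypeError: sets are unordered and do not support indexing``

Difference of set1 and set2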

``````set1 - set2
#set1.difference(set2) ``````

Union of set1 and set2

``````set1 | set2
#set1.union(set2) ``````

Intersection of set1 and set2

``````set1 & set2
#set1.intersection(set2) ``````

Symmetric difference of set1 and set2

``````set1 ^ set2
#(set1 - set2) | (set2 - set1)
#set1.symmetric_difference(set2)``````
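The four operations can be checked side by side. This is a minimal sketch that redefines `set1` and `set2` as above; the result names (`difference`, `union`, `intersection`, `symmetric`) are introduced here only for illustration.

```python
set1 = {'a', 'b', 'c', 'a'}   # the duplicate 'a' is dropped
set2 = {'b', 'c', 'd', 'e'}

difference = set1 - set2       # elements only in set1
union = set1 | set2            # elements in either set
intersection = set1 & set2     # elements in both sets
symmetric = set1 ^ set2        # elements in exactly one of the sets

# Sets are unordered, so sort before printing for a stable display
print(sorted(difference))      # ['a']
print(sorted(union))           # ['a', 'b', 'c', 'd', 'e']
print(sorted(intersection))    # ['b', 'c']
print(sorted(symmetric))       # ['a', 'd', 'e']
```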

### 4. Dictionary {key: value}

``my_dict = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}``

``my_dict['gongzhonghao']``

``my_dict.get('gongzhonghao')``

``my_dict.keys()``

``my_dict.values()``

``````my_dict['website'] = 'caiyongji.com'
my_dict``````
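The dictionary literal above repeats the key `'website'`, which is worth a closer look. The sketch below shows that the later value silently wins, and how `get` differs from square-bracket access. The name `info` and the missing key `'email'` are hypothetical and used only for this illustration.

```python
info = {
    'gongzhonghao': 'caiyongji',
    'website': 'caiyongji.com',
    'website': 'blog.caiyongji.com',  # same key again: this value wins
}

print(len(info))         # 2
print(info['website'])   # blog.caiyongji.com

# get() returns None (or a supplied default) for a missing key,
# while info['email'] would raise KeyError
print(info.get('email'))              # None
print(info.get('email', 'unknown'))   # unknown
```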

## II. Common NumPy Usage

### 1. Creating Arrays

``````import numpy as np
arr = np.array([1, 2, 3, 4, 5])``````

`arr` evaluates to `array([1, 2, 3, 4, 5])`.

``````my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
mtrx= np.array(my_matrix)``````

`mtrx` evaluates to:

``````array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])``````

### 2. Indexing and Slicing

``print('arr[0]=',arr[0],'mtrx[1,1]=',mtrx[1,1])``

``arr[:3]``

``arr[-3:-1]``

``arr[1:4:2]``

``mtrx[0:2, 0:2]``

``````array([[1, 2],
[4, 5]])``````
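For reference, here is what each of the slices above returns. This is a small sketch that rebuilds `arr` and `mtrx` with the values from the previous section.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mtrx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr[:3])         # [1 2 3]  first three elements
print(arr[-3:-1])      # [3 4]    from the third-from-last up to, but not including, the last
print(arr[1:4:2])      # [2 4]    indices 1 and 3 (step of 2)
print(mtrx[0:2, 0:2])  # top-left 2x2 block: rows 0-1, columns 0-1
```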

### 3. dtype

NumPy dtypes are identified by the following type codes:

- `i`: integer
- `b`: boolean
- `u`: unsigned integer
- `f`: float
- `c`: complex float
- `m`: timedelta
- `M`: datetime
- `O`: object
- `S`: string
- `U`: unicode string
- `V`: fixed chunk of memory for other type (void)

``````import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)``````

``arr = np.array(['1', '2', '3'], dtype='f')``
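To see how the type codes relate to real arrays, the sketch below inspects `dtype.kind` and converts between types with `astype`. The names `arr3` and `arr4` are introduced here for illustration. The exact integer width of `arr1` depends on the platform, so only the kind is checked.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
arr3 = np.array(['1', '2', '3'], dtype='f')   # strings parsed as 32-bit floats

print(arr1.dtype.kind)   # i  (integer)
print(arr2.dtype.kind)   # U  (unicode string)
print(arr3.dtype)        # float32

# astype returns a converted copy; arr3 itself keeps its dtype
arr4 = arr3.astype('i')
print(arr4.tolist())     # [1, 2, 3]
```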

### 4. General Functions

#### 4.1 arange

`np.arange(0,101,2)` produces the output below. It generates evenly spaced values over the half-open interval [0, 101) with a step of 2.

``````array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])``````

#### 4.2 zeros

`np.zeros((2,5))` produces the output below: a matrix (2-D array) of 2 rows and 5 columns filled with zeros.

``````array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])``````

#### 4.3 ones

`np.ones((4,4))` produces the output below: a matrix of 4 rows and 4 columns filled with ones.

``````array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])``````

#### 4.4 eye

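`np.eye(5)` produces the output below: a 5x5 square matrix with ones on the diagonal and zeros elsewhere. A square matrix has the same number of rows and columns.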

``````array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])``````

#### 4.5 rand

`np.random.rand(5,2)` generates an array of 5 rows and 2 columns of random numbers drawn uniformly from [0, 1).

``````array([[0.67227856, 0.4880784 ],
[0.82549517, 0.03144639],
[0.80804996, 0.56561742],
[0.2976225 , 0.04669572],
[0.9906274 , 0.00682573]])``````

``````np.random.seed(99)
np.random.rand(5,2)``````
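Setting the seed makes the "random" output repeatable. The sketch below illustrates this by drawing the same array twice; the names `first`, `second`, and `third` are introduced here for illustration.

```python
import numpy as np

np.random.seed(99)
first = np.random.rand(5, 2)

np.random.seed(99)               # reset to the same seed
second = np.random.rand(5, 2)    # identical to first

third = np.random.rand(5, 2)     # no reseed: the generator simply continues

print(np.array_equal(first, second))   # True
print(np.array_equal(second, third))   # False
```

This is why tutorials call `seed` before sampling: readers who use the same seed and NumPy's legacy generator should see matching numbers.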

#### 4.6 randint

`np.random.randint(0,101,(4,5))` produces the output below. It draws random integers from the half-open interval [0, 101) to fill an array of 4 rows and 5 columns.

``````array([[ 1, 35, 57, 40, 73],
[82, 68, 69, 52,  1],
[23, 35, 55, 65, 48],
[93, 59, 87,  2, 64]])``````

#### 4.7 max min argmax argmin

``````np.random.seed(99)
ranarr = np.random.randint(0,101,10)
ranarr``````

``array([ 1, 35, 57, 40, 73, 82, 68, 69, 52,  1])``

``print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())``

``print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())``
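`max`/`min` return values, while `argmax`/`argmin` return positions. The sketch below uses the array printed above as a literal (rather than relying on the seed) to show how the two relate.

```python
import numpy as np

ranarr = np.array([1, 35, 57, 40, 73, 82, 68, 69, 52, 1])

print(ranarr.max(), ranarr.min())         # 82 1
print(ranarr.argmax(), ranarr.argmin())   # 5 0

# Indexing with argmax gives back the maximum value
print(ranarr[ranarr.argmax()])            # 82

# The minimum 1 appears twice (indices 0 and 9); argmin reports the first one
```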

## III. Advanced NumPy Usage

### 1. reshape

``````arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)``````

``print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)``

`newarr` evaluates to:

``````array([[ 1,  2,  3],
[ 4,  5,  6],
[ 7,  8,  9],
[10, 11, 12]])``````
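A reshape only works when the element count matches the new shape. The following sketch shows that constraint, plus the `-1` placeholder that lets NumPy infer one dimension. The names `inferred` and `mismatch_raised` are introduced here for illustration.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1..12
newarr = arr.reshape(4, 3)        # 4 * 3 = 12, so this fits
inferred = arr.reshape(2, -1)     # -1 is inferred as 12 / 2 = 6

try:
    arr.reshape(5, 3)             # 15 slots for 12 elements
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(newarr.shape, inferred.shape, mismatch_raised)   # (4, 3) (2, 6) True
```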

### 2. Joining and Splitting

#### 2.1 concatenate

``````arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr``````

``````arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr``````

``````
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])``````

``````arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr``````

``````array([[1, 2, 5, 6],
[3, 4, 7, 8]])``````

#### 2.2 array_split

``````arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr``````

`newarr` evaluates to:

``````[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]]),
array([[ 9, 10],
[11, 12]])]``````
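The example above divides evenly, but `array_split` also accepts lengths that do not. The sketch below splits 7 elements into 3 parts and contrasts this with `np.split`, which insists on an equal division. The names `parts` and `split_failed` are introduced here for illustration.

```python
import numpy as np

arr = np.arange(1, 8)               # 7 elements: 1..7
parts = np.array_split(arr, 3)      # earlier parts absorb the extra elements

print([p.tolist() for p in parts])  # [[1, 2, 3], [4, 5], [6, 7]]

try:
    np.split(arr, 3)                # 7 is not divisible by 3
    split_failed = False
except ValueError:
    split_failed = True

print(split_failed)                 # True
```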

### 3. Searching and Filtering

#### 3.1 Searching

NumPy's `where` function finds the indices of the array elements that satisfy a condition.

``````arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x``````

``(array([1, 3, 5, 7], dtype=int64),)``
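The result of `where` is a tuple of index arrays, one per dimension. The sketch below shows how to use it to fetch the matching values, and also the three-argument form of `where`, which picks between two sources element by element. The names `idx` and `masked` are introduced here for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

idx = np.where(arr % 2 == 0)       # tuple holding one index array
print(idx[0].tolist())             # [1, 3, 5, 7]
print(arr[idx].tolist())           # [2, 4, 6, 8]

# Three-argument form: keep even values, replace the rest with 0
masked = np.where(arr % 2 == 0, arr, 0)
print(masked.tolist())             # [0, 2, 0, 4, 0, 6, 0, 8]
```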

#### 3.2 Filtering

``````bool_arr = arr > 4
arr[bool_arr]``````

`bool_arr` evaluates to:

``array([False, False, False, False,  True,  True,  True,  True])``

``arr[arr > 4]``
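Boolean masks can also be combined. A small sketch, assuming the same `arr` as above: use `&`, `|`, and `~` (not Python's `and`, `or`, `not`) and wrap each comparison in parentheses.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

big_and_even = arr[(arr > 2) & (arr % 2 == 0)]   # both conditions hold
small_or_large = arr[(arr < 3) | (arr > 6)]      # either condition holds
not_big = arr[~(arr > 4)]                        # negated mask

print(big_and_even.tolist())     # [4, 6, 8]
print(small_or_large.tolist())   # [1, 2, 7, 8]
print(not_big.tolist())          # [1, 2, 3, 4]
```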

### 4. Sorting

The `sort` function sorts an ndarray.

``````arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)``````

``````arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)``````

``````array([[2, 3, 4],
[0, 1, 5]])``````
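Two details are easy to miss here: `np.sort` returns a sorted copy, and for 2-D input it sorts along the last axis (each row) by default. The sketch below makes both visible; the names `fruits`, `grid`, `by_row`, and `by_col` are introduced here for illustration.

```python
import numpy as np

fruits = np.array(['banana', 'cherry', 'apple'])
sorted_fruits = np.sort(fruits)
print(sorted_fruits.tolist())   # ['apple', 'banana', 'cherry']
print(fruits.tolist())          # ['banana', 'cherry', 'apple']  original unchanged

grid = np.array([[3, 2, 4], [5, 0, 1]])
by_row = np.sort(grid)          # default axis=-1: sort within each row
by_col = np.sort(grid, axis=0)  # sort within each column
print(by_row.tolist())          # [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())          # [[3, 0, 1], [5, 2, 4]]
```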

### 5. Random

#### 5.1 Random Choice with Probabilities

``np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))``

``````array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3,
5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5,
7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7,
7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])``````
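One way to confirm that `p` is respected is to draw a larger sample and compare the observed frequencies with the requested probabilities. The sketch below does that; the seed, the sample size of 10000, and the names `samples` and `freqs` are choices made here for illustration.

```python
import numpy as np

np.random.seed(99)
samples = np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=10000)

# Count how often each value was drawn and convert to frequencies
values, counts = np.unique(samples, return_counts=True)
freqs = dict(zip(values.tolist(), (counts / samples.size).tolist()))

print(freqs)          # roughly 0.1 for 3, 0.3 for 5, 0.6 for 7
print(9 in freqs)     # False: probability 0.0 means 9 is never drawn
```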

#### 5.2 Random Permutations

##### 5.2.1 permutation

``````np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr``````

##### 5.2.2 shuffle

``````np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr``````
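The practical difference between the two: `permutation` returns a shuffled copy and leaves its input alone, while `shuffle` reorders the array in place and returns `None`. A minimal sketch; the names `after_permutation` and `result` are introduced here for illustration.

```python
import numpy as np

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])

new_arr = np.random.permutation(arr)   # shuffled copy
after_permutation = arr.copy()         # snapshot: arr is still in order here
print(after_permutation.tolist())      # [1, 2, 3, 4, 5]

result = np.random.shuffle(arr)        # reorders arr itself
print(result)                          # None
print(sorted(arr.tolist()))            # [1, 2, 3, 4, 5]  same elements, new order
```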

#### 5.3 Random Distributions

##### 5.3.1 Normal Distribution

``````x = np.random.normal(loc=1, scale=2, size=(2, 3))
x``````

``````array([[ 0.14998973,  3.22564777,  1.48094109],
[ 2.252752  , -1.64038195,  2.8590667 ]])``````
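Six values are too few to recognize `loc` and `scale` in the output. The sketch below draws a much larger sample and checks that its mean and standard deviation land near the requested 1 and 2; the seed and the sample size of 100000 are choices made here for illustration.

```python
import numpy as np

np.random.seed(99)
# loc is the mean, scale is the standard deviation
x = np.random.normal(loc=1, scale=2, size=100000)

print(x.mean())   # close to 1
print(x.std())    # close to 2
```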

``pip install -i https://pypi.tuna.tsinghua.ed... seaborn``

``````import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; kdeplot draws the density curve
sns.kdeplot(x.ravel())
plt.show()``````

##### 5.3.2 Binomial Distribution

``````x = np.random.binomial(n=10, p=0.5, size=10)
x``````

``````import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; histplot draws the histogram without a KDE curve
sns.histplot(x)
plt.show()``````

##### 5.3.3 Multinomial Distribution

``````x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x``````
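One way to read the call above is as six rolls of a fair die: the result has one count per face, and the counts always add up to `n`. A minimal sketch under that interpretation; the seed is a choice made here for illustration.

```python
import numpy as np

np.random.seed(99)
# n=6 trials, six equally likely outcomes (think: six rolls of a fair die)
x = np.random.multinomial(n=6, pvals=[1/6] * 6)

print(x)          # one count per face
print(x.sum())    # 6: the counts always sum to n
```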

##### 5.3.4 Other

https://segmentfault.com/a/1190000038391443