Day 07 - Advanced (01) - 灵析社区

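Key points in this section

Normalization
Random array generation
Matrix multiplication
Binning
Finding the most frequent value
Correlation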

1. Normalize a random 5x5 matrix

Hint: (x - min) / (max - min)
Answer:

import numpy as np
Z = np.random.randint(1,20, size = 25).reshape(5,5)
print(Z)

Zmax, Zmin = Z.max(), Z.min()
Z = (Z - Zmin)/(Zmax - Zmin)
print(Z)
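
The answer above divides by max - min, which is zero when every entry of the matrix is equal. The following is a minimal sketch of a reusable helper that guards against that case. The function name `min_max_normalize` and the choice to return all zeros for a constant array are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

def min_max_normalize(arr):
    """Scale arr to [0, 1] using (x - min) / (max - min)."""
    arr = np.asarray(arr, dtype=float)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Assumed convention: a constant input maps to all zeros
        return np.zeros_like(arr)
    return (arr - lo) / (hi - lo)

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
Z = rng.integers(1, 20, size=(5, 5))      # integers in [1, 20)
Z_norm = min_max_normalize(Z)
print(Z_norm.min(), Z_norm.max())
```

After normalization the smallest entry maps to 0 and the largest to 1, so checking the minimum and maximum is a quick sanity test.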

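2. How do you create a 2D array of random floats between 5 and 10?

Description: create a 2D array of shape 5x3 containing random floating-point numbers between 5 and 10.

Approach:

1. randint: integers

2. uniform: floats (see https://www.runoob.com/python/func-number-uniform.html)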

#Solution Method 1:
rand_arr = np.random.randint(low=5, high=10, size=(5,3)) + np.random.random((5,3))
print(rand_arr)

#Solution Method 2:
rand_arr = np.random.uniform(5,10, size=(5,3))
print(rand_arr)

# > [[ 8.50061025 9.10531502 6.85867783]
# > [ 9.76262069 9.87717411 7.13466701]
# > [ 7.48966403 8.33409158 6.16808631]
# > [ 7.75010551 9.94535696 5.27373226]
# > [ 8.0850361 5.56165518 7.31244004]]
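
Method 1 works because the `high` bound of randint is exclusive: the integers fall in 5..9, and adding a value from [0, 1) gives a float in [5, 10). As a hedged alternative, the sketch below uses NumPy's `default_rng` Generator interface and checks the shape and range of the result. The seed value is an arbitrary choice made only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Samples drawn uniformly from the half-open interval [5, 10)
rand_arr = rng.uniform(5, 10, size=(5, 3))

# Equivalent idea: rescale samples from [0, 1) to [5, 10)
scaled = 5 + (10 - 5) * rng.random((5, 3))

print(rand_arr.shape)
print(bool((rand_arr >= 5).all() and (rand_arr < 10).all()))
```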

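3. What is the real matrix product of a 5x3 matrix and a 3x2 matrix?

Hint: np.dot
Difference: np.multiply computes the element-wise product, while np.dot computes the matrix product.
Answer: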

Z = np.dot(np.ones((5,3)), np.ones((3,2)))
print(Z)
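
The answer above yields a 5x2 matrix in which every entry is 3.0, since each entry sums three products of ones. To make the difference between np.dot and np.multiply concrete, here is a small sketch on two hand-picked matrices (the values are illustrative, not from the original):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
B = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

# Matrix product: (2, 3) x (3, 2) -> (2, 2); equivalent to A @ B
product = np.dot(A, B)
print(product)
# [[10 13]
#  [28 40]]

# Element-wise product: operands must have matching (or broadcastable) shapes
elementwise = np.multiply(A, A)
print(elementwise)
# [[ 0  1  4]
#  [ 9 16 25]]
```

Note that np.multiply(A, B) would raise an error here, because shapes (2, 3) and (3, 2) cannot be broadcast together.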

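4. How do you convert a numeric array into a categorical (text) array?

Problem: bin the petal length (3rd column) of iris_2d to form a text array, such that if the petal length is:

Less than 3 --> 'small'
3-5 --> 'medium'
>=5 --> 'large'
Given: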

#Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

Answer:

import numpy as np
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Inspect the data
iris.shape

# Bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])

# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
petal_length_cat[:4]
# > ['small', 'small', 'small', 'small']
petal_length_cat
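
To see how np.digitize assigns bin indices without downloading the dataset, the sketch below runs the same binning on a handful of made-up petal lengths. The sample values and the label-array lookup are illustrative assumptions; the bin edges match the answer above.

```python
import numpy as np

# Hypothetical sample petal lengths (not taken from the iris file)
petal_length = np.array([1.4, 2.9, 3.0, 4.7, 5.0, 6.9])

# With the default right=False, bin i covers edges[i-1] <= x < edges[i]:
# [0, 3) -> 1, [3, 5) -> 2, [5, 10) -> 3
bins = np.digitize(petal_length, [0, 3, 5, 10])

# Look labels up by bin index; index 0 is a placeholder for values below 0
labels = np.array(['', 'small', 'medium', 'large'])
petal_length_cat = labels[bins]
print(petal_length_cat)
# ['small' 'small' 'medium' 'medium' 'large' 'large']
```

The boundary values show the half-open intervals: 3.0 lands in 'medium' and 5.0 lands in 'large'.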

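5. How do you find the most frequent value in a numpy array?

- Description: find the most frequent petal length value (3rd column) in the iris dataset.

**Given and answer:**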

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Solution:
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])
# > b'1.5'
vals
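
The same np.unique and np.argmax pattern can be tried on a small in-memory array. The values below are made up for illustration. One detail worth knowing: np.unique returns sorted values and np.argmax returns the first maximum, so when several values tie for the highest count, the smallest of them is reported.

```python
import numpy as np

values = np.array([1.5, 1.4, 1.5, 1.3, 1.4, 1.5])  # hypothetical data

# vals is sorted; counts[i] is how often vals[i] occurs
vals, counts = np.unique(values, return_counts=True)
most_common = vals[np.argmax(counts)]
print(most_common, counts.max())
# 1.5 3

# Tie case: 1 and 2 both occur twice, and the smaller value wins
tie_vals, tie_counts = np.unique(np.array([2, 1, 2, 1]), return_counts=True)
print(tie_vals[np.argmax(tie_counts)])
# 1
```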

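6. How do you find the correlation between two columns of a numpy array?

- Description: find the correlation between SepalLength (1st column) and PetalLength (3rd column) in iris_2d.

**Given and answer:**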

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution 1
# np.corrcoef: https://blog.csdn.net/qq_39514033/article/details/88931639
cor = np.corrcoef(iris[:, 0], iris[:, 2])
print(cor)
print(cor[0, 1])

# Solution 2
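# Output: r is the correlation coefficient, in the range [-1, 1]; p_value is the p-value.
#     Note: the smaller the p-value, the more significant the correlation. The p-value is generally more reliable with more than 500 samples.
# Details: https://www.osgeo.cn/scipy/reference/generated/scipy.stats.pearsonr.html
from scipy.stats import pearsonr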
corr, p_value = pearsonr(iris[:, 0], iris[:, 2])
print(corr,p_value)
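
For readers who want to see what the correlation coefficient measures, the sketch below computes Pearson's r directly from its definition (the sum of products of centered values divided by the square root of the product of their sums of squares) and compares it with np.corrcoef. The helper name `pearson_r` and the sample data are illustrative assumptions, not part of the original.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from centered values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Hypothetical data: y is roughly 2 * x, so r should be close to 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(r_manual, r_numpy)
```

np.corrcoef returns a 2x2 matrix for two input vectors, with ones on the diagonal, which is why the answer above reads the value at index [0, 1].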
