Day 07 - Advanced (01) - 灵析社区

野生程序员在线

Key topics

  • Normalization
  • Random array generation
  • Matrix multiplication
  • Binning
  • Finding the most frequent value
  • Correlation

1. Normalize a random 5x5 matrix

  • Hint: (x - min) / (max - min)
  • Answer:
import numpy as np
Z = np.random.randint(1,20, size = 25).reshape(5,5)
print(Z)

Zmax, Zmin = Z.max(), Z.min()
Z = (Z - Zmin)/(Zmax - Zmin)
print(Z)
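
The hint formula divides by (max - min), which is zero when every entry is the same. A minimal sketch of a reusable helper follows; the name min_max_normalize and the choice to return zeros for a constant array are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np

def min_max_normalize(a):
    """Scale an array to [0, 1] via (x - min) / (max - min)."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    if hi == lo:
        # Assumption: a constant array maps to all zeros instead of dividing by zero.
        return np.zeros_like(a)
    return (a - lo) / (hi - lo)

rng = np.random.default_rng(0)            # seeded so the run is repeatable
Z = rng.integers(1, 20, size=(5, 5))      # integers in [1, 19], as in the answer above
Z_norm = min_max_normalize(Z)
print(Z_norm.min(), Z_norm.max())
```

After normalization the smallest entry becomes 0 and the largest becomes 1, provided the matrix is not constant.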

2. How do you create a 2D array of random floats between 5 and 10?

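# Solution Method 1: a random integer part in [5, 9] plus a random fractional part in [0, 1)
rand_arr = np.random.randint(low=5, high=10, size=(5,3)) + np.random.random((5,3))
print(rand_arr)

# Solution Method 2: sample directly from the uniform distribution on [5, 10)
rand_arr = np.random.uniform(5,10, size=(5,3))
print(rand_arr)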

# > [[ 8.50061025 9.10531502 6.85867783]
# >  [ 9.76262069 9.87717411 7.13466701]
# >  [ 7.48966403 8.33409158 6.16808631]
# >  [ 7.75010551 9.94535696 5.27373226]
# >  [ 8.0850361 5.56165518 7.31244004]]
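
Both methods above use the legacy np.random functions. The same idea can be sketched with NumPy's Generator API, either by scaling samples from [0, 1) or by sampling the range directly. The seed and the variable names (low, high, scaled, direct) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen arbitrarily for repeatability

low, high = 5, 10

# Option A: scale samples from [0, 1) to [low, high) with low + (high - low) * u.
scaled = low + (high - low) * rng.random((5, 3))

# Option B: let the generator sample the range directly.
direct = rng.uniform(low, high, size=(5, 3))

print(scaled.shape, direct.shape)
print(scaled.min() >= low, direct.min() >= low)
```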

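3. What is the real matrix product of a 5x3 matrix and a 3x2 matrix?

  • Hint: np.dot
  • Note the difference: np.multiply is element-wise, while np.dot computes the matrix product
  • Answer: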
Z = np.dot(np.ones((5,3)), np.ones((3,2)))
print(Z)
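
To make the difference between np.multiply and np.dot concrete, here is a small sketch on hand-checkable arrays. The arrays A and B are made up for illustration; the last lines repeat the all-ones product from the answer.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)     # [[0 1 2], [3 4 5]]
B = np.arange(6).reshape(3, 2)     # [[0 1], [2 3], [4 5]]

# Element-wise product: operands must broadcast to a common shape, result is (2, 3).
elementwise = np.multiply(A, A)

# Matrix product: inner dimensions must match, (2, 3) dot (3, 2) gives (2, 2).
matrix_prod = np.dot(A, B)

print(elementwise)    # [[ 0  1  4], [ 9 16 25]]
print(matrix_prod)    # [[10 13], [28 40]]

# For the exercise: every entry is a sum of three 1*1 terms, so the 5x2 result is all 3.0.
Z = np.dot(np.ones((5, 3)), np.ones((3, 2)))
print(Z.shape)        # (5, 2)
```

For 2D arrays, A @ B gives the same result as np.dot(A, B).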

4. How do you convert numeric values to a categorical (text) array?

  • Problem: bin the petal length (column 3) of iris_2d (loaded below as iris) to form a text array, such that a petal length of:
    • Less than 3 --> 'small'
    • 3 to <5 --> 'medium'
    • >=5 --> 'large'
    • Given:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

Answer:

import numpy as np
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Inspect the data
iris.shape

# Bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])

# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
petal_length_cat[:4]
# > ['small', 'small', 'small', 'small']
petal_length_cat
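
To see how np.digitize assigns bin indices without downloading the dataset, here is a small sketch on made-up values. The array lengths and the label lookup via fancy indexing are assumptions for illustration, and the lookup only works when every value lies in [0, 10).

```python
import numpy as np

lengths = np.array([1.4, 2.9, 3.0, 4.7, 5.0, 6.9])   # made-up petal lengths
bins = [0, 3, 5, 10]

# With the default right=False, index i means bins[i-1] <= x < bins[i].
idx = np.digitize(lengths, bins)
print(idx)     # [1 1 2 2 3 3]

# Assumption: all values fall in [0, 10), so idx is always 1, 2 or 3.
labels = np.array(['small', 'medium', 'large'])
cats = labels[idx - 1]
print(cats)    # ['small' 'small' 'medium' 'medium' 'large' 'large']
```

Note that 3.0 lands in 'medium' and 5.0 in 'large', because each bin includes its left edge and excludes its right edge.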

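5. How do you find the most frequent value in a NumPy array?

- Description: find the most frequent petal length value (column 3) in the iris dataset.

Answer (with the given input):

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')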

# Solution: count each distinct value, then pick the one with the highest count
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])
# > b'1.5'
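
One detail worth spelling out: np.argmax returns the first maximum, and np.unique returns sorted values, so a tie resolves to the smallest tied value. The sketch below uses made-up arrays (values, tied) to show this and one way to recover all tied values.

```python
import numpy as np

values = np.array([1.5, 1.4, 1.5, 1.3, 1.4, 1.5, 5.1])   # made-up sample
vals, counts = np.unique(values, return_counts=True)
mode = vals[np.argmax(counts)]
print(mode)          # 1.5 (it appears three times)

# Tie case: 1 and 2 both appear twice.
tied = np.array([2, 1, 2, 1, 3])
v, c = np.unique(tied, return_counts=True)
first_mode = v[np.argmax(c)]      # argmax picks the first maximum: the smallest tied value
all_modes = v[c == c.max()]       # boolean mask keeps every value with the top count
print(first_mode, all_modes)      # 1 [1 2]
```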

6. How do you find the correlation between two columns of a NumPy array?

- Description: find the correlation between SepalLength (column 1) and PetalLength (column 3) in iris_2d.

Answer (with the given input):

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution 1
#np.corrcoef : <https://blog.csdn.net/qq_39514033/article/details/88931639>
cor = np.corrcoef(iris[:, 0], iris[:, 2])
print(cor)
print(cor[0, 1])

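# Solution 2
# Output: r, the correlation coefficient, in [-1, 1]; p_value, the p-value.
#     Note: the smaller the p-value, the more significant the correlation. The p-value is
#     generally considered reasonably reliable only for samples larger than about 500.
# Reference: <https://www.osgeo.cn/scipy/reference/generated/scipy.stats.pearsonr.html>
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris[:, 0], iris[:, 2])
print(corr,p_value)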
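
For reference, the Pearson coefficient that both solutions report can be computed by hand as the sum of products of centered values divided by the square root of the product of their sums of squares. The helper name pearson_r and the sample arrays below are assumptions for illustration, not values from the iris dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()     # center each variable on its mean
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

x = np.array([5.1, 4.9, 6.3, 5.8, 7.1])   # made-up values
y = np.array([1.4, 1.4, 6.0, 5.1, 5.9])   # made-up values

r = pearson_r(x, y)
# The off-diagonal entry of np.corrcoef is the same coefficient.
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))   # True
```

A perfectly linear increasing relationship gives r = 1, and a perfectly linear decreasing one gives r = -1.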

