
Day 09 - Hands-on: Linear Regression

Key points of this section:
• Implement linear regression with numpy
• Reference video: https://www.bilibili.com/video/BV1oD4y1u7U8/?spm_id_from=333.999.0.0&vd_source=ea41ada76e180cf6c5af0f913147e4c0

I. Linear regression

Model:

$$\hat{y} = w x + b$$

Loss function (mean squared error over N samples):

$$L(w, b) = \frac{1}{N} \sum_{i=1}^{N} (w x_i + b - y_i)^2$$

Gradient update:

$$w \leftarrow w - lr \cdot \frac{\partial L}{\partial w}, \qquad \frac{\partial L}{\partial w} = \frac{2}{N} \sum_{i=1}^{N} (w x_i + b - y_i)\, x_i$$

$$b \leftarrow b - lr \cdot \frac{\partial L}{\partial b}, \qquad \frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} (w x_i + b - y_i)$$

lr is the learning rate; the gradients of w and b are the partial derivatives of the loss function. [Here the derivatives are worked out by hand and plugged into the formulas; they can also be computed with diff.]

II. Linear regression with numpy

Data preparation

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.randint(2, 200, size=100)
    y = np.linspace(0.4, 0.500, num=100) * x + np.random.randint(1, 3, size=100)
    plt.plot(x, y, 'o')
    plt.show()

Predict

    def count_y_prediction(X, w, b):
        # y_pred = w * X + b, element-wise
        y_pred = np.add(np.multiply(w, X), b)
        return y_pred

Loss function

    def compete_error_for_given_points(y, y_pred):
        # Mean squared error
        error = np.power(np.subtract(y, y_pred), 2)
        error = np.sum(error) / y.shape[0]
        return error

Gradient update

    def compete_gradient_and_update(x, y, w, b, lr):
        # y is passed in explicitly; the original referenced an undefined y and self.w
        N = x.shape[0]
        residual = np.subtract(np.add(np.multiply(w, x), b), y)
        w_gradient_sum = np.sum(2 * np.multiply(residual, x))
        b_gradient_sum = np.sum(2 * residual)
        w -= lr * w_gradient_sum / N
        b -= lr * b_gradient_sum / N
        return w, b

Plotting

    def draw(X, y, y_pred, final=True):
        plt.clf()
        plt.scatter(X, y, c="red")
        plt.plot(X, y_pred, c="blue")
        if final:
            plt.pause(0.2)
        else:
            plt.show()

III. Wrapping the code in a class

    # Simple linear regression
    class SimpleRegress(object):
        def __init__(self):
            # One scalar slope and one scalar intercept for the whole data set
            self.w = 1.0
            self.b = 1.0
            # x goes up to 200, so the step has to be small to keep the updates stable
            self.lr = 1e-5

        # Gradient update
        def compete_gradient_and_update(self, x, y):
            N = x.shape[0]
            residual = np.subtract(np.add(np.multiply(self.w, x), self.b), y)
            w_gradient_sum = np.sum(2 * np.multiply(residual, x))
            b_gradient_sum = np.sum(2 * residual)
            self.w -= self.lr * w_gradient_sum / N
            self.b -= self.lr * b_gradient_sum / N
            return self.w, self.b

        # Loss function
        def compete_error_for_given_points(self, y, y_pred):
            error = np.power(np.subtract(y, y_pred), 2)
            error = np.sum(error) / error.shape[0]
            return error

        # Run the given number of epochs
        def gradient_desent_runner(self, times, x, y):
            for i in range(times):
                self.w, self.b = self.compete_gradient_and_update(x, y)
            return [self.w, self.b]

        # Inference
        def count_y_prediction(self, X):
            y_pred = np.add(np.multiply(self.w, X), self.b)
            return y_pred

        # Plotting
        def draw(self, X, y, y_pred, final=True):
            plt.clf()
            plt.plot(X, y, 'o')
            plt.plot(X, y_pred, c="blue")
            if final:
                plt.pause(0.2)
            else:
                plt.show()

    # Data
    x_data = np.random.randint(2, 200, size=100)
    y_data = np.linspace(0.4, 0.450, num=100) * x_data
    times = 50                # number of epochs
    test_data = list([16])    # test data

    sr = SimpleRegress()
    # Prediction with the initial parameters
    y_pred = sr.count_y_prediction(x_data)
    # Record the starting values
    print("Starting gradient descent at w ={0},b ={1},error={2}"
          .format(sr.w, sr.b, sr.compete_error_for_given_points(y_data, y_pred)))
    print("Running:")
    # Solve by gradient updates
    [w, b] = sr.gradient_desent_runner(times, x_data, y_data)
    # Prediction with the updated parameters
    y_pred = sr.count_y_prediction(x_data)
    print("After {0} times w = {1},b = {2},error = {3}"
          .format(times, w, b, sr.compete_error_for_given_points(y_data, y_pred)))

Note on the original run: the original post initialized w as np.ones((100,)) and b as np.linspace(1, 3, num=100), with lr = 0.003. With those settings the run diverged. Its log, abbreviated here (the middle of the second array is also elided in the source):

    > Starting Giadient desent at w =[1. 1. ... 1.],b =[1. 1.02020202 ... 2.97979798 3. ],error=4217.362019403632
    Running:
    After 50 times w = [3.37361812e+92 3.37361812e+92 ... 2.6302579e+90 2.6302579e+90],error = 1.3779443376053316e+189

The error grows from about 4.2e3 to about 1.4e189 because the step lr * gradient overshoots when x is as large as 200. The listing above uses scalar parameters and a much smaller learning rate to avoid this.
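The hand-derived gradients in section I can be sanity-checked numerically. The sketch below is not from the original post: the helper names (mse_loss, analytic_grads, numeric_grads) and the synthetic data are assumptions made for illustration. It compares the formulas against central finite differences of the loss.

```python
import numpy as np

def mse_loss(w, b, x, y):
    # Mean squared error of the line w * x + b
    return np.mean((w * x + b - y) ** 2)

def analytic_grads(w, b, x, y):
    # Hand-derived partial derivatives of the MSE loss
    residual = w * x + b - y
    return 2 * np.mean(residual * x), 2 * np.mean(residual)

def numeric_grads(w, b, x, y, eps=1e-6):
    # Central finite differences, one parameter at a time
    dw = (mse_loss(w + eps, b, x, y) - mse_loss(w - eps, b, x, y)) / (2 * eps)
    db = (mse_loss(w, b + eps, x, y) - mse_loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

# Hypothetical data: inputs scaled to [0, 1) to keep the numbers small
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = 0.45 * x + 2.0

grads_analytic = analytic_grads(1.0, 1.0, x, y)
grads_numeric = numeric_grads(1.0, 1.0, x, y)
print(grads_analytic, grads_numeric)
```

If the two pairs agree to several decimal places, the update rule is at least consistent with the loss it is supposed to minimize.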


野生程序员在线

Day 01 - Getting Started

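Contents of this section
1. Introduction to numpy
2. Introduction to ndarray
3. Installing numpy
4. Basic numpy usage
5. numpy array broadcasting

I. What is NumPy?

NumPy is a powerful Python library used mainly for computations on multidimensional arrays. The name comes from two words: Numerical and Python. NumPy provides a large set of library functions and operations that make numerical computing easy. Such computations are widely used in the following tasks:

Machine learning models: writing machine learning algorithms requires many numerical operations on matrices, such as multiplication, transposition and addition. NumPy makes these simple to write and fast to run. NumPy arrays are used to store training data and model parameters.

Image processing and computer graphics: images are represented in a computer as multidimensional arrays of numbers, which makes NumPy the natural choice. It provides library functions for fast image manipulation, for example mirroring an image or rotating it by a given angle.

Mathematical tasks: NumPy is useful for numerical integration, differentiation, interpolation, extrapolation and so on. For such tasks it serves as a quick Python-based alternative to MATLAB.

II. The ndarray object

Any element extracted from an ndarray object (by slicing) is represented by a Python object of one of the array scalar types.

[Figure: relationship between the ndarray, the data type object (dtype) and the array scalar types]

The basic ndarray is created with numpy.array, which builds an ndarray from any object exposing the array interface, or from any method that returns an array.

    numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)

III. Installing numpy

Install numpy

    !pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple

Install jupyter

    !pip install jupyter notebook -i https://pypi.tuna.tsinghua.edu.cn/simple

IV. Basic usage (important)

4.1 Why use numpy

Suppose every element of an array has to be multiplied by 2. There are two ways to do it:

    import numpy as np

    # Method 1: plain Python loop
    num_list = [1, 2, 3, 4, 5, 6]
    ret_list = []
    for num in num_list:
        ret_list.append(num * 2)
    print(ret_list)
    print("-----------------")
    # Method 2: numpy, one vectorized expression
    np_array = np.array(num_list)
    print(np_array * 2)

4.2 Defining data of different dimensions

    # Create ndarray data
    import numpy as np

    # 1-D array
    a = np.array([1, 2, 3])
    a_ = np.array((1, 2, 3))
    # 2-D array
    b = np.array([[1, 2], [3, 4]])
    # 3-D array
    c = np.array([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])
    print(a)
    print("-----------------")
    print(a_)
    print("-----------------")
    print(b)
    print("-----------------")
    print(c)

4.3 Array information

    print(a.shape)
    print("-----------------")
    print(b.shape)
    print("-----------------")
    print(c.shape)

V. Data types

5.1 numpy data types

NumPy supports a wider variety of numeric types than Python.

[Table: scalar data types defined in NumPy]

NumPy numeric types are instances of dtype (data type) objects, each with its own characteristics.

5.2 The data type object (dtype)

A dtype can be constructed with the following syntax:

    numpy.dtype(object, align, copy)

The parameters are:
Object: the object to be converted to a data type.
Align: if True, padding is added to the fields so that the layout resembles a C struct.
Copy: makes a new copy of the dtype object; if False, the result is a reference to the built-in data type object.

    # Using array scalar types
    # int8, int16, int32, int64 can be replaced by the equivalent strings 'i1', 'i2', 'i4', and so on.
    import numpy as np

    dt = np.dtype(np.int32)
    dt_ = np.dtype('i4')
    print(dt)
    print("----------------")
    print(dt_)

    # Create a structured data type
    dt = np.dtype([('age', np.int8)])
    a = np.array([(10,), (20,), (30,)], dtype=dt)
    print(dt)
    print(a['age'])

5.4 Constants (a passing familiarity is enough)

Commonly used constants:

    # Infinity
    print(np.inf)          # inf
    # It is a float
    print(type(np.inf))    # float
    a = np.array([np.inf, -np.inf, 1])
    # Show which elements are positive or negative infinity
    np.isinf(a)            # array([ True,  True, False])
    # Euler's number e
    print(type(np.e))      # float
    print(np.e)            # 2.718281828459045
    # Pi
    print(type(np.pi))     # float
    print(np.pi)           # 3.141592653589793

5.5 Broadcasting (important)

Broadcasting describes how NumPy performs arithmetic on arrays of different shapes. When a larger and a smaller array are combined, the smaller one is broadcast across the larger one so that the operation is well defined.

Broadcasting follows these rules:
1. The array with the smaller ndim has dimensions of length 1 prepended to its shape.
2. The size of each dimension of the output is the maximum of the input sizes in that dimension.
3. An input can be used in the computation if its size in each dimension either matches the output size or is exactly 1.
4. If an input has size 1 in some dimension, the first data element in that dimension is used for all computations along it.

If these rules produce a valid result and one of the following holds, the arrays are said to be broadcastable:
• The arrays have the same shape.
• The arrays have the same number of dimensions, and each dimension has either the same length or length 1.
• An array has too few dimensions, and prepending dimensions of length 1 makes the conditions above hold.

    a = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 20.0, 20.0], [30.0, 30.0, 30.0]])
    print('a shape:')
    print(a.shape)
    b = np.array([1.0, 2.0, 3.0])
    print('b shape:')
    print(b.shape)
    print('First array:')
    print(a)
    print('\n')
    print('Second array:')
    print(b)
    print('\n')
    print('\n')
    print('First array plus second array:')
    print(a + b)
    print('\n')

[Figure: how array b is broadcast to be compatible with array a]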
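To make the broadcasting rules concrete, here is a minimal sketch (not part of the original post) that expands b explicitly with np.broadcast_to and checks that the result matches the implicit a + b. It also shows a shape pair that the rules reject. The variable names are chosen for illustration only.

```python
import numpy as np

a = np.array([[0.0, 0.0, 0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])   # shape (4, 3)
b = np.array([1.0, 2.0, 3.0])        # shape (3,)

# b is treated as shape (1, 3), and its single row is reused for all 4 rows of a.
b_expanded = np.broadcast_to(b, a.shape)   # read-only view, no data is copied
explicit_sum = a + b_expanded
broadcast_sum = a + b

# Shapes (4, 3) and (4,) do not line up from the right, so this pair is rejected.
try:
    a + np.ones(4)
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(b_expanded.shape, mismatch_raises)
```

Shapes are compared starting from the trailing dimension, which is why a length-3 vector works against a (4, 3) array while a length-4 vector does not.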


野生程序员在线

Day 02-ndarray

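Contents of this section (important)
• Introduction to ndarray
• Basic information
• Type conversion
• Changing array shape: reshape & resize
• Transposing: transpose & array.T
• Flattening: flatten
• Flattening to a contiguous array: ravel
• Removing single-dimensional entries: squeeze
• Stacking arrays (vstack & hstack)

I. Ndarray

NumPy's ndarray is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its shape, a tuple of N non-negative integers giving the size of each dimension. The type of the items is specified by a separate data type object (dtype), one of which is associated with every ndarray.

NumPy's array class is called ndarray, also known by the alias array. Note that numpy.array is not the same as the standard Python library class array.array, which only handles one-dimensional arrays and offers less functionality.

II. Basic information

As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing the array (for example with N integers) and through the methods and attributes of the ndarray.

    import numpy as np

    # Create an array of 15 elements and reshape it to 3 rows by 5 columns
    a = np.arange(15).reshape(3, 5)
    print(a)
    # [[ 0  1  2  3  4]
    #  [ 5  6  7  8  9]
    #  [10 11 12 13 14]]
    print(a.shape)       # shape: (3, 5)
    print(a.ndim)        # number of dimensions: 2
    print(a.dtype.name)  # type name: 'int32' on the author's platform; the default integer is platform-dependent
    print(a.itemsize)    # size in bytes of one element; a float64 element takes 8 (= 64/8), an int32 element takes 4 (= 32/8)
    # 4
    print(a.size)        # total number of elements: 15
    print(a.dtype)       # data type: int32
    print(type(a))       # <class 'numpy.ndarray'>

III. Type conversion

ndarray.astype is the most commonly used conversion method. It is called on an ndarray and converts the data to the requested type, provided the data can actually be represented in that type. The syntax is:

    # astype syntax
    ndarray.astype(dtype, order='K', casting='unsafe', subok=True, copy=True)

The parameters are:

dtype: str or dtype
    The type code or data type to cast the array to.
order: {'C', 'F', 'A', 'K'}, optional
    Controls the memory layout of the result. 'C' means C order; 'F' means Fortran order; 'A' means 'F' order if all the arrays are Fortran contiguous, 'C' order otherwise; 'K' means as close as possible to the order in which the array elements appear in memory. The default is 'K'.
casting: {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
    Controls what kind of data casting may occur. Defaults to 'unsafe' for backwards compatibility.
    'no': the data types should not be cast at all
    'equiv': only byte-order changes are allowed
    'safe': only casts that preserve values are allowed
    'same_kind': only safe casts or casts within a kind, such as float64 to float32, are allowed
    'unsafe': any data conversion may be done
subok: bool, optional
    If True, sub-classes are passed through (default); otherwise the returned array is forced to be a base-class array.
copy: bool, optional
    By default astype always returns a newly allocated array. If set to False and the dtype, order and subok requirements are satisfied, the input array is returned instead of a copy.

    x = np.array([1, 2, 3.4])
    print(x)              # [1.  2.  3.4]
    print(x.astype(int))  # [1 2 3]
    b = x.astype(int)
    print(x.dtype)        # float64
    print(b.dtype)        # int32 (platform-dependent)
    print("---------------")
    # Converting only part of the data
    a = np.array([[1, 2, 3], [4, 5, 6]])
    print(a[:, 1])                # [2 5]
    print(a[:, 1].astype('str'))  # the conversion produces a copy
    # array(['2', '5'], dtype='<U21')
    print(a.dtype)                # the original array is unchanged
    # dtype('int64')
    print("---------------")
    # Build a record that describes a point in time
    arr = [2020, 12, 0.6552562894775783]
    custom_type = np.dtype([
        ('YEAR', np.uint16),
        ('DOY', np.uint16),
        ('REF', np.float16)
    ])
    d = np.array([tuple(arr)], custom_type)
    print(d)
    # array([(2020, 12, 0.6553)],
    #       dtype=[('YEAR', '<u2'), ('DOY', '<u2'), ('REF', '<f2')])
    print(d['YEAR'])  # array([2020], dtype=uint16)

IV. Changing array shape in NumPy

The commonly used reshaping operations are:

4.1 Reshaping: np.reshape

NumPy offers reshaping both as the function numpy.reshape and as the method ndarray.reshape. They have the same effect: both return an array containing the same data with a new shape. The new shape must be compatible with the original one.

    # ndarray.reshape
    a = np.array([[1, 2, 3], [4, 5, 6]])
    print(a)
    print('---------------')
    # Specify only the rows or columns you care about; a -1 entry is computed automatically
    print(a.reshape([-1, 3]))  # 3 columns (common)
    print('---------------')
    print(a.reshape([2, -1]))  # 2 rows, same result (common)
    print('---------------')
    print(a.reshape(6))        # 1-D, 6 elements
    print('---------------')
    print(a.reshape(-1))       # same as above
    # array([1, 2, 3, 4, 5, 6])
    print('---------------')
    print(a.reshape([3, 2]))   # 3 rows by 2 columns
    print('---------------')
    print(a.reshape([2, 4]))   # raises: the shapes are incompatible
    # Traceback (most recent call last)
    # ----> 1 a.reshape([2, 4])
    # ValueError: cannot reshape array of size 6 into shape (2,4)

    # numpy.reshape
    a = np.arange(6).reshape((3, 2))
    print(a)
    print('---------------')
    b = np.reshape(a, (2, 3))  # C-like index ordering
    print(b)
    print('---------------')
    # np.ravel(a) is equivalent to np.reshape(a, -1)
    c = np.reshape(np.ravel(a), (2, 3))        # C ravel followed by C reshape
    d = np.reshape(np.reshape(a, (-1)), (2, 3))  # same thing
    print(c)
    print(d)

4.2 Changing size: np.resize

1. numpy.resize(a, new_shape) and ndarray.resize(new_shape, refcheck=True) serve a similar purpose but behave differently. numpy.resize returns a new array and, if the new array is larger than the original, fills it with repeated copies of a. ndarray.resize changes the array in place and fills with zeros instead of repeated copies of a.
2. resize reallocates space for the data area when necessary. When the total size of the array does not change, use reshape.

    a = np.array([[0, 1], [2, 3]])
    print(np.resize(a, (2, 3)))
    print('---------------')
    print(np.resize(a, (1, 4)))
    # array([[0, 1, 2, 3]])
    print('---------------')
    print(np.resize(a, (2, 4)))

Shrinking an array: the array is flattened, resized and reshaped.

    # ndarray.resize
    a = np.array([[0, 1], [2, 3]], order='C')
    a.resize((2, 1))
    print(a)
    # array([[0],
    #        [1]])
    a2 = np.array([[0, 1], [2, 3]], order='F')
    a2.resize((2, 1))
    print(a2)
    # array([[0],
    #        [2]])

Enlarging an array: as above, but the missing entries are filled with zeros.

    b = np.array([[0, 1], [2, 3]])
    b.resize(2, 3)  # the new shape does not have to be a tuple; resize works in place and returns None
    print(b)
    # array([[0, 1, 2],
    #        [3, 0, 0]])

Difference between resize and reshape:
• numpy.resize: if the new array is larger than the original, values from the original are copied repeatedly to fill it.
• reshape: gives the same data a new shape without changing it; if the data does not fit the requested shape, an error is raised.

V. Transposing: transpose and array.T

1. ndarray.transpose(*axes) with no arguments and ndarray.T have the same effect.
2. For a 1-D array this has no effect, since the transpose of a vector is the same vector.

    a = np.array([[0, 1], [2, 3]])
    print(a)
    # [[0 1]
    #  [2 3]]
    print("========1==========")
    print(a.transpose())
    # [[0 2]
    #  [1 3]]
    print("=========2=========")
    # shape (2, 2): the first 2 is axis 0, the second 2 is axis 1
    print(a.transpose(0, 1))
    # [[0 1]
    #  [2 3]]
    print("=========3=========")
    print(a.transpose(1, 0))
    # [[0 2]
    #  [1 3]]
    print("=========4=========")
    # The T attribute
    x = np.array([[0, 1], [2, 3]])
    print(x)
    # [[0 1]
    #  [2 3]]
    print("=========5=========")
    print(x.T)
    # [[0 2]
    #  [1 3]]
    print("=========6=========")
    # 1-D vector
    x = np.array([1., 2., 3., 4.])
    print(x)    # [1. 2. 3. 4.]
    print("=========7=========")
    print(x.T)  # [1. 2. 3. 4.]

How transpose works, for the 2-D x above:

    x[0][0] == 0
    x[0][1] == 1
    x[1][0] == 2
    x[1][1] == 3

Call the first pair of brackets "[]" axis 0 and the second pair axis 1; x can then be drawn in a 0-1 coordinate system.

[Figure: x in the axis 0 / axis 1 coordinate system]

1. x.transpose((0, 1)) keeps the axes in their original order, so nothing changes.
2. x.transpose((1, 0)) swaps axis 0 and axis 1.

[Figure: result of swapping the two axes]

VI. Flattening: flatten

ndarray.flatten(order='C') returns a copy of the array collapsed into one dimension. The order parameter accepts {'C', 'F', 'A', 'K'}.

    a = np.array([[1, 2], [3, 4]])
    a.flatten()     # array([1, 2, 3, 4])
    a.flatten('F')  # array([1, 3, 2, 4])

'C' flattens in row-major (C-style) order. 'F' flattens in column-major (Fortran-style) order.

VII. Flattening to a contiguous array: ravel

1. numpy.ravel(a, order='C'), like ndarray.ravel([order]), returns a contiguous flattened 1-D array containing the elements of the input.
2. The returned array has the same type as the input array; for example, a masked array is returned for a masked array input.

    x = np.array([[1, 2, 3], [4, 5, 6]])
    # Equivalent to reshape(-1, order=order)
    np.ravel(x)             # array([1, 2, 3, 4, 5, 6])
    x.reshape(-1)           # array([1, 2, 3, 4, 5, 6])
    np.ravel(x, order='F')  # array([1, 4, 2, 5, 3, 6])

Difference between ravel and flatten: the difference is whether a copy or a view is returned. numpy.ravel() returns a view whenever possible (it copies only when it has to), so writing to the result can change the original matrix. flatten() always returns a copy, so changes to the copy never affect the original. flatten() is the safer choice in everyday use, at the cost of allocating new memory.

    a = np.arange(12).reshape(3, 4)
    print(a)
    # [[ 0  1  2  3]
    #  [ 4  5  6  7]
    #  [ 8  9 10 11]]
    # Create an array b with the same contents as a
    b = a.copy()
    c = a.ravel()
    d = b.flatten()
    # Print c and d
    print(c)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
    print(d)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
    # c and d are both flattened arrays with the same contents
    print(a is c)  # False
    print(b is d)  # False
    # a, b, c and d are four different objects,
    # but c is a view of a, so modifying c also changes the corresponding element of a
    c[1] = 99
    d[1] = 99
    print(a)  # ravel: a has changed
    # [[ 0 99  2  3]
    #  [ 4  5  6  7]
    #  [ 8  9 10 11]]
    print(b)  # flatten: b is unchanged
    # [[ 0  1  2  3]
    #  [ 4  5  6  7]
    #  [ 8  9 10 11]]
    print(c)  # [ 0 99  2  3  4  5  6  7  8  9 10 11]
    print(d)  # [ 0 99  2  3  4  5  6  7  8  9 10 11]

VIII. Removing dimensions of length 1: squeeze

1. ndarray.squeeze(axis=None) and numpy.squeeze(a, axis=None) remove single-dimensional entries from the shape of an array, that is, they drop the dimensions whose size is 1.
2. Either all dimensions of length 1 are removed, or only the subset selected with axis.

    c = np.arange(10).reshape(2, 5)
    print(c)
    print(np.squeeze(c))
    print("-----------------")
    d = np.arange(10).reshape(1, 2, 5)
    print(d)
    print(d.shape)
    print("-----------------")
    print(np.squeeze(d))
    print(np.squeeze(d).shape)
    print("-----------------")
    e = np.arange(10).reshape(2, 1, 5)
    print(e)
    print(e.shape)
    print("-----------------")
    print(np.squeeze(e))
    print(np.squeeze(e).shape)

IX. Stacking arrays (vstack & hstack)

Several arrays can be stacked together along different axes.

    a = np.array([[0, 1], [2, 3]])
    print(a)
    b = np.array([[5, 6], [7, 8]])
    print(b)
    print("---------------")
    print(np.vstack((a, b)))
    print("---------------")
    print(np.hstack((a, b)))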
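The view-versus-copy distinction between ravel and flatten can be checked directly instead of by modifying elements. The sketch below is an addition, not part of the original post. It uses np.shares_memory to ask whether each result shares the original buffer. The last case illustrates that ravel falls back to a copy when a view is not possible, here because the transposed array is not C-contiguous.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

shares = {
    "reshape": np.shares_memory(a, a.reshape(4, 3)),   # view of the same buffer
    "ravel": np.shares_memory(a, a.ravel()),           # view when the layout allows it
    "flatten": np.shares_memory(a, a.flatten()),       # always a fresh copy
    "T": np.shares_memory(a, a.T),                     # transpose only changes strides
    "T.ravel": np.shares_memory(a, a.T.ravel()),       # C-order ravel of a.T must copy
}

for name, shared in shares.items():
    print(f"{name:8s} shares memory with a: {shared}")
```

A reasonable habit, under these assumptions, is to treat the result of ravel as possibly aliasing the input and to call flatten (or copy) whenever independent data is needed.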


野生程序员在线

Day 10 - Hands-on: MLP

Key point of this section: implementing an MLP with numpy

I. Neural network structure

A neural network generally consists of an input layer, hidden layers (neurons) and an output layer. There can be one hidden layer or several stacked ones, and adjacent layers are connected to each other.

[Figure: general structure of a neural network]

This section simplifies things: the MLP has only three layers, namely an input layer, one hidden layer and an output layer.

[Figure: three-layer MLP]

1. Neurons

Neurons commonly use the sigmoid function as their activation.

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    # Derivative of sigmoid: sigmoid(y) * (1.0 - sigmoid(y)).
    # The y passed in here has already been through sigmoid.
    def dsigmoid(y):
        return y * (1.0 - y)

2. Initializing the network

Parameters to initialize:
• hidden-layer parameters
• output-layer parameters
• input-layer parameters

    class MLP_NeuralNetwork(object):
        def __init__(self, input, hidden, output):
            """
            :param input: number of input neurons
            :param hidden: number of hidden neurons
            :param output: number of output neurons
            """
            self.input = input + 1  # add 1 for bias node
            self.hidden = hidden
            self.output = output
            # set up array of 1s for activations
            self.ai = [1.0] * self.input
            self.ah = [1.0] * self.hidden
            self.ao = [1.0] * self.output
            # create randomized weights
            self.wi = np.random.randn(self.input, self.hidden)
            self.wo = np.random.randn(self.hidden, self.output)
            # create arrays of 0 for changes
            self.ci = np.zeros((self.input, self.hidden))
            self.co = np.zeros((self.hidden, self.output))

These computations are usually done with matrices, because matrices are fast and easy to read. The sizes involved are the size of the input layer (number of features), the size of the hidden layer (a tunable parameter) and the size of the output layer (number of possible classes).

Note: all weights are initialized to random numbers. It is important that the weights are random; otherwise the network cannot be tuned. If all weights were equal, every hidden unit would compute the same thing and the network would be useless.

3. Feed-forward pass

The formulas below are reconstructed from the code, with a the input activations (including the bias node), wi the input-to-hidden weights and wo the hidden-to-output weights.

First layer:

$$z_j = \sum_i a_i \, wi_{ij}$$

First-layer activation:

$$ah_j = \mathrm{sigmoid}(z_j)$$

Second layer (output):

$$u_k = \sum_j ah_j \, wo_{jk}$$

Second-layer activation:

$$ao_k = \mathrm{sigmoid}(u_k)$$

        def feedForward(self, inputs):
            if len(inputs) != self.input - 1:
                raise ValueError('Wrong number of inputs you silly goose!')
            # input activations
            for i in range(self.input - 1):  # -1 is to avoid the bias
                self.ai[i] = inputs[i]
            # hidden activations
            for j in range(self.hidden):
                sum = 0.0  # weighted sum over the input-layer units
                for i in range(self.input):
                    sum += self.ai[i] * self.wi[i][j]
                self.ah[j] = sigmoid(sum)
            # output activations
            for k in range(self.output):
                sum = 0.0  # weighted sum over the hidden-layer units
                for j in range(self.hidden):
                    sum += self.ah[j] * self.wo[j][k]
                self.ao[k] = sigmoid(sum)
            return self.ao[:]

4. Backpropagation (important)

Reference: https://www.aiexplorer.blog/article/bp.html

4.1 Measuring the error

As computed in the code, with targets t:

$$E = \frac{1}{2} \sum_k (t_k - ao_k)^2$$

4.2 Backpropagation

[Figure: backpropagation through the output and hidden layers]

Output-layer BP:

$$\delta^{o}_k = (ao_k - t_k)\, ao_k (1 - ao_k)$$

Hidden-layer BP:

$$\delta^{h}_j = ah_j (1 - ah_j) \sum_k \delta^{o}_k \, wo_{jk}$$

        def backPropagate(self, targets, N):
            """
            :param targets: y values
            :param N: learning rate
            :return: updated weights and current error
            """
            if len(targets) != self.output:
                raise ValueError('Wrong number of targets you silly goose!')
            # calculate error terms for output
            # the delta tells you which direction to change the weights
            # (gradient at the output layer)
            output_deltas = [0.0] * self.output
            for k in range(self.output):
                error = -(targets[k] - self.ao[k])
                output_deltas[k] = dsigmoid(self.ao[k]) * error
            # calculate error terms for hidden
            # (gradient at the hidden layer)
            hidden_deltas = [0.0] * self.hidden
            for j in range(self.hidden):
                error = 0.0
                for k in range(self.output):
                    error += output_deltas[k] * self.wo[j][k]
                hidden_deltas[j] = dsigmoid(self.ah[j]) * error
            # update the weights connecting hidden to output
            # (N is used here; this version of the class has no self.learning_rate)
            for j in range(self.hidden):
                for k in range(self.output):
                    change = output_deltas[k] * self.ah[j]
                    self.wo[j][k] -= N * change + self.co[j][k]
                    self.co[j][k] = change
            # update the weights connecting input to hidden
            for i in range(self.input):
                for j in range(self.hidden):
                    change = hidden_deltas[j] * self.ai[i]
                    self.wi[i][j] -= N * change + self.ci[i][j]
                    self.ci[i][j] = change
            # calculate the loss
            error = 0.0
            for k in range(len(targets)):
                error += 0.5 * (targets[k] - self.ao[k]) ** 2
            return error

5. Training

• specify the number of epochs (iterations)
• pass in the data set
• feedforward
• backpropagate

        def train(self, patterns, iterations=3000, N=0.0002):
            # N: learning rate
            for i in range(iterations):
                error = 0.0
                for p in patterns:
                    inputs = p[0]
                    targets = p[1]
                    self.feedForward(inputs)
                    # accumulate the error over all patterns in this epoch
                    error += self.backPropagate(targets, N)
                if i % 500 == 0:
                    print('error %-.5f' % error)

6. Inference (predict)

Calls feedForward.

        def predict(self, X):
            """
            return list of predictions after training algorithm
            """
            predictions = []
            for p in X:
                predictions.append(self.feedForward(p))
            return predictions

II. MLP code

    import random
    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def dsigmoid(y):
        # y has already been through sigmoid
        return y * (1.0 - y)

    class MLP_NeuralNetwork(object):
        def __init__(self, input, hidden, output, iterations, learning_rate, rate_decay):
            """
            :param input: number of input neurons
            :param hidden: number of hidden neurons
            :param output: number of output neurons
            """
            # initialize parameters
            self.iterations = iterations
            self.learning_rate = learning_rate
            self.rate_decay = rate_decay
            self.input = input + 1  # add 1 for bias node
            self.hidden = hidden
            self.output = output
            # set up array of 1s for activations
            self.ai = [1.0] * self.input
            self.ah = [1.0] * self.hidden
            self.ao = [1.0] * self.output
            # create randomized weights
            # use scheme from 'efficient backprop' to initialize weights
            input_range = 1.0 / self.input ** (1/2)
            output_range = 1.0 / self.hidden ** (1/2)
            self.wi = np.random.normal(loc=0, scale=input_range, size=(self.input, self.hidden))
            self.wo = np.random.normal(loc=0, scale=output_range, size=(self.hidden, self.output))
            # create arrays of 0 for changes
            self.ci = np.zeros((self.input, self.hidden))
            self.co = np.zeros((self.hidden, self.output))

        # Feed-forward pass
        def feedForward(self, inputs):
            if len(inputs) != self.input - 1:
                raise ValueError('Wrong number of inputs you silly goose!')
            # input activations
            for i in range(self.input - 1):  # -1 is to avoid the bias
                self.ai[i] = inputs[i]
            # hidden activations
            for j in range(self.hidden):
                sum = 0.0
                for i in range(self.input):
                    sum += self.ai[i] * self.wi[i][j]
                self.ah[j] = sigmoid(sum)
            # output activations
            for k in range(self.output):
                sum = 0.0
                for j in range(self.hidden):
                    sum += self.ah[j] * self.wo[j][k]
                self.ao[k] = sigmoid(sum)
            return self.ao[:]

        # Backpropagation
        def backPropagate(self, targets):
            """
            :param targets: y values
            :return: updated weights and current error
            """
            if len(targets) != self.output:
                raise ValueError('Wrong number of targets you silly goose!')
            # calculate error terms for output
            # the delta tells you which direction to change the weights
            output_deltas = [0.0] * self.output
            for k in range(self.output):
                error = -(targets[k] - self.ao[k])
                output_deltas[k] = dsigmoid(self.ao[k]) * error
            # calculate error terms for hidden
            hidden_deltas = [0.0] * self.hidden
            for j in range(self.hidden):
                error = 0.0
                for k in range(self.output):
                    error += output_deltas[k] * self.wo[j][k]
                hidden_deltas[j] = dsigmoid(self.ah[j]) * error
            # update the weights connecting hidden to output
            for j in range(self.hidden):
                for k in range(self.output):
                    change = output_deltas[k] * self.ah[j]
                    self.wo[j][k] -= self.learning_rate * change + self.co[j][k]
                    self.co[j][k] = change
            # update the weights connecting input to hidden
            for i in range(self.input):
                for j in range(self.hidden):
                    change = hidden_deltas[j] * self.ai[i]
                    self.wi[i][j] -= self.learning_rate * change + self.ci[i][j]
                    self.ci[i][j] = change
            # calculate error
            error = 0.0
            for k in range(len(targets)):
                error += 0.5 * (targets[k] - self.ao[k]) ** 2
            return error

        # Test
        def test(self, patterns):
            """
            Currently this will print out the targets next to the predictions.
            Not useful for actual ML, just for visual inspection.
            """
            for p in patterns:
                print(p[1], '->', self.feedForward(p[0]))

        # Training
        def train(self, patterns):
            for i in range(self.iterations):
                error = 0.0
                random.shuffle(patterns)
                for p in patterns:
                    inputs = p[0]
                    targets = p[1]
                    self.feedForward(inputs)
                    error += self.backPropagate(targets)
                if i % 10 == 0:
                    print("iterations:%d ,lr:%-.5f ,error:%-.5f " % (i, self.learning_rate, error))
                # learning rate decay: equivalent to lr = lr / (1 + rate_decay)
                self.learning_rate = self.learning_rate * (self.learning_rate / (self.learning_rate + (self.learning_rate * self.rate_decay)))

        # Prediction
        def predict(self, X):
            """
            return list of predictions after training algorithm
            """
            predictions = []
            for p in X:
                predictions.append(self.feedForward(p))
            return predictions
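The post notes that these computations are usually done with matrices. As a minimal sketch of that idea (an addition, not the author's code), the nested loops of the feed-forward pass can be replaced by two matrix products. The function names forward_loops and forward_matrix, the layer sizes and the random weights are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_loops(ai, wi, wo):
    # Same arithmetic as the feedForward method above, written as plain loops
    n_in, n_hidden = wi.shape
    n_out = wo.shape[1]
    ah = [0.0] * n_hidden
    for j in range(n_hidden):
        total = 0.0
        for i in range(n_in):
            total += ai[i] * wi[i][j]
        ah[j] = sigmoid(total)
    ao = [0.0] * n_out
    for k in range(n_out):
        total = 0.0
        for j in range(n_hidden):
            total += ah[j] * wo[j][k]
        ao[k] = sigmoid(total)
    return np.array(ao)

def forward_matrix(ai, wi, wo):
    # Vectorized version: each layer is one vector-matrix product
    ah = sigmoid(ai @ wi)
    return sigmoid(ah @ wo)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
wi = rng.normal(size=(n_in + 1, n_hidden))   # +1 row for the bias node
wo = rng.normal(size=(n_hidden, n_out))
ai = np.append(rng.normal(size=n_in), 1.0)   # inputs followed by the bias activation 1.0

out_loops = forward_loops(ai, wi, wo)
out_matrix = forward_matrix(ai, wi, wo)
print(out_loops, out_matrix)
```

Both versions compute the same activations; the matrix form simply hands the inner loops to NumPy.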


野生程序员在线

Day 05 - Structured Arrays

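Contents of this section:
1. Structured data types
2. Field access

I. Structured data types

A structured data type can be thought of as a sequence of bytes of a certain length (the structure's itemsize) that is interpreted as a collection of fields. Each field has a name, a data type and a byte offset within the structure. The data type of a field can be any numpy data type.

Structured data types are created with the function numpy.dtype. There are 4 alternative forms of specification, which differ in flexibility and conciseness:

1.1 A list of tuples

A list of tuples, one tuple per field. Each tuple has the form (fieldname, datatype, shape), where shape is optional.

    np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])
    # dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])

    # If fieldname is the empty string '', the field is given a default name of the form f#
    np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')])
    # dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')])

1.2 A string of comma-separated dtype specifications

In this shorthand notation any string dtype specification may be used, separated by commas. The fields are given the default names f0, f1, and so on.

    np.dtype('i8, f4, S3')
    # dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')])
    np.dtype('3int8, float32, (2, 3)float64')
    # dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))])

1.3 A dictionary of field parameter arrays

The dictionary has two required keys, 'names' and 'formats'. Their values must be a list of field names and a list of dtype specifications of the same length.

The optional 'offsets' value should be a list of integer byte offsets, one per field in the structure. If 'offsets' is not given, the offsets are determined automatically. The optional 'itemsize' value should be an integer giving the total size of the dtype in bytes; it must be large enough to contain all the fields.

    np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})
    # dtype([('col1', '<i4'), ('col2', '<f4')])
    np.dtype({'names': ['col1', 'col2'],
              'formats': ['i4', 'f4'],
              'offsets': [0, 4],
              'itemsize': 12})
    # dtype({'names': ['col1', 'col2'], 'formats': ['<i4', '<f4'],
    #        'offsets': [0, 4], 'itemsize': 12})

1.4 A dictionary of field names

The keys of the dictionary are the field names and the values are tuples specifying type and offset:

    np.dtype({'col1': ('i1', 0), 'col2': ('f4', 1)})
    # dtype([('col1', 'i1'), ('col2', '<f4')])

II. Field access

If the ndarray object is a structured array, its fields can be accessed by indexing the array with strings, much like a dictionary. A structured array can also be indexed with a list of field names, for example x[['field-name1', 'field-name2']]. If the accessed field is a subarray, the dimensions of the subarray are appended to the shape of the result.

2.1 Accessing a single field

    x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
    x
    # array([[(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]),
    #         (0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])],
    #        [(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]),
    #         (0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])]],
    #       dtype=[('a', '<i4'), ('b', '<f8', (3, 3))])
    x['a'].shape  # (2, 2)
    x['a'].dtype  # dtype('int32')
    x['b'].shape  # (2, 2, 3, 3)
    x['b'].dtype  # dtype('float64')

    d = np.dtype([('x', 'i8'), ('y', 'f4')])
    d.names  # ('x', 'y')
    d.fields

2.2 Accessing multiple fields

A structured array can be indexed and assigned to with a multi-field index, where the index is a list of field names.

    a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
    a[['a', 'c']]
    # array([(0, 0.), (0, 0.), (0, 0.)],
    #       dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12})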
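Field access becomes more tangible with a small record table. The sketch below is an addition with made-up data; the field names name, age and weight are hypothetical. It filters rows by one field, sorts by a field with np.sort(order=...), and shows that indexing a single field returns a view into the original array.

```python
import numpy as np

# Hypothetical records: (name, age, weight)
people = np.array(
    [("Ann", 31, 55.5), ("Bob", 25, 72.0), ("Cid", 40, 68.2)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

# Boolean mask on one field, then pick another field from the matching rows
older_names = people[people["age"] > 30]["name"]

# Sort whole records by the 'age' field (returns a sorted copy)
by_age = np.sort(people, order="age")

# Single-field indexing returns a view: writing through it changes the original
ages_view = people["age"]
ages_view[0] = 32

print(older_names, by_age["name"], people[0])
```

Because the single-field result is a view, call .copy() on it first if the original records must stay untouched.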


野生程序员在线

Day 07 - Advanced (01)

Key points of this section
• Normalization
• Generating random arrays
• Matrix multiplication
• Binning
• Finding the most frequent value
• Correlation

1. Normalize a 5x5 random matrix

Hint: (x - min) / (max - min)

Answer:

    import numpy as np

    Z = np.random.randint(1, 20, size=25).reshape(5, 5)
    print(Z)
    Zmax, Zmin = Z.max(), Z.min()
    Z = (Z - Zmin) / (Zmax - Zmin)
    print(Z)

2. How to create a 2-D array of random floats between 5 and 10?

Description: create a 2-D array of shape 5x3 containing random decimal numbers between 5 and 10.

Approach:
1. randint gives integers
2. uniform gives decimals: https://www.runoob.com/python/func-number-uniform.html

    # Solution Method 1:
    rand_arr = np.random.randint(low=5, high=10, size=(5, 3)) + np.random.random((5, 3))
    print(rand_arr)

    # Solution Method 2:
    rand_arr = np.random.uniform(5, 10, size=(5, 3))
    print(rand_arr)
    # > [[ 8.50061025  9.10531502  6.85867783]
    # >  [ 9.76262069  9.87717411  7.13466701]
    # >  [ 7.48966403  8.33409158  6.16808631]
    # >  [ 7.75010551  9.94535696  5.27373226]
    # >  [ 8.0850361   5.56165518  7.31244004]]

3. Multiply a 5x3 matrix by a 3x2 matrix. What is the real matrix product?

Hint: np.dot. Note the difference between np.multiply (element-wise product) and np.dot (matrix product).

Answer:

    Z = np.dot(np.ones((5, 3)), np.ones((3, 2)))
    print(Z)

4. How to convert a numeric array to a categorical (text) array?

Problem: bin the petal length (3rd column) of iris_2d to form a text array, such that if the petal length is:

    Less than 3 --> 'small'
    3-5         --> 'medium'
    >=5         --> 'large'

Given:

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')
    names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

Answer:

    import numpy as np

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')
    names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
    # Inspect the data
    iris.shape

    # Bin petallength
    petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
    # Map it to respective category
    label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
    petal_length_cat = [label_map[x] for x in petal_length_bin]
    # View
    petal_length_cat[:4]
    # > ['small', 'small', 'small', 'small']
    petal_length_cat

5. How to find the most frequent value in a numpy array?

Description: find the most frequent petal length (3rd column) in the iris data set.

Answer:

    # Given:
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')

    # Solution:
    vals, counts = np.unique(iris[:, 2], return_counts=True)
    print(vals[np.argmax(counts)])
    # > b'1.5'
    vals

6. How to find the correlation between two columns of a numpy array?

Description: find the correlation between SepalLength (1st column) and PetalLength (3rd column) in iris_2d.

Answer:

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0, 1, 2, 3])

    # Solution 1
    # np.corrcoef: https://blog.csdn.net/qq_39514033/article/details/88931639
    cor = np.corrcoef(iris[:, 0], iris[:, 2])
    print(cor)
    print(cor[0, 1])

    # Solution 2
    # Output: r, the correlation coefficient in [-1, 1], and the p-value.
    # Note: the smaller the p-value, the more significant the correlation;
    # p-values are generally considered reasonably reliable for data sets larger than about 500 samples.
    # Reference: https://www.osgeo.cn/scipy/reference/generated/scipy.stats.pearsonr.html
    from scipy.stats import pearsonr
    corr, p_value = pearsonr(iris[:, 0], iris[:, 2])
    print(corr, p_value)
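The binning step from exercise 4 can be tried without downloading the iris file. The sketch below is an addition that uses a handful of made-up petal lengths (an assumption, not real iris rows) and replaces the dictionary lookup with array indexing, so the whole mapping stays vectorized.

```python
import numpy as np

# Hypothetical petal lengths standing in for iris[:, 2]
petal_length = np.array([1.4, 4.7, 6.0, 3.0, 5.0])

# np.digitize returns i such that bins[i-1] <= x < bins[i]
bins = [0, 3, 5, 10]
petal_length_bin = np.digitize(petal_length, bins)

# Bin 1 -> 'small', bin 2 -> 'medium', bin 3 -> 'large'
labels = np.array(['small', 'medium', 'large'])
petal_length_cat = labels[petal_length_bin - 1]

print(petal_length_bin)
print(petal_length_cat)
```

Note how the boundary values fall: 3.0 lands in 'medium' and 5.0 in 'large', matching the "3-5" and ">=5" rules of the exercise. Values of 10 or more would produce bin 4 and need their own label, as the label_map in the original answer provides.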


强哥喝假酒

How can numpy simplify the following code?

This is definitely the dumbest code I have ever written...

This is the data structure of a 3D model:

    me.vertices: [v0, v1, ...]
    me.edges: [e0, e1, ...]
    e0.vertices: [i0, i1]
    me.vertices[i0] == v0
    v0.co: [x, y, z]

Rough flow:
* iterate over all edges
* get the coordinates of the two vertices of each edge

Desired result:

    [
      [0, 0, 0],  // first vertex of edge 1
      [1, 1, 1],  // second vertex of edge 1
      [1, 1, 1],  // first vertex of edge 2
      [2, 2, 2],  // second vertex of edge 2
      ...
    ]

    len_edges = len(me.edges)
    verts = me.vertices
    len_verts = len_edges * 2
    vs = np.zeros((len_verts * 3, ), dtype=np.float32, )
    for i in range(len_edges):
        i0, i1 = me.edges[i].vertices
        v0 = verts[i0].co
        v1 = verts[i1].co
        s = i * 6
        vs[s] = v0[0]
        vs[s + 1] = v0[1]
        vs[s + 2] = v0[2]
        vs[s + 3] = v1[0]
        vs[s + 4] = v1[1]
        vs[s + 5] = v1[2]
    vs.shape = (-1, 3, )

The code above works, but it is really dumb. The API currently offers some syntactic sugar:

    collection.foreach_get(attr, some_seq)

    # Python equivalent
    for i in range(len(seq)):
        some_seq[i] = getattr(collection[i], attr)

It can be used like this:

    vs = np.zeros((len_verts * 3, ), dtype=np.float32, )
    me.vertices.foreach_get('co', vs)
    vs.shape = (-1, 3, )

But I cannot figure out how to adapt it to my case... Thanks for your attention!
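One possible direction, offered as a sketch and not as a confirmed answer: once the vertex coordinates are in a (V, 3) array and the edge indices in an (E, 2) array, the loop reduces to a single integer-array indexing step. The arrays below are hypothetical stand-ins for the mesh data. Filling them from the real mesh (for example with foreach_get on 'co' for the vertices and on 'vertices' for the edges) is an assumption that is not tested here.

```python
import numpy as np

# Hypothetical stand-ins for the mesh data in the question
vert_co = np.array([[0, 0, 0],
                    [1, 1, 1],
                    [2, 2, 2]], dtype=np.float32)   # one row per vertex: x, y, z
edge_idx = np.array([[0, 1],
                     [1, 2]], dtype=np.int32)       # one row per edge: i0, i1

# Flatten the edge table to [i0, i1, i0, i1, ...] and gather the matching rows.
# Integer-array indexing copies the selected rows into a new (2 * E, 3) array.
vs = vert_co[edge_idx.ravel()]

print(vs)
```

For this sample data the rows come out in the order the question asks for: both endpoints of edge 1, then both endpoints of edge 2.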


野生程序员在线

Day 08 - Advanced (02)

Key points of this section:
• Implementing softmax
• Euclidean distance
• Probabilistic sampling
• Finding optima (local maxima)

1. How to compute the softmax score?

Description: compute the softmax score of sepallength.

Given:

    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

Answer:

    import numpy as np

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')
    sepallength = np.array([float(row[0]) for row in iris])

    # Solution
    def softmax(x):
        """Compute softmax values for each sets of scores in x.
        https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python"""
        # np.exp(a) = e^a
        # Step 1: shift by the maximum so the exponentials cannot overflow
        # Step 2: compute softmax
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum()

    print(sepallength)
    prob = softmax(sepallength)
    print(prob)
    print(np.sum(prob))

2. How to compute the Euclidean distance between two arrays?

Problem: compute the Euclidean distance between two arrays a and b.

Euclidean distance:

$$d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$$

Norm (2-norm):

$$\lVert x \rVert_2 = \sqrt{\sum_i x_i^2}$$

Answer:

    # Function reference: https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html
    # Input
    a = np.array([1, 2, 3, 4, 5])
    b = np.array([4, 5, 6, 7, 8])

    # Solution 1: norm
    dist1 = np.linalg.norm(a - b)
    print(dist1)
    # > 6.7082039324993694

    # Solution 2
    dist2 = np.sqrt(np.sum(np.square(a - b)))
    print(dist2)

3. How to do probabilistic sampling in numpy?

Description: randomly sample the iris species so that setosa appears twice as often as versicolor and virginica.

Given:

    # Import iris keeping the text column intact
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')

Approach:
1. random.choice: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html
2. np.linspace: https://blog.csdn.net/You_are_my_dream/article/details/53493752
   np.searchsorted: https://blog.csdn.net/qq_33757398/article/details/89876088

Answer:

    # Import iris keeping the text column intact
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris = np.genfromtxt(url, delimiter=',', dtype='object')

    # Solution
    # Get the species column
    species = iris[:, 4]

    # Approach 1: Generate Probabilistically
    np.random.seed(100)
    a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
    species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25])
    print(np.unique(species_out, return_counts=True))

    # Approach 2: Probabilistic Sampling (preferred)
    np.random.seed(100)
    probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)]
    print(probs)
    index = np.searchsorted(probs, np.random.random(150))
    print(index)
    species_out = species[index]
    print(np.unique(species_out, return_counts=True))
    # > (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object), array([77, 37, 36]))

4. How to find all local maxima (peaks) in a 1-D array?

Description: find all peaks in a 1-D numeric array a. A peak is a point surrounded by smaller values on both sides.

Given:

    a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

Expected output:

    > array([2, 5])

where 2 and 5 are the positions of the peak values 7 and 6.

Answer:

    # np.diff: https://numpy.org/doc/stable/reference/generated/numpy.diff.html
    # np.sign: https://blog.csdn.net/lyq_12/article/details/86645425
    a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
    print(np.diff(a))
    diff_sign = np.sign(np.diff(a))
    print(diff_sign)
    doublediff = np.diff(diff_sign)
    print(doublediff)
    peak_locations = np.where(doublediff == -2)[0] + 1
    peak_locations
    # > array([2, 5])
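The reason the softmax answer subtracts np.max(x) before exponentiating is easiest to see with large scores. The sketch below is an addition; the function names softmax_naive and softmax_stable and the sample scores are chosen for illustration. It compares the two variants on inputs where np.exp overflows.

```python
import numpy as np

def softmax_naive(x):
    # Direct definition: overflows once the scores get large
    e_x = np.exp(x)
    return e_x / e_x.sum()

def softmax_stable(x):
    # Shifting by the maximum leaves the result unchanged but keeps exp() bounded by 1
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])   # same gaps between scores, shifted by 999

# Silence the overflow / invalid-value warnings the naive version triggers
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)

stable_large = softmax_stable(large)
stable_small = softmax_stable(small)

print(naive_large)    # every entry is nan because inf / inf is undefined
print(stable_large)   # the same probabilities as for the small scores
```

Softmax depends only on the differences between scores, so the shifted version returns the same probabilities for both inputs.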


野生程序员在线

Day 04 - Array Indexing

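Contents of this section:
1. Basic slicing and indexing
2. Array indexing
3. Boolean array indexing
4. Partial indexing
5. Combining index arrays with slices

1. Basic slicing and indexing

Basic slicing extends Python's basic concept of slicing to N dimensions. Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside square brackets), an integer, or a tuple of slice objects and integers.

1.1 Single-element indexing

Single-element indexing works exactly as it does for other standard Python sequences. It is 0-based and accepts negative indices, which count from the end of the array.

    import numpy as np

    x = np.arange(10)
    print(x)
    x[2]   # 2
    x[-2]  # 8

    x.shape = (2, 5)  # x is now two-dimensional
    x
    '''
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    '''
    x[1, 3]   # 8
    x[1, -1]  # 9
    x[0]      # array([0, 1, 2, 3, 4])
    x[0][2]   # 2

1.2 Slicing and striding

1) The basic slice syntax is i:j:k, where i is the start index, j is the stop index, and k is the step:

    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    x[1:7:2]  # array([1, 3, 5])

2) Negative i and j are interpreted as n+i and n+j, where n is the number of elements in the corresponding dimension. A negative k steps toward smaller indices. Continuing the example above:

    x[-2:10]    # array([8, 9])
    x[-3:3:-1]  # array([7, 6, 5, 4])

3) Let n be the number of elements in the dimension being sliced. If i is not given, it defaults to 0 for k>0 and to n-1 for k<0. If j is not given, it defaults to n for k>0 and to -n-1 for k<0. If k is not given, it defaults to 1. Note that :: is the same as : and selects all indices along that axis.

    x[5:]  # array([5, 6, 7, 8, 9])

4) If the selection tuple contains fewer than N objects, : is assumed for all remaining dimensions. For example:

    x = np.array([[[1], [2], [3]], [[4], [5], [6]]])
    x.shape  # (2, 3, 1)
    x[1:2]
    '''
    array([[[4],
            [5],
            [6]]])
    '''

2. Array indexing

Integer array indexing selects arbitrary items of an array based on their N-dimensional index. Each integer array represents a number of indices into that dimension. Negative values are permitted in index arrays and behave as they do with single indices or slices.

    x = np.arange(10, 1, -1)
    x  # array([10, 9, 8, 7, 6, 5, 4, 3, 2])
    x[np.array([3, 3, 1, 8])]   # array([7, 7, 9, 2])
    x[np.array([3, 3, -3, 8])]  # array([7, 7, 4, 2])

If an index value is out of bounds, an IndexError is raised:

    x = np.array([[1, 2], [3, 4], [5, 6]])
    x[np.array([1, -1])]
    '''
    array([[3, 4],
           [5, 6]])
    '''
    x[np.array([3, 4])]
    '''
    Traceback (most recent call last):
      ...
    IndexError: index 3 is out of bounds for axis 0 with size 3
    '''

3. Boolean array indexing

This form of advanced indexing occurs when obj is an array of Boolean type, for example one returned by a comparison operator.

Example: select all entries of an array that are not NaN:

    x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
    x[~np.isnan(x)]  # array([1., 2., 3.])

Or add a constant to all negative elements:

    x = np.array([1., -1., -2., 3])
    x[x < 0] += 20
    x  # array([ 1., 19., 18., 3.])

A 1-D Boolean array can also select whole rows:

    x = np.arange(35).reshape(5, 7)
    b = x > 20
    b[:, 5]  # array([False, False, False, True, True])
    x[b[:, 5]]
    '''
    array([[21, 22, 23, 24, 25, 26, 27],
           [28, 29, 30, 31, 32, 33, 34]])
    '''

4. Partial indexing

Partial indexing is a convenient way to take an index or slice from the first dimension of a multidimensional array.

    a = np.arange(0, 100, 10)
    b = a[:5]
    c = a[a >= 50]
    print(b)  # >>>[ 0 10 20 30 40]
    print(c)  # >>>[50 60 70 80 90]

5. Combining index arrays with slices

Index arrays can be combined with slices. For example, with y = np.arange(35).reshape(5, 7):

    >>> y = np.arange(35).reshape(5, 7)
    >>> y[np.array([0, 2, 4]), 1:3]
    array([[ 1,  2],
           [15, 16],
           [29, 30]])

In effect, the slice and the index array operation are independent. The slice extracts the columns with index 1 and 2 (that is, the 2nd and 3rd columns), and the index array extracts the rows with index 0, 2, and 4 (that is, the first, third, and fifth rows).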
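The independence of the row index array and the column slice can be checked directly. The sketch below assumes `y = np.arange(35).reshape(5, 7)`, which is consistent with the output shown above; the variable names are illustrative only.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])

# Index array and slice applied in a single indexing expression.
combined = y[rows, 1:3]

# The same selection done in two separate steps, in either order.
rows_then_cols = y[rows][:, 1:3]
cols_then_rows = y[:, 1:3][rows]

print(combined)
print(np.array_equal(combined, rows_then_cols))  # True
print(np.array_equal(combined, cols_then_rows))  # True
```

The single-expression form is usually preferable because it avoids creating an intermediate array.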


野生程序员在线

Day 03 - Array Creation

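Contents of this section:
1. Creating from a Python list
2. Creating from existing data
3. Creating arrays of special values (important)
4. Creating from ranges (important)
5. Creating specific matrices
6. Creating random arrays (important)

I. Simple creation

    import numpy as np

    # Two-dimensional
    x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
    print(type(x))  # <class 'numpy.ndarray'>
    print(x.shape)  # (2, 3)
    print(x.dtype)  # int32
    print("===============")

    # Create from a Python list
    x = np.array([2, 3, 1, 0])
    print(x)

II. Creating from existing data

The following commonly used functions create an array from existing data of various kinds (used as np.xxx).

    # Python list
    np.array([1, 2, 3])

    # Create multiple dimensions directly
    np.array([[1, 2], [3, 4]])
    '''
    array([[1, 2],
           [3, 4]])
    '''

    # Specify the minimum number of dimensions
    np.array([1, 2, 3], ndmin=2)  # array([[1, 2, 3]])

    # Specify the data type (the keyword is dtype)
    np.array([1, 2, 3], dtype='float16')

    # Create from a subclass (np.asmatrix is used here; the np.mat alias is not available in NumPy 2.x)
    np.array(np.asmatrix('1 2; 3 4'))
    '''
    array([[1, 2],
           [3, 4]])
    '''

2.1 Converting to an array: numpy.asarray

1. numpy.asarray(a, dtype=None, order=None) converts a list-like input sequence to an array and returns an ndarray.
2. If the input is already an ndarray with matching dtype and order, no copy is made.
3. If a is a subclass of ndarray, a base-class ndarray is returned.

    # Convert a list to an array:
    a = [1, 2]
    np.asarray(a)  # array([1, 2])

    # An existing array is not copied:
    a = np.array([1, 2])
    np.asarray(a) is a  # True

    # If dtype is set, the array is copied only when the dtype does not match:
    a = np.array([1, 2], dtype=np.float32)
    np.asarray(a, dtype=np.float32) is a  # True
    np.asarray(a, dtype=np.float64) is a  # False

2.2 Deep copy: numpy.copy

Returns an array copy of the given object, equivalent to np.array(a, copy=True).

    x = np.array([1, 2, 3])
    y = x           # assignment: y refers to the same array
    z = np.copy(x)  # deep copy
    x[0] = 0        # modify the data
    x  # array([0, 2, 3])
    y  # changes along with x: array([0, 2, 3])
    z  # the copy is unaffected: array([1, 2, 3])

III. Creating arrays of special values (important)

The following commonly used functions create an array filled with a special value (used as np.xxx).

    # All zeros
    np.zeros(5)                # array([0., 0., 0., 0., 0.])
    np.zeros((5,), dtype=int)  # array([0, 0, 0, 0, 0])
    np.zeros((2, 1))
    '''
    array([[0.],
           [0.]])
    '''

    # All ones
    np.ones(5)  # array([1., 1., 1., 1., 1.])

    # All entries equal to a given fill value
    np.full((2, 2), np.inf)
    '''
    array([[inf, inf],
           [inf, inf]])
    '''
    np.full((2, 2), 10)
    '''
    array([[10, 10],
           [10, 10]])
    '''
    np.full((2, 2), [1, 2])
    '''
    array([[1, 2],
           [1, 2]])
    '''

    # np.empty() returns a new array of the given shape and type;
    # the contents are uninitialized, so the values shown are arbitrary
    np.empty([2, 3], dtype=int)
    '''
    array([[0, 0, 1],
           [1, 1, 1]])
    '''

    # np.ones_like(a) and similar functions create a new array of the given
    # value with the shape of the array_like passed in:
    a = np.arange(6).reshape((2, 3))
    a
    '''
    array([[0, 1, 2],
           [3, 4, 5]])
    '''
    np.ones_like(a)
    '''
    array([[1, 1, 1],
           [1, 1, 1]])
    '''

    # np.eye returns an array with 1 on the diagonal and 0 elsewhere:
    np.eye(3, dtype=int)
    '''
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]])
    '''

IV. Creating arrays from ranges

The following commonly used functions create an array from a given range of values.

4.1 Evenly spaced values in a range: np.arange

np.arange returns evenly spaced values within a given interval. Values are generated in the half-open interval [start, stop).

    # Defaults to starting at 0
    np.arange(3)         # array([0, 1, 2])
    np.arange(3, 7)      # array([3, 4, 5, 6])
    np.arange(3, 7, 2)   # array([3, 5])
    np.arange(3, 4, .2)  # array([3. , 3.2, 3.4, 3.6, 3.8])

4.2 Equally spaced numbers: np.linspace

np.linspace() returns num equally spaced samples computed over the closed interval [start, stop].

    # Specify the number of samples
    np.linspace(2.0, 3.0, num=5)
    # array([2.  , 2.25, 2.5 , 2.75, 3.  ])

    # Half-open interval (the right endpoint is excluded)
    np.linspace(2.0, 3.0, num=5, endpoint=False)
    # array([2. , 2.2, 2.4, 2.6, 2.8])

4.3 Evenly spaced on a log scale: np.logspace

Same as above, except that the samples are base ** start through base ** stop, evenly spaced on a logarithmic scale (base defaults to 10).

    np.logspace(2.0, 3.0, num=4)
    # array([ 100.        ,  215.443469  ,  464.15888336, 1000.        ])
    np.logspace(2.0, 3.0, num=4, endpoint=False)
    # array([100.        , 177.827941  , 316.22776602, 562.34132519])
    np.logspace(2.0, 3.0, num=4, base=2.0)
    # array([4.        , 5.0396842 , 6.34960421, 8.        ])

V. Building structured matrices with NumPy

The following commonly used functions build matrices that follow a given pattern.

5.1 Extracting or constructing a diagonal: np.diag

1. If a 2-D array is passed in, the diagonal values are extracted into a 1-D array. The parameter k shifts the diagonal up or down.
2. If a 1-D array is passed in, a diagonal matrix is created whose diagonal holds the values of the 1-D array.

    x = np.arange(9).reshape((3, 3))
    x
    '''
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    '''
    # A 2-D array is passed in: extract the diagonal values
    np.diag(x)        # array([0, 4, 8])
    np.diag(x, k=1)   # shifted up:   array([1, 5])
    np.diag(x, k=-1)  # shifted down: array([3, 7])

    # Turn a 1-D array into a square matrix with the array on the diagonal
    np.diag(np.arange(4))
    '''
    array([[0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 2, 0],
           [0, 0, 0, 3]])
    '''

5.2 Flattening onto a diagonal: np.diagflat

Flattens the input (array_like) and uses the result as the diagonal of a 2-D array.

    np.diagflat([[1, 2], [3, 4]])
    '''
    array([[1, 0, 0, 0],
           [0, 2, 0, 0],
           [0, 0, 3, 0],
           [0, 0, 0, 4]])
    '''
    # Set k to shift the diagonal up by 1
    np.diagflat([1, 2], 1)
    '''
    array([[0, 1, 0],
           [0, 0, 2],
           [0, 0, 0]])
    '''

5.3 Triangular matrix: np.tri

Creates an array with 1 at and below the given diagonal (selected by k) and 0 everywhere else.

    # Shifted up by 2
    np.tri(3, 5, 2, dtype=int)
    '''
    array([[1, 1, 1, 0, 0],
           [1, 1, 1, 1, 0],
           [1, 1, 1, 1, 1]])
    '''
    # Shifted down by 1
    np.tri(3, 5, -1)
    '''
    array([[0., 0., 0., 0., 0.],
           [1., 0., 0., 0., 0.],
           [1., 1., 0., 0., 0.]])
    '''

VI. Creating random sample arrays with NumPy

6.1 numpy.random.rand()

numpy.random.rand(d0, d1, ..., dn)

rand generates values in the half-open interval [0, 1) (0 included, 1 excluded) for the given dimensions. Each dn is the size of one dimension. The return value is a numpy.ndarray of the specified shape.

    np.random.rand(3, 3)  # shape: 3*3
    '''
    array([[0.94340617, 0.96183216, 0.88510322],
           [0.44543261, 0.74930098, 0.73372814],
           [0.29233667, 0.3940114 , 0.7167332 ]])
    '''

6.2 np.random.randn()

numpy.random.randn(d0, d1, ..., dn)

randn returns one sample or an array of samples drawn from the standard normal distribution. Each dn is the size of one dimension. The return value is a numpy.ndarray of the specified shape.

    np.random.randn()  # with no arguments, a single value is returned
    # -0.7377941002942127
    np.random.randn(3, 3)
    '''
    array([[-0.20565666,  1.23580939, -0.27814622],
           [ 0.53923344, -2.7092927 ,  1.27514363],
           [ 0.38570597, -1.90564739, -0.10438987]])
    '''

6.3 numpy.random.randint()

numpy.random.randint(low, high=None, size=None, dtype=int)

Returns random integers from the half-open interval [low, high).

Parameters: low is the lower bound (inclusive), high is the upper bound (exclusive), size is the output shape, and dtype is the data type, which defaults to int. If high is omitted, random numbers are drawn from [0, low).

    np.random.randint(1, size=10)  # integers in [0, 1), so only 0 is possible
    # array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    np.random.randint(1, 5)  # one random integer in [1, 5)
    # 2

6.4 numpy.random.choice() (important)

numpy.random.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array.

a must be 1-D array-like data or an int; size is the output shape, given as an int or a tuple; replace is a Boolean that determines whether the sample is drawn with replacement; p gives the probability of each entry being drawn.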
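The parameters of numpy.random.choice described above can be illustrated with a short sketch. The species labels, the seed, and the variable names are assumptions chosen for illustration. No expected values are shown because the drawn samples depend on the random state.

```python
import numpy as np

np.random.seed(0)  # fix the seed so repeated runs draw the same samples
species = np.array(['setosa', 'versicolor', 'virginica'])

# With replacement and explicit probabilities (p must sum to 1).
weighted = np.random.choice(species, size=10, replace=True, p=[0.5, 0.25, 0.25])

# Without replacement: passing an int samples from np.arange(5),
# and every drawn value is distinct.
distinct = np.random.choice(5, size=3, replace=False)

# A tuple for size gives a multi-dimensional result.
grid = np.random.choice(species, size=(2, 3))

print(weighted)
print(distinct)
print(grid.shape)  # (2, 3)
```

With replace=False, size cannot exceed the number of available items; otherwise NumPy raises a ValueError.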
