七厦

Why is there still no nvcc command after installing cuda_12.2.2_535.104.05_linux.run?

My machine has an NVIDIA T4 GPU and runs Ubuntu 22.04. I first installed the driver with the following command:

    sudo apt install -y nvidia-driver-535-server

After the machine had rebooted, I ran "nvidia-smi" to check the GPU:

    ╰─➤ nvidia-smi                                                                  130 ↵
    Mon Sep 18 14:30:16 2023
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla T4                       Off | 00000000:AF:00.0 Off |                    0 |
    | N/A   47C    P0              27W /  70W |      2MiB / 15360MiB |      6%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

Then I downloaded CUDA Toolkit 12.2 from "https://developer.nvidia.com/cuda-downloads?target_os=Linux&t..." (https://link.segmentfault.com/?enc=3lX4HpTDwbVmYUepMwiEig%3D%3D.hVEvpxzxrZ5nbln27HtjYX%2FZgdAL9yh2fVEKf%2BzBsut2FoxAl2GIprcrELn%2BXB3k4tuTTUNgH4yajNzLt5aX7MPvOenGOadnQHI0WBRAFLmmG6vebB5O0RH%2BJQ6JGCmbZ8nOl2AYWSzYeYL1qUJqUZfS29AFy55ZR5t1WOE1DY7jLNViJlaGZUKzxbx8L4omQCX1vFuTS5EcMlgK8i5cwQ%3D%3D)

"图片.png" (https://wmprod.oss-cn-shanghai.aliyuncs.com/images/20241227/df6ec5e1bab5f00a3335f3e1a32d40fb.png)

and ran the installer:

    ╭─pon@T4GPU ~/Downloads
    ╰─➤ sudo sh cuda_12.2.2_535.104.05_linux.run
    [sudo] password for pon:

After the installation there is still no nvcc:

    ╭─pon@T4GPU ~/Downloads
    ╰─➤ nvcc --version                                                              127 ↵
    zsh: command not found: nvcc
    ╭─pon@T4GPU ~/Downloads
    ╰─➤ cd /                                                                        127 ↵
    ╭─pon@T4GPU /
    ╰─➤ fd -a -u nvcc
    /usr/share/cmake-3.22/Modules/FindCUDA/run_nvcc.cmake

What I expect is that the nvcc command is available once this CUDA Toolkit has been installed.

14
1
0
Views: 508
时光旅人

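Given two switching power supplies, one flyback (pulse-modulated) and one fixed-load-power (non-pulse-modulated), with a sampling rate < 500 kHz and equal power (e.g. 200 W), can the two be distinguished on the 220 V side from the voltage or current waveform?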

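Background: There are two switching power supplies, a flyback type (pulse-modulated) and a fixed-load-power type (non-pulse-modulated). Typical examples of the pulse-modulated type are battery chargers of all kinds; a typical example of the fixed-load-power type is a spotlight. Both loads are connected to mains power.

Question: With the sampling rate limited to < 500 kHz and both supplies at the same power (e.g. 200 W), can the two be distinguished on the 220 V side from the voltage or current waveform? If so, please describe in detail the features that tell them apart. If not, is there another way to distinguish them?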

10
1
0
Views: 387
清晨我上码

PyTorch 10-Day Introduction - Day 10 - Model Deployment & Inference

Model deployment & inference

- Model deployment
- Model inference

We convert a model trained in PyTorch to the ONNX format and then run it with ONNX Runtime for inference.

1. ONNX

ONNX (Open Neural Network Exchange) is a format for describing computation graphs in a standard way, released jointly by Facebook (now Meta) and Microsoft in 2017. By defining a standard format that is independent of environment and platform, ONNX lets AI models be exchanged between frameworks and environments. It can be seen as the bridge between deep learning frameworks and deployment targets, much like a compiler's intermediate language.

Because framework compatibility varies, ONNX is usually used only to represent static graphs, which are easier to deploy. Hardware and software vendors only need to optimize model performance against the ONNX standard, and every framework compatible with the standard benefits. ONNX focuses on inference: a model trained in any framework can, once converted to ONNX, be deployed easily in any ONNX-compatible runtime.

2. ONNX Runtime

ONNX Runtime website: https://www.onnxruntime.ai/
ONNX Runtime GitHub: https://github.com/microsoft/onnxruntime

ONNX Runtime is a cross-platform machine learning inference accelerator maintained by Microsoft. It works directly with ONNX: it reads a .onnx file and runs inference without converting the file to any other format. With ONNX Runtime, PyTorch also covers the last mile of deployment, giving the pipeline PyTorch --> ONNX --> ONNX Runtime.

Compatibility between ONNX Runtime and CUDA:

Compatibility between ONNX Runtime, TensorRT and CUDA:

3. Converting a model to ONNX

torch.onnx.export() is the function that converts a model to ONNX. Before exporting, we must call model.eval() or model.train(False) so that the model is in inference mode.

    import torch
    import torch.onnx
    import torchvision

    # Name of the exported file; the suffix must be .onnx
    onnx_file_name = "resnet50.onnx"
    # The model to convert; replace it with your own model
    model = torchvision.models.resnet50(pretrained=True)
    # Load the weights; replace resnet50.pt with your own weight file.
    # load_state_dict() updates the model in place, so its return value
    # must not be assigned back to model.
    model.load_state_dict(torch.load("resnet50.pt"))
    # Must be called before exporting (model.train(False) also works)
    model.eval()
    # dummy_input is one example input; it only supplies shape, dtype, etc.
    batch_size = 1  # arbitrary; matters little once dynamic_axes is set
    dummy_input = torch.randn(batch_size, 3, 224, 224, requires_grad=True)
    # Model output for this input
    output = model(dummy_input)
    # Export the model
    torch.onnx.export(
        model,                     # the model
        dummy_input,               # one example input
        onnx_file_name,            # output path / file name
        export_params=True,        # True (the default) also exports the parameters;
                                   # set it to False to export an untrained model
        opset_version=10,          # ONNX opset version (15 was the latest at the time of writing)
        do_constant_folding=True,  # whether to apply constant folding
        input_names=['conv1'],     # names of the input tensors
        output_names=['fc'],       # names of the output tensors
        # dynamic_axes marks the batch dimension as dynamic, so inference
        # data may use a batch size different from dummy_input's
        dynamic_axes={'conv1': {0: 'batch_size'}, 'fc': {0: 'batch_size'}},
    )

Checking the ONNX model

We need to check that the model file is usable, which we do with onnx.checker.check_model().

    import onnx

    # Exception handling makes the check explicit
    try:
        # Raises an exception if the model is not valid
        onnx.checker.check_model(onnx_file_name)
    except onnx.checker.ValidationError as e:
        print("The model is invalid: %s" % e)
    else:
        # No exception means the model is valid
        print("The model is valid!")

Visualizing the ONNX model

Use Netron for visualization. Download: https://netron.app/

Model input & output information:

Inference with ONNX Runtime

    import onnxruntime

    # Name of the ONNX model file to run
    onnx_file_name = "xxxxxx.onnx"
    # onnxruntime.InferenceSession creates an ONNX Runtime inference session
    ort_session = onnxruntime.InferenceSession(onnx_file_name, providers=['CPUExecutionProvider'])
    # session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CUDAExecutionProvider'])
    # session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['OpenVINOExecutionProvider'])

    # The input is a dict whose key must match the input_names used at export.
    # input_img must already be an ndarray.
    # ort_inputs = {'conv1': input_img}
    # The form below is preferred because it avoids typing the key by hand
    ort_inputs = {ort_session.get_inputs()[0].name: input_img}

    # run() performs inference. The first argument is the list of output
    # tensor names (None is usually fine); the second is the input dict.
    # The result is wrapped in a list, hence the [0].
    ort_output = ort_session.run(None, ort_inputs)[0]
    # output = ort_session.get_outputs()[0].name
    # ort_output = ort_session.run([output], ort_inputs)[0]

Notes:

- A PyTorch model takes tensors while ONNX Runtime takes arrays, so either convert the tensor or read the data as an array in the first place.
- The shape of the input array should match the dummy_input used at export. If the image size differs, resize it first.
- run() returns a list; index into it to get the result as an array.
- When building the input dict, the key must match the input_names set at export.

Full code

1. Install & download

    #!pip install onnx -i https://pypi.tuna.tsinghua.edu.cn/simple
    #!pip install onnxruntime -i https://pypi.tuna.tsinghua.edu.cn/simple
    #!pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple
    # Download ImageNet labels
    #!wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

2. Define the model

    import torch
    import io
    import time
    from PIL import Image
    import torchvision.transforms as transforms
    from torchvision import datasets
    import onnx
    import onnxruntime
    import torchvision
    import numpy as np
    from torch import nn
    import torch.nn.init as init

    onnx_file = 'resnet50.onnx'
    save_dir = './resnet50.pt'

    # Download the pretrained model
    Resnet50 = torchvision.models.resnet50(pretrained=True)
    # Save the model weights
    torch.save(Resnet50.state_dict(), save_dir)
    print(Resnet50)

3. Export the model to ONNX

    batch_size = 1  # just a random number
    # Build the model structure first
    loaded_model = torchvision.models.resnet50()
    # Then load the weights
    loaded_model.load_state_dict(torch.load(save_dir))
    # Single GPU
    # loaded_model.cuda()
    # Put the model in inference mode
    loaded_model.eval()
    # Input to the model
    x = torch.randn(batch_size, 3, 224, 224, requires_grad=True)
    torch_out = loaded_model(x)
    torch_out

    # Export the model
    torch.onnx.export(
        loaded_model,              # model being run
        x,                         # model input (or a tuple for multiple inputs)
        onnx_file,                 # where to save the model (can be a file or file-like object)
        export_params=True,        # store the trained parameter weights inside the model file
        opset_version=10,          # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=['conv1'],     # the model's input names
        output_names=['fc'],       # the model's output names
        # variable length axes
        dynamic_axes={'conv1': {0: 'batch_size'}, 'fc': {0: 'batch_size'}},
    )

    ============= Diagnostic Run torch.onnx.export version 2.0.0+cu117 =============
    verbose: False, log level: Level.ERROR
    ======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

4. Check the ONNX model

    try:
        # Raises an exception if the model is not valid
        onnx.checker.check_model(onnx_file)
    except onnx.checker.ValidationError as e:
        print("The model is invalid: %s" % e)
    else:
        print("The model is valid!")

5. Inference with ONNX Runtime

    import onnxruntime
    import numpy as np

    ort_session = onnxruntime.InferenceSession(onnx_file, providers=['CPUExecutionProvider'])

    # Convert a tensor to an ndarray
    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    # Build the input dict and compute the output
    ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
    ort_outs = ort_session.run(None, ort_inputs)

    # Compare the PyTorch and ONNX Runtime results
    np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)
    print("Exported model has been tested with ONNXRuntime, and the result looks good!")

6. Run a real prediction and benchmark

    from PIL import Image
    from torchvision.transforms import transforms

    # Prepare the inference image
    image = Image.open('./images/cat.jpg')
    # Resize to the target size
    image = image.resize((224, 224))
    # Convert to RGB
    image = image.convert('RGB')
    image.save('./images/cat_224.jpg')

    categories = []
    # Read the categories
    with open("./imagenet/imagenet_classes.txt", "r") as f:
        categories = [s.strip() for s in f.readlines()]

    def get_class_name(probabilities):
        # Show top categories per image
        top5_prob, top5_catid = torch.topk(probabilities, 5)
        for i in range(top5_prob.size(0)):
            print(categories[top5_catid[i]], top5_prob[i].item())

    # Preprocessing
    def pre_image(image_file):
        input_image = Image.open(image_file)
        preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
        input_tensor = preprocess(input_image)
        inputs = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
        # input_arr = inputs.cpu().detach().numpy()
        return inputs

    # Inference with the PyTorch model
    # Build the model structure first
    resnet50 = torchvision.models.resnet50()
    # Then load the weights
    resnet50.load_state_dict(torch.load(save_dir))
    resnet50.eval()
    input_batch = pre_image('./images/cat_224.jpg')
    # Move the input and model to GPU for speed if available
    print("GPU Availability: ", torch.cuda.is_available())
    if torch.cuda.is_available():
        input_batch = input_batch.to('cuda')
        resnet50.to('cuda')
    with torch.no_grad():
        output = resnet50(input_batch)
    # Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
    # print(output[0])
    # The output has unnormalized scores. To get probabilities, run a softmax on it.
    probabilities = torch.nn.functional.softmax(output[0], dim=0)
    get_class_name(probabilities)

    GPU Availability:  False
    Persian cat 0.6668420433998108
    lynx 0.023987364023923874
    bow tie 0.016234245151281357
    hair slide 0.013150070793926716
    Japanese spaniel 0.012279157526791096

    # Benchmark; each printed value is the running mean latency so far
    latency = []
    for i in range(10):
        with torch.no_grad():
            start = time.time()
            output = resnet50(input_batch)
            probabilities = torch.nn.functional.softmax(output[0], dim=0)
            top5_prob, top5_catid = torch.topk(probabilities, 5)
            # for catid in range(top5_catid.size(0)):
            #     print(categories[catid])
            latency.append(time.time() - start)
            print("{} model inference CPU time:cost {} ms".format(str(i), format(sum(latency) * 1000 / len(latency), '.2f')))

    0 model inference CPU time:cost 149.59 ms
    1 model inference CPU time:cost 130.74 ms
    2 model inference CPU time:cost 133.76 ms
    3 model inference CPU time:cost 130.64 ms
    4 model inference CPU time:cost 131.72 ms
    5 model inference CPU time:cost 130.88 ms
    6 model inference CPU time:cost 136.31 ms
    7 model inference CPU time:cost 139.95 ms
    8 model inference CPU time:cost 141.90 ms
    9 model inference CPU time:cost 140.96 ms

    # Inference with ONNX Runtime
    import onnxruntime
    from onnx import numpy_helper
    import time

    onnx_file = 'resnet50.onnx'
    session_fp32 = onnxruntime.InferenceSession(onnx_file, providers=['CPUExecutionProvider'])
    # session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CUDAExecutionProvider'])
    # session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['OpenVINOExecutionProvider'])

    def softmax(x):
        """Compute softmax values for each set of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum()

    latency = []

    def run_sample(session, categories, inputs):
        start = time.time()
        input_arr = inputs
        ort_outputs = session.run([], {'conv1': input_arr})[0]
        output = ort_outputs.flatten()
        output = softmax(output)  # this is optional
        top5_catid = np.argsort(-output)[:5]
        # for catid in top5_catid:
        #     print(categories[catid])
        latency.append(time.time() - start)
        return ort_outputs

    input_tensor = pre_image('./images/cat_224.jpg')
    input_arr = input_tensor.cpu().detach().numpy()
    for i in range(10):
        ort_output = run_sample(session_fp32, categories, input_arr)
        print("{} ONNX Runtime CPU Inference time = {} ms".format(str(i), format(sum(latency) * 1000 / len(latency), '.2f')))

    0 ONNX Runtime CPU Inference time = 67.66 ms
    1 ONNX Runtime CPU Inference time = 56.30 ms
    2 ONNX Runtime CPU Inference time = 53.90 ms
    3 ONNX Runtime CPU Inference time = 58.18 ms
    4 ONNX Runtime CPU Inference time = 64.53 ms
    5 ONNX Runtime CPU Inference time = 62.79 ms
    6 ONNX Runtime CPU Inference time = 61.75 ms
    7 ONNX Runtime CPU Inference time = 60.51 ms
    8 ONNX Runtime CPU Inference time = 59.35 ms
    9 ONNX Runtime CPU Inference time = 57.57 ms

4. Further topics

- Model quantization
- Model pruning
- Engineering optimization
- Operator optimization

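The post-processing inside run_sample (a softmax followed by picking the five largest scores) can be sketched without any dependencies. This is only an illustration in plain Python: the helper name `top_k` and the six example scores are assumptions made here, not part of the article, which uses NumPy for the same step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=5):
    """Return (index, probability) pairs for the k largest entries (hypothetical helper)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(i, probabilities[i]) for i in ranked[:k]]


# Made-up raw scores for a 6-class model
logits = [1.0, 3.0, 0.5, 2.0, -1.0, 0.0]
probs = softmax(logits)
best = top_k(probs, k=3)
for index, prob in best:
    print(index, round(prob, 4))
```

Subtracting the maximum before exponentiating is the same trick as `np.exp(x - np.max(x))` in the article's softmax; it changes nothing mathematically but avoids overflow for large scores.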
0
0
0
Views: 2045
清晨我上码

PyTorch 10-Day Introduction - Day 07 - Saving and Loading Models

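PyTorch model saving & loading

- Model storage
- Saving on a single GPU & on multiple GPUs
- Loading on a single GPU & on multiple GPUs

1. Model storage

PyTorch models are usually stored with one of three file extensions: pkl, pt and pth. In use there is no difference between them.

A PyTorch model has two parts: the model structure and the weights. The model is a class that inherits from nn.Module; the weights are stored as a dictionary (the key is the layer name, the value is the weight tensor).

Storage therefore comes in two forms: saving the whole model (structure and weights) and saving only the weights (recommended).

    import torch
    from torchvision import models

    model = models.resnet50(pretrained=True)
    save_dir = './resnet50.pth'
    # Save the whole model: structure + weights
    torch.save(model, save_dir)
    # Save only the weights; note that state_dict() is a method call
    torch.save(model.state_dict(), save_dir)
    # pt, pth and pkl all support both forms

2. Saving on a single GPU & on multiple GPUs

PyTorch offers two ways to put a model and data on the GPU: .cuda() and .to(device).

Note: to train on multiple GPUs, wrap the model in torch.nn.DataParallel.

2.1 nn.DataParallel

CLASS torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

- module is the model you defined
- device_ids lists the devices to train on
- output_device is the device that receives the output. It is usually omitted, in which case it defaults to device_ids[0].

Note: this is why the first GPU usually shows higher memory usage.

    import os
    import torch
    from torch import nn
    from torchvision import models

    # Single GPU
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # for multiple GPUs use e.g. '0,1,2'
    model = model.cuda()
    # print(model)

    # Multiple GPUs
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
    model = torch.nn.DataParallel(model).cuda()
    # print(model)

2.3 Save on a single GPU + load on a single GPU

    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # replace with the GPU id you want
    model = models.resnet50(pretrained=True)
    model.cuda()
    save_dir = 'resnet50.pt'  # save path

    # Save + load the whole model
    torch.save(model, save_dir)
    loaded_model = torch.load(save_dir)
    loaded_model.cuda()

    # Save + load the weights
    torch.save(model.state_dict(), save_dir)
    # Build the model structure first
    loaded_model = models.resnet50()
    # Then load the weights
    loaded_model.load_state_dict(torch.load(save_dir))
    loaded_model.cuda()

2.4 Save on a single GPU + load on multiple GPUs

    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # replace with the GPU id you want
    model = models.resnet50(pretrained=True)
    model.cuda()

    # Save + load the whole model
    torch.save(model, save_dir)
    os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'  # replace with the GPU ids you want
    loaded_model = torch.load(save_dir)
    loaded_model = nn.DataParallel(loaded_model).cuda()

    # Save + load the weights
    torch.save(model.state_dict(), save_dir)
    os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'  # replace with the GPU ids you want
    loaded_model = models.resnet50()  # the model structure must be defined here
    loaded_model.load_state_dict(torch.load(save_dir))
    loaded_model = nn.DataParallel(loaded_model).cuda()

2.5 Save on multiple GPUs + load on a single GPU

The core question is how to remove "module" from the key names of the weight dictionary so that the model stays consistent.

- When loading a whole model, take the model's module attribute.
- When loading weights, save the weights of the model's module attribute in the first place.

    os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'  # replace with the GPU ids you want
    model = models.resnet50(pretrained=True)
    model = nn.DataParallel(model).cuda()

    # Save + load the whole model
    torch.save(model, save_dir)
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # replace with the GPU id you want
    loaded_model = torch.load(save_dir).module

    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2'  # replace with the GPU ids you want
    model = models.resnet50(pretrained=True)
    model = nn.DataParallel(model).cuda()

    # Save the weights
    torch.save(model.module.state_dict(), save_dir)
    # Load the weights
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # replace with the GPU id you want
    loaded_model = models.resnet50()  # the model structure must be defined here
    loaded_model.load_state_dict(torch.load(save_dir))
    loaded_model.cuda()

2.6 Save on multiple GPUs + load on multiple GPUs

Saving a whole model also saves information such as the ids of the GPUs in use. If that information does not match the GPUs available at load time, loading may fail or the program may not behave as intended. Two problems can occur:

1. Loading the whole model and then wrapping it in nn.DataParallel for multi-GPU training. The GPU ids stored in the model may well differ from those set in the loading environment, so during training the data and the model end up on different devices and an error is raised.
2. Loading the whole model without wrapping it in nn.DataParallel. The program automatically trains on the first n GPUs of the machine (n being the number of GPUs used by the saved model). If fewer than n GPUs are specified, an error is raised.

Recommended approach: save only the model weights and apply nn.DataParallel afterwards; this avoids both problems. In multi-GPU mode, store and load models through their weights.

    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2'  # replace with the GPU ids you want
    model = models.resnet50(pretrained=True)
    model = nn.DataParallel(model).cuda()

    # Save + load the weights: strongly recommended!
    # Save model.module's weights so the keys carry no "module." prefix
    # and match the plain resnet50 created below.
    torch.save(model.module.state_dict(), save_dir)
    # Load the weights
    loaded_model = models.resnet50()  # the model structure must be defined here
    loaded_model.load_state_dict(torch.load(save_dir))
    loaded_model = nn.DataParallel(loaded_model).cuda()

Recommendations

- Whether saving on one GPU or several, prefer saving the model weights.
- Whether loading on one GPU or several, load the weights first, then choose multi-GPU (nn.DataParallel) or single-GPU (cuda).

    # Usage example (code fragment)
    My_model.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    past_test_loss = float('inf')  # loss of the previous round; inf so the first round is saved
    save_model_step = 10  # save the model every 10 steps
    for iter, (images, labels) in enumerate(test_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = My_model(images)
        loss = criterion(outputs, labels)
        test_total_correct += (outputs.argmax(1) == labels).sum().item()
        test_total_loss += loss.item()
        test_total_num += labels.shape[0]
    test_loss = test_total_loss / test_total_num
    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
        test_total_loss / test_total_num, test_total_correct / test_total_num * 100
    ))
    # Model save
    if test_loss < past_test_loss:
        # Save the weights
        torch.save(My_model.state_dict(), save_dir)
        # Save weights + structure
        # torch.save(My_model, save_dir)
    if iter % save_model_step == 0:
        # Save the weights
        torch.save(My_model.state_dict(), save_dir)
        # Save weights + structure
        # torch.save(My_model, save_dir)
    past_test_loss = test_loss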
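Section 2.5 asks how to remove "module" from the key names of a weight dictionary. When a checkpoint was already saved from an nn.DataParallel model and its keys carry the "module." prefix, one option is to rename the keys after loading. The sketch below shows only the renaming on a plain dictionary; the helper name `strip_module_prefix` and the example keys are assumptions for illustration, and with a real checkpoint the input would be the dictionary returned by torch.load().

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict whose keys no longer start with prefix.

    Hypothetical helper: keys without the prefix are copied unchanged,
    so it is safe to call on a checkpoint saved from a single GPU.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint saved from a DataParallel-wrapped model
saved = OrderedDict([
    ("module.conv1.weight", [0.1, 0.2]),
    ("module.fc.bias", [0.0]),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

The cleaned dictionary could then be passed to load_state_dict() on a plain, unwrapped model.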

0
0
0
Views: 2030
清晨我上码

PyTorch 10-Day Introduction - Day 05 - Visualization

PyTorch visualization

1. Visualizing the model structure
2. Visualizing the training process
3. Visualizing model evaluation

    # Import common packages
    import os
    import numpy as np
    import torch
    from torch import nn
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms
    import torchvision
    import torch.nn.functional as F

    # Custom model
    class DemoModel(nn.Module):
        def __init__(self):
            super(DemoModel, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    model = DemoModel()
    # Method 1: print (model structure)
    print(model)

    DemoModel(
      (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=400, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )

torchinfo

    # pip3 install torchinfo

torchinfo is just as simple to use: call torchinfo.summary(). The required arguments are the model and input_size [batch_size, channel, h, w]. It reports per-module information (layer type, output shape and parameter count), the total parameter count, the model size, and the memory needed for one forward/backward pass.

    from torchinfo import summary

    model = DemoModel()  # instantiate the model
    # Method 2: view the model structure with torchinfo
    summary(model, (1, 3, 32, 32))  # 1: batch_size, 3: image channels, 32: image height and width

    ==========================================================================================
    Layer (type:depth-idx)                   Output Shape              Param #
    ==========================================================================================
    DemoModel                                [1, 10]                   --
    ├─Conv2d: 1-1                            [1, 6, 28, 28]            456
    ├─MaxPool2d: 1-2                         [1, 6, 14, 14]            --
    ├─Conv2d: 1-3                            [1, 16, 10, 10]           2,416
    ├─MaxPool2d: 1-4                         [1, 16, 5, 5]             --
    ├─Linear: 1-5                            [1, 120]                  48,120
    ├─Linear: 1-6                            [1, 84]                   10,164
    ├─Linear: 1-7                            [1, 10]                   850
    ==========================================================================================
    Total params: 62,006
    Trainable params: 62,006
    Non-trainable params: 0
    Total mult-adds (M): 0.66
    ==========================================================================================
    Input size (MB): 0.01
    Forward/backward pass size (MB): 0.05
    Params size (MB): 0.25
    Estimated Total Size (MB): 0.31
    ==========================================================================================

TensorBoard

As a visualization tool, TensorBoard covers input data (especially images), model structure, parameter distributions and debugging. It records whatever data we specify, including each layer's feature maps, the weights, the training loss and so on.

Visualizing the training process with TensorBoard

Install:

    pip3 install tensorboard

Start tensorboard:

    tensorboard --logdir=/path/to/logs/ --port=xxxx

Here "path/to/logs/" is the directory where the TensorBoard records are saved; it corresponds to "./runs" in the code. port is the port on which TensorBoard is reachable from outside; open ip:port to access it.

    # from tensorboard import SummaryWriter
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter('./runs')
    # Method 3: view the model in tensorboard
    writer.add_graph(model, torch.rand(1, 3, 32, 32))
    writer.close()

(TensorBoard graph view)

    # Hyperparameters
    # Batch size
    batch_size = 16  # or 32, 64, 128
    # Learning rate of the optimizer
    lr = 1e-4
    # Number of epochs
    max_epochs = 2

    # Option 1: select GPUs through the environment
    # os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # use GPUs 0 and 1
    # Option 2: use a "device" and call .to(device) on everything that should live on the GPU
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # use GPU 0

    # Data loading
    # Build the Dataset, using CIFAR-10 as the example
    from torchvision import datasets

    # "data_transform" applies transformations to the images, such as
    # flipping, cropping and normalization; define your own as needed
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # With the Dataset built, DataLoader reads the data in batches
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset, batch_size=batch_size, num_workers=4, shuffle=True, drop_last=True)
    test_loader = torch.utils.data.DataLoader(test_cifar_dataset, batch_size=batch_size, num_workers=4, shuffle=False)

    # Training & validation
    writer = SummaryWriter('./runs')
    # Set fixed random number seed
    torch.manual_seed(42)
    # Define the loss function and the optimizer
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    My_model = DemoModel()
    My_model = My_model.to(device)
    # Cross entropy
    criterion = torch.nn.CrossEntropyLoss()
    # Optimizer
    optimizer = torch.optim.Adam(My_model.parameters(), lr=lr)
    epoch = max_epochs
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        My_model.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0
        for iter, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            # Write the network graph at epoch 0, batch 0
            # (i is the current epoch index; epoch holds the total count)
            if i == 0 and iter == 0:
                writer.add_graph(My_model, input_to_model=images, verbose=True)
            # Write an image at every batch 0
            if iter == 0:
                writer.add_image("Example input", images[0], global_step=i)
            outputs = My_model(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            # Print statistics
            writer.add_scalar("Loss/Minibatches", train_total_loss, train_total_num)
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(i+1, epoch, iter+1, total_step, loss.item()/labels.shape[0]))
        # Write loss for epoch
        writer.add_scalar("Loss/Epochs", train_total_loss, i)
        My_model.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        for iter, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = My_model(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))

(TensorBoard screenshots: loss, graph)
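The output shapes and parameter counts in the torchinfo summary can be checked by hand. The sketch below recomputes them for DemoModel with the standard formulas for convolution output size and for weights plus biases; the function names are chosen here for illustration and are not part of torchinfo.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_params(in_channels, out_channels, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features


# Spatial size through DemoModel for a 32x32 input
size = 32
sizes = []
for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:  # conv1, pool, conv2, pool
    size = conv_out(size, kernel, stride)
    sizes.append(size)

counts = [
    conv2d_params(3, 6, 5),          # conv1
    conv2d_params(6, 16, 5),         # conv2
    linear_params(16 * 5 * 5, 120),  # fc1
    linear_params(120, 84),          # fc2
    linear_params(84, 10),           # fc3
]
total = sum(counts)
print(sizes, counts, total)
```

The final spatial size of 5 is where the `16 * 5 * 5` input width of fc1 comes from.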

0
0
0
Views: 2038
清晨我上码

PyTorch 10-Day Introduction - Day 04 - Building Models

PyTorch model building

1. GPU configuration
2. Data preprocessing
3. Splitting into training, validation and test sets
4. Choosing a model
5. Setting the loss function & optimization method
6. Evaluating the model

This section mainly covers parts 4 and 5.

    # Import common packages
    import os
    import numpy as np
    import torch
    from torch import nn
    import torch.nn.functional as F
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms

    # Hyperparameters
    # Batch size
    batch_size = 16  # or 32, 64, 128
    # Learning rate of the optimizer
    lr = 1e-4
    # Number of epochs
    max_epochs = 10

    # Option 1: select GPUs through the environment
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # use GPUs 0 and 1
    # Option 2: use a "device" and call .to(device) on everything that should live on the GPU
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  # use GPU 1

    # Data loading
    # Build the Dataset, using CIFAR-10 as the example
    from torchvision import datasets

    # "data_transform" applies transformations to the images, such as
    # flipping, cropping and normalization; define your own as needed
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # With the Dataset built, DataLoader reads the data in batches
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset, batch_size=batch_size, num_workers=4, shuffle=True, drop_last=True)
    test_loader = torch.utils.data.DataLoader(test_cifar_dataset, batch_size=batch_size, num_workers=4, shuffle=False)

    # Define the model
    # Method 1: pretrained model
    import torchvision

    Resnet50 = torchvision.models.resnet50(pretrained=True)
    # Replace the final layer for 10 classes. Assigning fc.out_features alone
    # would only change the attribute, not the size of the weight matrix.
    Resnet50.fc = torch.nn.Linear(Resnet50.fc.in_features, 10)
    print(Resnet50)

    # Training & validation
    # Define the loss function and the optimizer
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Loss function: cross entropy
    criterion = torch.nn.CrossEntropyLoss()
    # Optimizer
    optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)
    epoch = max_epochs
    Resnet50 = Resnet50.to(device)
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        Resnet50.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0
        for iter, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(i+1, epoch, iter+1, total_step, loss.item()/labels.shape[0]))
        Resnet50.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        for iter, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))

    # Method 2: custom model
    class DemoModel(nn.Module):
        def __init__(self):
            super(DemoModel, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    # Training & validation
    # Define the loss function and the optimizer
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Cross entropy
    criterion = torch.nn.CrossEntropyLoss()
    My_model = DemoModel()
    My_model = My_model.to(device)
    # Optimizer; it must receive the parameters of the model being trained
    optimizer = torch.optim.Adam(My_model.parameters(), lr=lr)
    epoch = max_epochs
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        My_model.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0
        for iter, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = My_model(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(i+1, epoch, iter+1, total_step, loss.item()/labels.shape[0]))
        My_model.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        for iter, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = My_model(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
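Both training loops rely on two quantities per batch: the cross-entropy loss and the number of correct argmax predictions. The sketch below computes them in plain Python for made-up logits, to show what those lines measure. It is an illustration only; the function names are assumptions, and the batch averaging mirrors the default `reduction='mean'` of torch.nn.CrossEntropyLoss.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: log-sum-exp of the logits minus the true-class logit."""
    shift = max(logits)  # stabilizes the exponentials
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[label]


def batch_loss_and_accuracy(batch_logits, labels):
    """Mean loss over the batch and the fraction of correct argmax predictions."""
    losses = [cross_entropy(row, y) for row, y in zip(batch_logits, labels)]
    predictions = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return sum(losses) / len(losses), correct / len(labels)


# Made-up batch of two samples with three classes
batch_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
labels = [0, 1]
mean_loss, accuracy = batch_loss_and_accuracy(batch_logits, labels)
print(round(mean_loss, 4), accuracy)
```

Because the loss returned per batch is already a mean over its samples, summing `loss.item()` across batches and dividing by the number of samples, as the loops do, gives a value scaled down by roughly the batch size rather than the mean per-sample loss.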

0
0
0
Views: 2039
清晨我上码

PyTorch 10-Day Introduction - Day 01 - Getting Started

Introduction to PyTorch

- Developed by the Facebook team
- Download and install: https://pytorch.org/get-started/locally/
- Install PyTorch 2.0

Key points of this section:

- What a tensor is
- Tensor arithmetic
- Tensor broadcasting

    import torch
    import numpy as np

    # Random initialization of a tensor
    x = torch.rand(4, 3)
    print(x)
    y = torch.randn(4, 3)
    print(y)

    tensor([[0.9480, 0.9501, 0.2717],
            [0.8003, 0.0821, 0.6529],
            [0.3265, 0.4726, 0.6464],
            [0.9685, 0.5453, 0.2186]])
    tensor([[-0.5172, -0.1762, -1.0094],
            [ 0.1688, -1.6217, -0.8422],
            [-0.4597, -0.5814, -1.3831],
            [ 0.1718,  0.2061,  1.0907]])

    # All-zeros tensor
    a = torch.zeros((4, 4), dtype=torch.long)
    print(a)
    # All-ones tensor
    b = torch.ones(4, 4)
    print(b)
    c = torch.tensor(np.ones((2, 3), dtype='int32'))
    print(c)

    tensor([[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]])
    tensor([[1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.]])
    tensor([[1, 1, 1],
            [1, 1, 1]], dtype=torch.int32)

Common ways to construct a Tensor:

    # Basic tensor operations
    # Addition
    print(a + b)
    # add_ is the in-place version: it modifies a itself
    y = a.add_(3)
    print(y)

    tensor([[1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.]])
    tensor([[3, 3, 3, 3],
            [3, 3, 3, 3],
            [3, 3, 3, 3],
            [3, 3, 3, 3]])

    # Indexing
    x = torch.rand(3, 4)
    print(x)
    # Second column
    print(x[:, 1])
    # Second row
    print(x[1, :])

    tensor([[-0.0617,  2.3109,  0.0030,  0.6941],
            [ 0.4677, -1.9160,  0.6614, -1.7743],
            [-0.3349,  0.2371,  2.1070, -1.0076],
            [ 0.3823, -1.2401, -0.3766, -1.0454]])
    tensor([[-0.0617,  2.3109,  0.0030,  0.6941,  0.4677, -1.9160,  0.6614, -1.7743],
            [-0.3349,  0.2371,  2.1070, -1.0076,  0.3823, -1.2401, -0.3766, -1.0454]])

    # Broadcasting
    # When two Tensors of different shapes are combined element-wise, the
    # broadcasting mechanism may kick in: elements are replicated as needed
    # so that both Tensors have the same shape, then the operation is applied.
    x = torch.arange(1, 4).view(1, 3)
    print(x)
    y = torch.arange(1, 5).view(4, 1)
    print(y)
    print(x + y)

    tensor([[1, 2, 3]])
    tensor([[1],
            [2],
            [3],
            [4]])
    tensor([[2, 3, 4],
            [3, 4, 5],
            [4, 5, 6],
            [5, 6, 7]])
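The shape that results from broadcasting follows a simple rule: align the two shapes from the right, and for each pair of dimensions require that they are equal or that one of them is 1. The sketch below applies that rule to plain tuples. The function name `broadcast_shape` is an assumption for illustration; it is written here in plain Python rather than taken from PyTorch.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Shape produced by broadcasting two shapes, or ValueError if they are incompatible."""
    result = []
    # Walk both shapes from the last dimension; missing leading dimensions count as 1
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))


# The example from the text: a (1, 3) tensor plus a (4, 1) tensor
print(broadcast_shape((1, 3), (4, 1)))
```

For the (1, 3) and (4, 1) tensors in the example this gives (4, 3), which matches the 4x3 result printed for x + y.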

0
0
0
Views: 2046
清晨我上码

PyTorch 10-Day Introduction - Day 06 - Building Complex Models

1. Building complex models in PyTorch

1. Study the model diagram
2. Implement the model's components
3. Assemble the model

2. Defining a model

2.1 Sequential

1. When the forward computation simply chains layers one after another, the Sequential class offers a simpler way to define the model.
2. It accepts either an ordered dictionary (OrderedDict) of submodules or a series of submodules as arguments and adds the Module instances one by one. The forward computation runs these instances in the order they were added.
3. The benefit of Sequential is that the definition is simple and readable, and a model defined this way does not need its own forward method.

    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )
    print(net)

Output:

    Sequential(
      (0): Linear(in_features=784, out_features=256, bias=True)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=10, bias=True)
    )

    import collections
    import torch.nn as nn

    net2 = nn.Sequential(collections.OrderedDict([
        ('fc1', nn.Linear(784, 256)),
        ('relu1', nn.ReLU()),
        ('fc2', nn.Linear(256, 10))
    ]))
    print(net2)

Output:

    Sequential(
      (fc1): Linear(in_features=784, out_features=256, bias=True)
      (relu1): ReLU()
      (fc2): Linear(in_features=256, out_features=10, bias=True)
    )

2.2 ModuleList

ModuleList takes a list of submodules (or layers, which must be instances of nn.Module) as input and supports append and extend just like a Python list. nn.ModuleList does not define a network; it only stores modules together. The order of its elements does not determine the order in which they are applied, so the forward pass still has to be written explicitly.

    net = nn.ModuleList([nn.Linear(784, 256), nn.ReLU()])
    net.append(nn.Linear(256, 10))   # append, as with a list
    print(net[-1])                   # index access, as with a list
    print(net)

Output:

    Linear(in_features=256, out_features=10, bias=True)
    ModuleList(
      (0): Linear(in_features=784, out_features=256, bias=True)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=10, bias=True)
    )

2.3 ModuleDict

ModuleDict plays the same role as ModuleList, except that each submodule (or layer, which must be an instance of nn.Module) is stored under a name, as in a Python dict. When a submodule or layer is added, its weights are registered with the network automatically.

    net = nn.ModuleDict({
        'linear': nn.Linear(784, 256),
        'act': nn.ReLU(),
    })
    net['output'] = nn.Linear(256, 10)   # add
    print(net['linear'])                 # access by key
    print(net.output)                    # access by attribute
    print(net)

3. Writing ResNet50 by hand

3.1 ResNet50

ResNet took first place in the ImageNet competition in classification and object detection, and first place on the COCO dataset in object detection and image segmentation.

3.2 Approach

The network takes 224x224 images as input and passes them through conv1, conv2, conv3, conv4 and conv5, followed by average pooling and a fully connected layer. Because the middle of the network reuses the same kind of block many times, that block is written once as a class and instantiated repeatedly.

3.3 Key ideas of ResNet

1. It introduces the residual module.
2. It uses Batch Normalization to speed up training (activations are normalized to mean 0 and variance 1).

3.4 Model structure (ResNet50)

In the code below, stride=2 in conv3_x, conv4_x and conv5_x applies only to the first block of each stage; the remaining blocks use stride=1. The conv2_x stage uses stride=1 throughout, because the max-pooling layer has already halved the resolution.

1. conv1: stride=2, kernel_size=7, out_channels=64
2. conv2_x
   2.1 max_pool: kernel_size=3, stride=2
   2.2 conv_01: stride=1, kernel_size=1, out_channels=64
   2.3 conv_02: stride=1, kernel_size=3, out_channels=64
   2.4 conv_03: stride=1, kernel_size=1, out_channels=256
   2.5 layers: (conv_01 + conv_02 + conv_03) * 3
3. conv3_x
   3.1 conv_01: stride=1, kernel_size=1, out_channels=128
   3.2 conv_02: stride=2, kernel_size=3, out_channels=128
   3.3 conv_03: stride=1, kernel_size=1, out_channels=512
   3.4 residual: stride=2, kernel_size=1, out_channels=512
   3.5 layers: (conv_01 + conv_02 + conv_03) * 4
4. conv4_x
   4.1 conv_01: stride=1, kernel_size=1, out_channels=256
   4.2 conv_02: stride=2, kernel_size=3, out_channels=256
   4.3 conv_03: stride=1, kernel_size=1, out_channels=1024
   4.4 residual: stride=2, kernel_size=1, out_channels=1024
   4.5 layers: (conv_01 + conv_02 + conv_03) * 6
5. conv5_x
   5.1 conv_01: stride=1, kernel_size=1, out_channels=512
   5.2 conv_02: stride=2, kernel_size=3, out_channels=512
   5.3 conv_03: stride=1, kernel_size=1, out_channels=2048
   5.4 residual: stride=2, kernel_size=1, out_channels=2048
   5.5 layers: (conv_01 + conv_02 + conv_03) * 3
6. fc
   6.1 AdaptiveAvgPool2d: output=(1, 1)
   6.2 flatten: (x, 1)
   6.3 fc: linear(512 * 4, num_class)

    import torch.nn as nn
    import torch

    class Block(nn.Module):
        """Bottleneck block: 1x1 conv, 3x3 conv, 1x1 conv, plus a shortcut."""

        def __init__(self, in_channels, out_channels, stride=1, downsample=False):
            super(Block, self).__init__()
            out_channel_01, out_channel_02, out_channel_03 = out_channels
            self.downsample = downsample

            self.relu = nn.ReLU(inplace=True)
            self.conv1 = nn.Sequential(
                nn.Conv2d(in_channels, out_channel_01, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(out_channel_01),
                nn.ReLU(inplace=True)
            )
            self.conv2 = nn.Sequential(
                nn.Conv2d(out_channel_01, out_channel_02, kernel_size=3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(out_channel_02),
                nn.ReLU(inplace=True)
            )
            self.conv3 = nn.Sequential(
                nn.Conv2d(out_channel_02, out_channel_03, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(out_channel_03),
            )
            # 1x1 projection so the shortcut matches the main path's shape
            if downsample:
                self.shortcut = nn.Sequential(
                    nn.Conv2d(in_channels, out_channel_03, kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(out_channel_03)
                )

        def forward(self, x):
            x_shortcut = x
            x = self.conv1(x)
            x = self.conv2(x)
            x = self.conv3(x)
            if self.downsample:
                x_shortcut = self.shortcut(x_shortcut)
            x = x + x_shortcut      # residual connection
            x = self.relu(x)
            return x

    class Resnet50(nn.Module):
        def __init__(self):
            super(Resnet50, self).__init__()
            self.conv1 = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                nn.BatchNorm2d(64),
                nn.ReLU(),
            )
            Layers = [3, 4, 6, 3]   # number of blocks per stage
            self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            self.conv2 = self._make_layer(64, (64, 64, 256), Layers[0], 1)
            self.conv3 = self._make_layer(256, (128, 128, 512), Layers[1], 2)
            self.conv4 = self._make_layer(512, (256, 256, 1024), Layers[2], 2)
            self.conv5 = self._make_layer(1024, (512, 512, 2048), Layers[3], 2)
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Sequential(
                nn.Linear(2048, 1000)
            )

        def forward(self, input):
            x = self.conv1(input)
            x = self.maxpool(x)
            x = self.conv2(x)
            x = self.conv3(x)
            x = self.conv4(x)
            x = self.conv5(x)
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
            return x

        def _make_layer(self, in_channels, out_channels, blocks, stride=1):
            layers = []
            # The first block changes the channel count (and possibly the resolution)
            block_1 = Block(in_channels, out_channels, stride=stride, downsample=True)
            layers.append(block_1)
            for i in range(1, blocks):
                layers.append(Block(out_channels[2], out_channels, stride=1, downsample=False))
            return nn.Sequential(*layers)

Method 1: print the output shape of each top-level child.

    net = Resnet50()
    x = torch.rand((10, 3, 224, 224))
    for name, layer in net.named_children():
        if name == "fc":
            x = x.view(x.size(0), -1)   # flatten before the fully connected layer
        x = layer(x)
        print(name, 'output shape:', x.shape)

Output:

    conv1 output shape: torch.Size([10, 64, 112, 112])
    maxpool output shape: torch.Size([10, 64, 56, 56])
    conv2 output shape: torch.Size([10, 256, 56, 56])
    conv3 output shape: torch.Size([10, 512, 28, 28])
    conv4 output shape: torch.Size([10, 1024, 14, 14])
    conv5 output shape: torch.Size([10, 2048, 7, 7])
    avgpool output shape: torch.Size([10, 2048, 1, 1])
    fc output shape: torch.Size([10, 1000])

Method 2: visualize the network structure with torchinfo.

    from torchinfo import summary

    net = Resnet50()
    summary(net, (10, 3, 224, 224))

Output:

    ==========================================================================
    Layer (type:depth-idx)          Output Shape          Param #
    ==========================================================================
    Resnet50                        [10, 1000]            --
    ├─Sequential: 1-1               [10, 64, 112, 112]    --
    │  └─Conv2d: 2-1                [10, 64, 112, 112]    9,472
    │  └─BatchNorm2d: 2-2           [10, 64, 112, 112]    128
    │  └─ReLU: 2-3                  [10, 64, 112, 112]    --
    ├─MaxPool2d: 1-2                [10, 64, 56, 56]      --
    ├─Sequential: 1-3               [10, 256, 56, 56]     --
    │  └─Block: 2-4                 [10, 256, 56, 56]     --
    │  │  └─Sequential: 3-1         [10, 64, 56, 56]      4,224
    │  │  └─Sequential: 3-2         [10, 64, 56, 56]      36,992
    │  │  └─Sequential: 3-3         [10, 256, 56, 56]     16,896
    │  │  └─Sequential: 3-4         [10, 256, 56, 56]     16,896
    │  │  └─ReLU: 3-5               [10, 256, 56, 56]     --
    │  └─Block: 2-5                 [10, 256, 56, 56]     --
    │  │  └─Sequential: 3-6         [10, 64, 56, 56]      16,512
    │  │  └─Sequential: 3-7         [10, 64, 56, 56]      36,992
    │  │  └─Sequential: 3-8         [10, 256, 56, 56]     16,896
    │  │  └─ReLU: 3-9               [10, 256, 56, 56]     --
    │  └─Block: 2-6                 [10, 256, 56, 56]     --
    │  │  └─Sequential: 3-10        [10, 64, 56, 56]      16,512
    │  │  └─Sequential: 3-11        [10, 64, 56, 56]      36,992
    │  │  └─Sequential: 3-12        [10, 256, 56, 56]     16,896
    │  │  └─ReLU: 3-13              [10, 256, 56, 56]     --
    ├─Sequential: 1-4               [10, 512, 28, 28]     --
    │  └─Block: 2-7                 [10, 512, 28, 28]     --
    │  │  └─Sequential: 3-14        [10, 128, 56, 56]     33,024
    │  │  └─Sequential: 3-15        [10, 128, 28, 28]     147,712
    │  │  └─Sequential: 3-16        [10, 512, 28, 28]     66,560
    │  │  └─Sequential: 3-17        [10, 512, 28, 28]     132,096
    │  │  └─ReLU: 3-18              [10, 512, 28, 28]     --
    │  └─Block: 2-8                 [10, 512, 28, 28]     --
    │  │  └─Sequential: 3-19        [10, 128, 28, 28]     65,792
    │  │  └─Sequential: 3-20        [10, 128, 28, 28]     147,712
    │  │  └─Sequential: 3-21        [10, 512, 28, 28]     66,560
    │  │  └─ReLU: 3-22              [10, 512, 28, 28]     --
    │  └─Block: 2-9                 [10, 512, 28, 28]     --
    │  │  └─Sequential: 3-23        [10, 128, 28, 28]     65,792
    │  │  └─Sequential: 3-24        [10, 128, 28, 28]     147,712
    │  │  └─Sequential: 3-25        [10, 512, 28, 28]     66,560
    │  │  └─ReLU: 3-26              [10, 512, 28, 28]     --
    │  └─Block: 2-10                [10, 512, 28, 28]     --
    │  │  └─Sequential: 3-27        [10, 128, 28, 28]     65,792
    │  │  └─Sequential: 3-28        [10, 128, 28, 28]     147,712
    │  │  └─Sequential: 3-29        [10, 512, 28, 28]     66,560
    │  │  └─ReLU: 3-30              [10, 512, 28, 28]     --
    ├─Sequential: 1-5               [10, 1024, 14, 14]    --
    │  └─Block: 2-11                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-31        [10, 256, 28, 28]     131,584
    │  │  └─Sequential: 3-32        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-33        [10, 1024, 14, 14]    264,192
    │  │  └─Sequential: 3-34        [10, 1024, 14, 14]    526,336
    │  │  └─ReLU: 3-35              [10, 1024, 14, 14]    --
    │  └─Block: 2-12                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-36        [10, 256, 14, 14]     262,656
    │  │  └─Sequential: 3-37        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-38        [10, 1024, 14, 14]    264,192
    │  │  └─ReLU: 3-39              [10, 1024, 14, 14]    --
    │  └─Block: 2-13                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-40        [10, 256, 14, 14]     262,656
    │  │  └─Sequential: 3-41        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-42        [10, 1024, 14, 14]    264,192
    │  │  └─ReLU: 3-43              [10, 1024, 14, 14]    --
    │  └─Block: 2-14                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-44        [10, 256, 14, 14]     262,656
    │  │  └─Sequential: 3-45        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-46        [10, 1024, 14, 14]    264,192
    │  │  └─ReLU: 3-47              [10, 1024, 14, 14]    --
    │  └─Block: 2-15                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-48        [10, 256, 14, 14]     262,656
    │  │  └─Sequential: 3-49        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-50        [10, 1024, 14, 14]    264,192
    │  │  └─ReLU: 3-51              [10, 1024, 14, 14]    --
    │  └─Block: 2-16                [10, 1024, 14, 14]    --
    │  │  └─Sequential: 3-52        [10, 256, 14, 14]     262,656
    │  │  └─Sequential: 3-53        [10, 256, 14, 14]     590,336
    │  │  └─Sequential: 3-54        [10, 1024, 14, 14]    264,192
    │  │  └─ReLU: 3-55              [10, 1024, 14, 14]    --
    ├─Sequential: 1-6               [10, 2048, 7, 7]      --
    │  └─Block: 2-17                [10, 2048, 7, 7]      --
    │  │  └─Sequential: 3-56        [10, 512, 14, 14]     525,312
    │  │  └─Sequential: 3-57        [10, 512, 7, 7]       2,360,320
    │  │  └─Sequential: 3-58        [10, 2048, 7, 7]      1,052,672
    │  │  └─Sequential: 3-59        [10, 2048, 7, 7]      2,101,248
    │  │  └─ReLU: 3-60              [10, 2048, 7, 7]      --
    │  └─Block: 2-18                [10, 2048, 7, 7]      --
    │  │  └─Sequential: 3-61        [10, 512, 7, 7]       1,049,600
    │  │  └─Sequential: 3-62        [10, 512, 7, 7]       2,360,320
    │  │  └─Sequential: 3-63        [10, 2048, 7, 7]      1,052,672
    │  │  └─ReLU: 3-64              [10, 2048, 7, 7]      --
    │  └─Block: 2-19                [10, 2048, 7, 7]      --
    │  │  └─Sequential: 3-65        [10, 512, 7, 7]       1,049,600
    │  │  └─Sequential: 3-66        [10, 512, 7, 7]       2,360,320
    │  │  └─Sequential: 3-67        [10, 2048, 7, 7]      1,052,672
    │  │  └─ReLU: 3-68              [10, 2048, 7, 7]      --
    ├─AdaptiveAvgPool2d: 1-7        [10, 2048, 1, 1]      --
    ├─Sequential: 1-8               [10, 1000]            --
    │  └─Linear: 2-20               [10, 1000]            2,049,000
    ==========================================================================
    Total params: 25,557,096
    Trainable params: 25,557,096
    Non-trainable params: 0
    Total mult-adds (G): 40.90
    ==========================================================================
    Input size (MB): 6.02
    Forward/backward pass size (MB): 1778.32
    Params size (MB): 102.23
    Estimated Total Size (MB): 1886.57
    ==========================================================================

Training the model on CIFAR-10:

    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms
    import torchvision
    import os
    import numpy as np
    import torch

    # Hyperparameters
    batch_size = 16    # batch size; 32, 64 or 128 are also common
    lr = 1e-4          # learning rate of the optimizer
    max_epochs = 2     # number of epochs

    # Option 1: select GPUs through the environment
    # os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'   # use GPUs 0 and 1
    # Option 2: use a device object and call .to(device) on whatever should live on the GPU
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")   # use GPU 0

    # Data loading, using CIFAR-10 as the example dataset
    from torchvision import datasets

    # data_transform can apply transformations such as flipping, cropping or normalization
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # download=False assumes the data already exists under ./cifar10
    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # Once the Dataset exists, DataLoader reads it batch by batch
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset,
                                               batch_size=batch_size, num_workers=4,
                                               shuffle=True, drop_last=True)
    test_loader = torch.utils.data.DataLoader(test_cifar_dataset,
                                              batch_size=batch_size, num_workers=4,
                                              shuffle=False)

    from torch.utils.tensorboard import SummaryWriter

    # Training and validation
    writer = SummaryWriter('./runs')
    torch.manual_seed(42)    # fixed random seed

    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    My_model = Resnet50()
    My_model = My_model.to(device)
    criterion = torch.nn.CrossEntropyLoss()                        # cross-entropy loss
    optimizer = torch.optim.Adam(My_model.parameters(), lr=lr)    # optimizer

    epoch = max_epochs
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        My_model.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0

        for step, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            # Write the network graph once, at epoch 0, batch 0
            if i == 0 and step == 0:
                writer.add_graph(My_model, input_to_model=images, verbose=True)
            # Write one example image at batch 0 of every epoch
            if step == 0:
                writer.add_image("Example input", images[0], global_step=i)

            outputs = My_model(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()

            # backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            train_total_num += labels.shape[0]
            train_total_loss += loss.item()

            # Log and print statistics
            writer.add_scalar("Loss/Minibatches", train_total_loss, train_total_num)
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(
                i + 1, epoch, step + 1, total_step, loss.item() / labels.shape[0]))

        # Log the loss for the epoch
        writer.add_scalar("Loss/Epochs", train_total_loss, i)

        My_model.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        with torch.no_grad():    # no gradients are needed for evaluation
            for step, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                outputs = My_model(images)
                loss = criterion(outputs, labels)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
                test_total_loss += loss.item()
                test_total_num += labels.shape[0]

        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i + 1, epoch,
            train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))

Method 3: inspect the network in TensorBoard.

    writer.add_graph(net, torch.rand(10, 3, 224, 224))
    writer.close()

4. Supplementary notes

input: shape (10, 3, 224, 224), where 10 is the batch_size, 3 is the number of RGB channels, 224 is the width and 224 is the height
conv: (in_depth=3, out_depth=64, kernel_size=7, stride=2, padding=3)
output:
width = floor((224 - 7 + 2*3) / 2) + 1 = 112
height = floor((224 - 7 + 2*3) / 2) + 1 = 112
out_depth: 64
batch_size: 10
shape: (10, 64, 112, 112)
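The output-size arithmetic in the supplementary notes, and a few of the parameter counts in the torchinfo table, can be checked with a short plain-Python sketch. The helper names (`conv_out_size`, `conv2d_params`, `batchnorm_params`, `linear_params`) are hypothetical and exist only for this illustration; the formulas are the standard ones for a convolution without dilation, a BatchNorm layer with learnable scale and shift, and a linear layer with bias.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv2d_params(in_ch, out_ch, kernel_size, bias=True):
    """Learnable parameters of a square-kernel Conv2d layer."""
    return in_ch * out_ch * kernel_size * kernel_size + (out_ch if bias else 0)


def batchnorm_params(channels):
    """Learnable parameters of BatchNorm2d: one scale and one shift per channel."""
    return 2 * channels


def linear_params(in_features, out_features):
    """Learnable parameters of a Linear layer with bias."""
    return in_features * out_features + out_features


# Follow a 224x224 input through the network defined in the post
size = conv_out_size(224, kernel_size=7, stride=2, padding=3)    # conv1
sizes = [size]
size = conv_out_size(size, kernel_size=3, stride=2, padding=1)   # maxpool
sizes.append(size)
for stride in (1, 2, 2, 2):                                      # 3x3 conv of conv2..conv5
    size = conv_out_size(size, kernel_size=3, stride=stride, padding=1)
    sizes.append(size)
print(sizes)

# Parameter counts that can be compared against the torchinfo table
print(conv2d_params(3, 64, 7, bias=True))                           # conv1 (Conv2d: 2-1)
print(conv2d_params(64, 64, 1, bias=False) + batchnorm_params(64))  # Sequential: 3-1
print(linear_params(2048, 1000))                                    # Linear: 2-20
```

The integer division in `conv_out_size` is what turns (224 - 7 + 6) / 2 = 111.5 into 111 before 1 is added.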

清晨我上码

PyTorch in 10 Days - Day 03 - Data Loading

PyTorch data loading

A typical workflow consists of:
- GPU configuration
- data preprocessing
- splitting the data into training, validation and test sets
- choosing a model
- setting the loss function and the optimizer
- evaluating the model

This section covers the first three parts.

    # Commonly used packages
    import os
    import numpy as np
    import pandas as pd          # used by the custom Dataset below
    import torch
    from PIL import Image        # used by the custom Dataset below
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms

Hyperparameters can be set in one place. The values to initialize are:
- batch size
- initial learning rate
- number of training epochs (max_epochs)
- GPU configuration

    # Hyperparameters
    batch_size = 16     # batch size; 32, 64 or 128 are also common
    lr = 1e-4           # learning rate of the optimizer
    max_epochs = 10     # number of epochs

    # Option 1: select GPUs through the environment
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'   # use GPUs 0 and 1
    # Option 2: use a device object and call .to(device) on whatever should live on the GPU
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")   # use GPU 1

A Dataset class mainly consists of three functions:
- __init__: receives external arguments and defines the sample set
- __getitem__: reads the samples one at a time, may apply transformations, and returns the data needed for training or validation
- __len__: returns the number of samples in the dataset

    # Data loading, using CIFAR-10 as the example dataset
    from torchvision import datasets

    # data_transform can apply transformations such as flipping, cropping or normalization
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # download=False assumes the data already exists under ./cifar10
    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # Inspect the dataset
    # Note: this prints the bound method (whose repr includes the dataset summary);
    # len(test_cifar_dataset) returns the sample count itself.
    print(test_cifar_dataset.__len__)
    image_demo = test_cifar_dataset.__getitem__(1)[0]
    print(image_demo)
    print(image_demo.size())

Output (truncated in the original):

    <bound method CIFAR10.__len__ of Dataset CIFAR10
        Number of datapoints: 10000
        Root location: cifar10
        Split: Test
        StandardTransform
    Transform: Compose(
                   ToTensor()
                   Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
               )>
    tensor([[[ 0.8431,  0.8118,  0.8196,  ...,  0.8275,  0.8275,  0.8196],
             [ 0.8667,  0.8431,  0.8431,  ...,  0.8510,  0.8510,  0.8431],
             [ 0.8588,  0.8353,  0.8353,  ...,  0.8431,  0.8431,  0.8353],

    # Look at a few samples
    import matplotlib.pyplot as plt

    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    dataiter = iter(test_cifar_dataset)
    for i in range(10):
        images, labels = next(dataiter)
        print(images.size())
        print(str(classes[labels]))
        # images = images.numpy().transpose(1, 2, 0)   # move the channel dimension to the end
        # plt.title(str(classes[labels]))
        # plt.imshow(images)
        # plt.show()

Output (as recorded in the original):

    torch.Size([3, 32, 32])
    cat
    torch.Size([3, 32, 32])
    ship
    torch.Size([3, 32, 32])
    ship
    torch.Size([3, 32, 32])
    plane
    torch.Size([3, 32, 32])
    frog
    torch.Size([3, 32, 32])
    frog
    torch.Size([3, 32, 32])
    car
    torch.Size([3, 32, 32])
    frog

    # Once the Dataset exists, DataLoader reads it batch by batch
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset,
                                               batch_size=batch_size, num_workers=4,
                                               shuffle=True, drop_last=True)
    val_loader = torch.utils.data.DataLoader(test_cifar_dataset,
                                             batch_size=batch_size, num_workers=4,
                                             shuffle=False)

Parameters:
- batch_size: samples are read in batches; batch_size is the number of samples per batch
- num_workers: the number of worker processes used for reading data. On Windows set it to 0; on Linux 4 or 8 are common. Choose it according to your machine.
- shuffle: whether to shuffle the data; usually True for the training set and False for the validation set
- drop_last: if the last part of the data does not fill a complete batch, drop it so that it does not take part in training

    # A custom Dataset class
    class MyDataset(Dataset):
        def __init__(self, data_dir, info_csv, image_list, transform=None):
            """
            Args:
                data_dir: path to image directory.
                info_csv: path to the csv file containing image indexes
                    with corresponding labels.
                image_list: path to the txt file contains image names to training/validation set
                transform: optional transform to be applied on a sample.
            """
            label_info = pd.read_csv(info_csv)
            with open(image_list) as f:      # close the file once the names are read
                image_file = f.readlines()
            self.data_dir = data_dir
            self.image_file = image_file
            self.label_info = label_info
            self.transform = transform

        def __getitem__(self, index):
            """
            Args:
                index: the index of item
            Returns:
                image and its labels
            """
            image_name = self.image_file[index].strip('\n')
            raw_label = self.label_info.loc[self.label_info['Image_index'] == image_name]
            label = raw_label.iloc[:, 0]
            image_name = os.path.join(self.data_dir, image_name)
            image = Image.open(image_name).convert('RGB')
            if self.transform is not None:
                image = self.transform(image)
            return image, label

        def __len__(self):
            return len(self.image_file)

    # Custom dataset demo: fill in real paths before running
    data_dir = ''
    info_csv = ''
    image_list = ''
    my_dataset = MyDataset(data_dir, info_csv, image_list)
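To make the roles of `__getitem__`, `__len__`, `batch_size` and `drop_last` concrete without needing any image files, here is a minimal plain-Python sketch. It does not use PyTorch: `SquaresDataset`, `iter_batches` and `num_batches` are hypothetical names, and the batching loop is only an approximation of what a DataLoader does for a map-style dataset when `shuffle=False` and `num_workers=0`.

```python
import math


class SquaresDataset:
    """Toy map-style dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index, index * index

    def __len__(self):
        return self.n


def iter_batches(dataset, batch_size, drop_last=False):
    """Yield lists of samples, batch_size at a time, in index order."""
    batch = []
    for index in range(len(dataset)):
        batch.append(dataset[index])
        if len(batch) == batch_size:
            yield batch
            batch = []
    # The final, incomplete batch is kept only when drop_last is False
    if batch and not drop_last:
        yield batch


def num_batches(num_samples, batch_size, drop_last):
    """Number of batches per epoch for the given settings."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


dataset = SquaresDataset(10)
kept = list(iter_batches(dataset, batch_size=4, drop_last=False))
dropped = list(iter_batches(dataset, batch_size=4, drop_last=True))
print(len(kept), len(dropped))
print(num_batches(50000, 16, drop_last=True), num_batches(10000, 16, drop_last=False))
```

With the CIFAR-10 sizes used in this section (50,000 training and 10,000 test images) and batch_size = 16, both sets divide evenly, so drop_last only matters for batch sizes that do not divide the dataset size.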

清晨我上码

PyTorch in 10 Days - Day 09 - Model Fine-Tuning

Model fine-tuning and transfer learning

Contents:
- fine-tuning with torchvision
- fine-tuning with timm
- half-precision training

Background:
1. As deep learning has developed, models have grown larger and larger, and many open-source models are trained on large datasets such as ImageNet-1k or ImageNet-11k.
2. If a dataset has only a few thousand images, training a model with tens of millions of parameters on it will inevitably overfit.
3. Training a large model from scratch requires collecting more data. Collecting and labeling data costs a great deal of time and money, which is often unaffordable.

Solution: apply transfer learning, which transfers the knowledge learned on a source dataset to a target dataset.

For example, most ImageNet images have nothing to do with chairs, but a model trained on that dataset extracts fairly general image features that help recognize edges, textures, shapes and object composition.

Fine-tuning means taking a model of the same kind that someone else has already trained, feeding it your own data, and adjusting its parameters through further training.

Fine-tuning under different data conditions:
- Dataset 1: little data, very similar to the source data. Here it is enough to modify the output classes of the last few layers or of the final softmax layer.
- Dataset 2: little data, low similarity. Here we can freeze the initial layers of the pretrained model (say k layers) and retrain the remaining (n-k) layers. Because the new dataset is not very similar, retraining the higher layers on it matters.
- Dataset 3: a lot of data, low similarity. With a large dataset, training a neural network is effective. But because the data differs substantially from the data the pretrained model was trained on, predictions from the pretrained model will not be effective. It is best to train the network from scratch on your own data.
- Dataset 4: a lot of data, high similarity. This is the ideal case, and the pretrained model should be most effective. The best approach is to keep the model architecture and its initial weights, then retrain the model starting from the pretrained weights.

What gets fine-tuned?
- the data source is replaced
- the K layers are retrained
- the weights and shapes of the K layers are adjusted

1. General procedure for fine-tuning

1. Pretrain a neural network, the source model, on a source dataset (for example ImageNet).
2. Create a new neural network, the target model, which copies all of the source model's design and parameters except the output layer.
3. Add to the target model an output layer whose size equals the number of classes in the target dataset, and initialize that layer's parameters randomly.
4. Train the target model on the target dataset. The output layer is trained from scratch, while the parameters of the other layers are fine-tuned from the source model's parameters.

2. Fine-tuning with torchvision

2.1 Instantiating the model

    import torchvision.models as models

    # Newer torchvision versions use the weights= argument instead of pretrained=
    resnet34 = models.resnet34(pretrained=True)

About the pretrained argument:
1. True or False decides whether pretrained weights are used. The default is pretrained = False, meaning no pretrained weights are loaded.
2. pretrained = True means the weights obtained by pretraining on some dataset are used.

Note: if the download is interrupted, be sure to delete the partially downloaded weight file from the cache path, otherwise loading will fail.

2.2 Training specific layers

If we are extracting features and only want gradients for the newly initialized layers, leaving the other parameters unchanged, we freeze those layers by setting requires_grad = False.

    def set_parameter_requires_grad(model, feature_extracting):
        if feature_extracting:
            for param in model.parameters():
                param.requires_grad = False

2.3 Example

Using resnet34 as the example, we change the 1000-class output to 10 classes while training only the parameters of the last layer. We first freeze the gradients of the model parameters and then replace the fully connected layer at the model's output.

    import os
    import torch
    import torch.nn.functional as F
    import torch.nn as nn
    from torch.optim.lr_scheduler import LambdaLR
    from torch.optim.lr_scheduler import StepLR
    import torchvision
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms
    from torch.utils.tensorboard import SummaryWriter
    import numpy as np
    import torchvision.models as models
    from torchinfo import summary

    # Hyperparameters
    batch_size = 16    # batch size; 32, 64 or 128 are also common
    lr = 1e-4          # learning rate of the optimizer
    max_epochs = 2     # number of epochs
    # Use a device object and call .to(device) on whatever should live on the GPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Data loading, using CIFAR-10 as the example dataset
    from torchvision import datasets

    # data_transform can apply transformations such as flipping, cropping or normalization
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # Once the Dataset exists, DataLoader reads it batch by batch
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset,
                                               batch_size=batch_size, num_workers=4,
                                               shuffle=True, drop_last=True)
    test_loader = torch.utils.data.DataLoader(test_cifar_dataset,
                                              batch_size=batch_size, num_workers=4,
                                              shuffle=False)

    # Download the pretrained resnet34
    resnet34 = models.resnet34(pretrained=True)
    print(resnet34)

    # Inspect the model structure
    summary(resnet34, (1, 3, 224, 224))

    # Count correct predictions on the test set
    def cal_predict_correct(model):
        model = model.to(device)
        model.eval()
        test_total_correct = 0
        with torch.no_grad():
            for step, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
        return test_total_correct

    total_correct = cal_predict_correct(resnet34)
    print("test_total_correct: " + str(total_correct / 10000))

Output recorded in the original:

    test_total_correct: 0.1

    # Fine-tune resnet34
    def set_parameter_requires_grad(model, feature_extracting):
        if feature_extracting:
            for param in model.parameters():
                param.requires_grad = False

    # Freeze the gradients of the existing parameters
    feature_extract = True
    new_model = resnet34
    set_parameter_requires_grad(new_model, feature_extract)

    # Replace the output layer.
    # During training the model still backpropagates, but only the fc layer is updated.
    num_ftrs = new_model.fc.in_features
    new_model.fc = nn.Linear(in_features=num_ftrs, out_features=10, bias=True)
    summary(new_model, (1, 3, 224, 224))

    # Training and validation
    Resnet34_new = new_model.to(device)
    criterion = nn.CrossEntropyLoss()                                 # cross-entropy loss
    optimizer = torch.optim.Adam(Resnet34_new.parameters(), lr=lr)   # optimizer

    epoch = max_epochs
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        Resnet34_new.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0

        for step, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = Resnet34_new(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(
                i + 1, epoch, step + 1, total_step, loss.item() / labels.shape[0]))

        Resnet34_new.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        with torch.no_grad():
            for step, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                outputs = Resnet34_new(images)
                loss = criterion(outputs, labels)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
                test_total_loss += loss.item()
                test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i + 1, epoch,
            train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))

    total_correct = cal_predict_correct(Resnet34_new)
    print("test_total_correct: " + str(total_correct / 10000))

Output recorded in the original:

    test_total_correct: 0.1

3. Fine-tuning with timm

Installation: pip3 install timm

3.1 Listing the pretrained models

    import timm
    from torchinfo import summary

    avail_pretrained_models = timm.list_models(pretrained=True)
    len(avail_pretrained_models)

    # The list of models
    avail_pretrained_models

3.1.1 Inspecting a model's configuration

    model = timm.create_model('resnet18', pretrained=True)
    model.default_cfg

Output:

    {'url': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
     'num_classes': 1000,
     'input_size': (3, 224, 224),
     'pool_size': (7, 7),
     'crop_pct': 0.875,
     'interpolation': 'bilinear',
     'mean': (0.485, 0.456, 0.406),
     'std': (0.229, 0.224, 0.225),
     'first_conv': 'conv1',
     'classifier': 'fc',
     'architecture': 'resnet18'}

3.2 Modifying a pretrained model

    # Test the model
    x = torch.randn(1, 3, 224, 224)
    output = model(x)
    output.shape

Output:

    torch.Size([1, 1000])

3.2.1 Inspecting the parameters of one layer (the first convolution as an example)

    list(dict(model.named_children())['conv1'].parameters())

Output (truncated in the original):

    [Parameter containing:
     tensor([[[[-1.0419e-02, -6.1356e-03, -1.8098e-03,  ...,  5.6615e-02,  1.7083e-02, -1.2694e-02],
               [ 1.1083e-02,  9.5276e-03, -1.0993e-01,  ..., -2.7124e-01, -1.2907e-01,  3.7424e-03],
               [-6.9434e-03,  5.9089e-02,  2.9548e-01,  ...,  5.1972e-01,  2.5632e-01,  6.3573e-02],
               ...,
               [-2.7535e-02,  1.6045e-02,  7.2595e-02,  ..., -3.3285e-01, -4.2058e-01, -2.5781e-01],
               [ 3.0613e-02,  4.0960e-02,  6.2850e-02,  ...,  4.1384e-01,  3.9359e-01,  1.6606e-01],
               [-1.3736e-02, -3.6746e-03, -2.4084e-02,  ..., -1.5070e-01, -8.2230e-02, -5.7828e-03]],
              [[-1.1397e-02, -2.6619e-02, -3.4641e-02,  ...,  3.2521e-02,  6.6221e-04, -2.5743e-02],
               [ 4.5687e-02,  3.3603e-02, -1.0453e-01,  ..., -3.1253e-01, -1.6051e-01, -1.2826e-03],
               [-8.3730e-04,  9.8420e-02,  4.0210e-01,  ...,  7.0789e-01,  3.6887e-01,  1.2455e-01],

    # Fine-tuning
    # Change the model output from 1000 classes to 10 classes.
    # Change the number of input channels (for example, when the images are
    # single-channel but the model expects three channels) by passing in_chans=1.
    model = timm.create_model('resnet18', num_classes=10, pretrained=True, in_chans=1)
    x = torch.randn(1, 1, 224, 224)
    output = model(x)
    output.shape

    # Saving and loading the model
    os.makedirs('./checkpoint', exist_ok=True)   # torch.save does not create the directory
    torch.save(model.state_dict(), './checkpoint/timm_model.pth')
    model.load_state_dict(torch.load('./checkpoint/timm_model.pth'))

4. Half-precision training

Problem: GPU performance has two main aspects, compute power and memory. The former determines how fast the card computes; the latter determines how much data can be held on the card for computation at once. For a fixed amount of available memory, loading more data per training step (a larger batch size) can also improve training efficiency.

Definition: PyTorch stores floating-point numbers as torch.float32 by default. More digits after the decimal point do guarantee precision, but most scenarios do not need that much, and keeping half as many bits usually does not change the result. That format is torch.float16; because the number of bits is halved, it is called "half precision". Half precision reduces memory usage, so the GPU can hold more data at once.

4.1 Setting up half-precision training

1. Import autocast: from torch.cuda.amp import autocast
2. Decorate the model's forward function with autocast.
3. In the training loop, put the step that feeds data into the model, and everything after it, inside "with autocast():".
4. Half-precision training is mainly useful when the data itself is large (for example 3D images or video).

4.2 Importing and applying autocast

    from torch.cuda.amp import autocast

    # Decorate forward
    @autocast()
    def forward(self, x):
        ...
        return x

    # In the training loop, wrap the forward pass in autocast
    for x in train_loader:
        x = x.cuda()
        with autocast():
            output = model(x)
        ...

4.3 A half-precision training example

    from torch.cuda.amp import autocast

    # Half-precision model
    class DemoModel(nn.Module):
        def __init__(self):
            super(DemoModel, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        @autocast()
        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    # Training and validation
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    half_model = DemoModel().to(device)
    criterion = nn.CrossEntropyLoss()                               # cross-entropy loss
    optimizer = torch.optim.Adam(half_model.parameters(), lr=lr)   # optimizer

    epoch = max_epochs
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []
    for i in range(epoch):
        half_model.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0

        for step, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            with autocast():
                outputs = half_model(images)
                loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.4f}".format(
                i + 1, epoch, step + 1, total_step, loss.item() / labels.shape[0]))

        half_model.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        with torch.no_grad():
            for step, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                with autocast():
                    outputs = half_model(images)
                    loss = criterion(outputs, labels)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
                test_total_loss += loss.item()
                test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i + 1, epoch,
            train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100
        ))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
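What "half precision" costs and saves can be seen without a GPU. The sketch below uses only Python's standard `struct` module, whose `e` and `f` format codes pack IEEE 754 half-precision and single-precision values. It is an illustration of the number formats, not of how autocast works internally, and the helper names `to_half` and `to_single` are hypothetical.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest half-precision (float16) value."""
    return struct.unpack('<e', struct.pack('<e', value))[0]


def to_single(value):
    """Round a Python float to the nearest single-precision (float32) value."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


# Storage: bytes per value in each format
half_bytes = struct.calcsize('<e')
single_bytes = struct.calcsize('<f')
print(half_bytes, single_bytes)

# Memory for one batch of 16 RGB images of size 224x224, in bytes
num_values = 16 * 3 * 224 * 224
print(num_values * single_bytes, num_values * half_bytes)

# Precision: the same number stored in each format
print(to_single(0.1), to_half(0.1))

# Very small magnitudes are lost entirely in half precision
print(to_half(1e-8))
```

The last line hints at why mixed-precision training is usually paired with loss scaling: small gradient values can round to zero in float16. PyTorch provides `torch.cuda.amp.GradScaler` for that purpose; the training loop in this post does not use it.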
