python tools/train_net.py --config-file ./configs/Visdrone/sbs_R50-ibn.yml MODEL.DEVICE "cuda:0"
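After running this command, no error is reported, but training never reaches the iteration loop. 2. Because I am on Windows, I did not run the make all step. 3. The full log is as follows: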
Command Line Args: Namespace(config_file='./configs/Visdrone/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49153', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False) [04/06 13:08:42 fastreid]: Rank of current process: 0. World size: 1 [04/06 13:08:43 fastreid]: Environment info: ---------------------- ------------------------------------------------------------------------------------ sys.platform win32 Python 3.7.16 (default, Jan 17 2023, 16:06:28) [MSC v.1916 64 bit (AMD64)] numpy 1.21.6 fastreid 1.3 @.\fastreid FASTREID_ENV_MODULE <not set> PyTorch 1.13.1+cu117 @D:\anaconda\envs\BOTsort\lib\site-packages\torch PyTorch debug build False GPU available True GPU 0 NVIDIA GeForce RTX 3080 CUDA_HOME C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7 Pillow 9.5.0 torchvision 0.14.1+cu117 @D:\anaconda\envs\BOTsort\lib\site-packages\torchvision torchvision arch flags D:\anaconda\envs\BOTsort\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump cv2 4.9.0 ---------------------- ------------------------------------------------------------------------------------ PyTorch built with: - C++ Version: 199711 - MSVC 192829337 - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) - OpenMP 2019 - LAPACK is enabled (usually provided by MKL) - CPU capability usage: AVX2 - CUDA Runtime 11.7 - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37 - CuDNN 8.5 - Magma 2.5.4 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, 
CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, [04/06 13:08:43 fastreid]: Command line arguments: Namespace(config_file='./configs/Visdrone/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49153', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False) [04/06 13:08:43 fastreid]: Contents of args.config_file=./configs/Visdrone/sbs_R50-ibn.yml: b'# _*_ coding:utf-8 _*_\r\n_BASE_: ../Base-SBS.yml\r\n\r\n# \xe8\xae\xbe\xe7\xbd\xae\xe7\x9b\xb8\xe5\xba\x94\xe7\x9a\x84\xe6\x95\xb0\xe6\x8d\xae\xe5\xa2\x9e\xe5\xbc\xba\r\nINPUT:\r\n SIZE_TRAIN: [256, 256]\r\n SIZE_TEST: [256, 256]\r\n\r\nMODEL:\r\n BACKBONE:\r\n WITH_IBN: True\r\n WITH_NL: True #\xe6\xa8\xa1\xe5\x9e\x8b\xe6\x98\xaf\xe5\x90\xa6\xe4\xbd\xbf\xe7\x94\xa8No_local module\r\n PRETRAIN: True\r\n PRETRAIN_PATH: \'pretrained\\veri_sbs_R50-ibn.pth\'\r\n HEADS:\r\n POOL_LAYER: GeneralizedMeanPooling # HEAD POOL_LAYERS\r\n LOSSES:\r\n NAME: ("CrossEntropyLoss", "TripletLoss",)\r\n CE:\r\n EPSILON: 0.1\r\n SCALE: 1.0\r\n\r\n TRI:\r\n MARGIN: 0.0 # \xe8\x80\x83\xe8\x99\x91\xe8\xa6\x81\xe4\xb8\x8d\xe8\xa6\x81\xe8\xbf\x9b\xe8\xa1\x8c\xe5\xaf\xb9\xe5\xba\x94\xe7\x9a\x84\xe8\xb6\x85\xe5\x8f\x82\xe6\x95\xb0\xe7\x9a\x84\xe8\xb0\x83\xe6\x95\xb4\r\n HARD_MINING: True\r\n NORM_FEAT: False\r\n SCALE: 1.0\r\nSOLVER:\r\n OPT: SGD\r\n BASE_LR: 0.0001# 0.01\r\n ETA_MIN_LR: 7.7e-5\r\n\r\n 
IMS_PER_BATCH: 128 # batchsize\r\n MAX_EPOCH: 10 # 60\r\n WARMUP_ITERS: 3000\r\n FREEZE_ITERS: 3000\r\n\r\n CHECKPOINT_PERIOD: 10\r\n\r\nDATASETS:\r\n NAMES: ("Visdrone",)\r\n TESTS: ("Visdrone",)\r\n\r\nDATALOADER:\r\n SAMPLER_TRAIN: BalancedIdentitySampler\r\n NUM_INSTANCE: 4\r\n NUM_WORKERS: 8\r\nTEST:\r\n EVAL_PERIOD: 10\r\n IMS_PER_BATCH: 256 # 256\r\n\r\nOUTPUT_DIR: logs/visdrone/sbs_R50-ibn' [04/06 13:08:43 fastreid]: Running with full config: CUDNN_BENCHMARK: False DATALOADER: NUM_INSTANCE: 4 NUM_WORKERS: 8 SAMPLER_TRAIN: BalancedIdentitySampler SET_WEIGHT: [] DATASETS: COMBINEALL: False NAMES: ('Visdrone',) TESTS: ('Visdrone',) INPUT: AFFINE: ENABLED: False AUGMIX: ENABLED: False PROB: 0.0 AUTOAUG: ENABLED: True PROB: 0.1 CJ: BRIGHTNESS: 0.15 CONTRAST: 0.15 ENABLED: False HUE: 0.1 PROB: 0.5 SATURATION: 0.1 CROP: ENABLED: False RATIO: [0.75, 1.3333333333333333] SCALE: [0.16, 1] SIZE: [224, 224] FLIP: ENABLED: True PROB: 0.5 PADDING: ENABLED: True MODE: constant SIZE: 10 REA: ENABLED: True PROB: 0.5 VALUE: [123.675, 116.28, 103.53] RPT: ENABLED: False PROB: 0.5 SIZE_TEST: [256, 256] SIZE_TRAIN: [256, 256] KD: EMA: ENABLED: False MOMENTUM: 0.999 MODEL_CONFIG: [] MODEL_WEIGHTS: [] MODEL: BACKBONE: ATT_DROP_RATE: 0.0 DEPTH: 50x DROP_PATH_RATIO: 0.1 DROP_RATIO: 0.0 FEAT_DIM: 2048 LAST_STRIDE: 1 NAME: build_resnet_backbone NORM: BN PRETRAIN: True PRETRAIN_PATH: pretrained\veri_sbs_R50-ibn.pth SIE_COE: 3.0 STRIDE_SIZE: (16, 16) WITH_IBN: True WITH_NL: True WITH_SE: False DEVICE: cuda:0 FREEZE_LAYERS: ['backbone'] HEADS: CLS_LAYER: CircleSoftmax EMBEDDING_DIM: 0 MARGIN: 0.35 NAME: EmbeddingHead NECK_FEAT: after NORM: BN NUM_CLASSES: 0 POOL_LAYER: GeneralizedMeanPooling SCALE: 64 WITH_BNNECK: True LOSSES: CE: ALPHA: 0.2 EPSILON: 0.1 SCALE: 1.0 CIRCLE: GAMMA: 128 MARGIN: 0.25 SCALE: 1.0 COSFACE: GAMMA: 128 MARGIN: 0.25 SCALE: 1.0 FL: ALPHA: 0.25 GAMMA: 2 SCALE: 1.0 NAME: ('CrossEntropyLoss', 'TripletLoss') TRI: HARD_MINING: True MARGIN: 0.0 NORM_FEAT: False SCALE: 
1.0 META_ARCHITECTURE: Baseline PIXEL_MEAN: [123.675, 116.28, 103.53] PIXEL_STD: [58.395, 57.120000000000005, 57.375] QUEUE_SIZE: 8192 WEIGHTS: OUTPUT_DIR: logs/visdrone/sbs_R50-ibn SOLVER: AMP: ENABLED: True BASE_LR: 0.0001 BIAS_LR_FACTOR: 1.0 CHECKPOINT_PERIOD: 10 CLIP_GRADIENTS: CLIP_TYPE: norm CLIP_VALUE: 5.0 ENABLED: False NORM_TYPE: 2.0 DELAY_EPOCHS: 30 ETA_MIN_LR: 7.7e-05 FREEZE_ITERS: 3000 GAMMA: 0.1 HEADS_LR_FACTOR: 1.0 IMS_PER_BATCH: 128 MAX_EPOCH: 10 MOMENTUM: 0.9 NESTEROV: False OPT: SGD SCHED: CosineAnnealingLR STEPS: [40, 90] WARMUP_FACTOR: 0.1 WARMUP_ITERS: 3000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0005 WEIGHT_DECAY_BIAS: 0.0005 WEIGHT_DECAY_NORM: 0.0005 TEST: AQE: ALPHA: 3.0 ENABLED: False QE_K: 5 QE_TIME: 1 EVAL_PERIOD: 10 FLIP: ENABLED: False IMS_PER_BATCH: 256 METRIC: cosine PRECISE_BN: DATASET: Market1501 ENABLED: False NUM_ITER: 300 RERANK: ENABLED: False K1: 20 K2: 6 LAMBDA: 0.3 ROC: ENABLED: False [04/06 13:08:43 fastreid]: Full config saved to D:\zhuangshilin\BoT_SORT\fast_reid\logs\visdrone\sbs_R50-ibn\config.yaml D:\anaconda\envs\BOTsort\lib\site-packages\torchvision\transforms\transforms.py:330: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum. "Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "
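After this point the program hangs and the log stops updating. GPU utilization is only about 10%, so training is not actually running. I added a print to my own dataset.py; its output appears right after the log above, and then nothing further happens. How can I find out exactly where the program is stuck?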
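One generic way to locate a silent hang like this, independent of fastreid, is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. The sketch below is only an illustration: the helper names arm_hang_watchdog and disarm_hang_watchdog are made up here, and calling them near the top of tools/train_net.py is an assumption, not something the project documents.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=60.0, repeat=True, file=sys.stderr):
    """Dump the stack of every thread if the process is still running after timeout_s.

    Hypothetical helper: the name and defaults are illustrative only.
    `file` must be a real file object with a file descriptor.
    """
    # Also report hard crashes (e.g. segfaults) to the same file.
    faulthandler.enable(file=file)
    # A watchdog thread writes the tracebacks; the main thread is not interrupted.
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=file)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspicious section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

For example, calling arm_hang_watchdog(60) before training starts would print all thread stacks to stderr every 60 seconds for as long as the process runs. The frames at the top of each dump show which call the process is waiting in.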
After setting breakpoints and debugging, I found that it hangs in class AMPTrainer in fastreid.engine.train_loop. The following call never returns:
super().__init__(model, data_loader, optimizer, param_wrapper)
After changing IMS_PER_BATCH it runs, but after many iterations the loss is still 0.
Question: what problems arise if the dataset's id is 1?
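If "id is 1" in the question above means the dataset contains only one identity (an assumption, since the thread does not say), one possible explanation for the zero loss is that softmax cross-entropy over a single class is identically zero. The sketch below is plain Python, not fastreid's loss code, and the function name cross_entropy is illustrative. It does not cover the triplet term.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# With one class, softmax is always 1.0, so the loss is 0 whatever the logit is.
single_class_losses = [cross_entropy([z], 0) for z in (-5.0, 0.0, 3.2)]

# With two equally scored classes the loss is log(2), so there is a gradient to learn from.
two_class_loss = cross_entropy([0.0, 0.0], 0)
```

Under that assumption the classifier receives no training signal, so checking how many distinct identities the custom dataset actually yields would be a reasonable first step.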
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.