This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Low performance of Supervised training #563

Open
sidgan opened this issue Aug 23, 2022 · 0 comments
sidgan commented Aug 23, 2022


Problem statement:
We are using pretrained ImageNet model weights to perform supervised learning on our own dataset, which consists of ~60000 train images and ~14000 test images across a total of 1139 classes. I have changed the MLP head in the yaml file to reflect 1139 classes.

Expected: Stable training.

What's happening: Train accuracy increases very quickly, reaching almost 90% in ~170 epochs, but test accuracy does not improve at all, remaining close to 0 for the most part (full log attached: log (3).txt). When performing supervised training directly in PyTorch, we are able to reach 70% accuracy.

Any insights on why this might be happening? Suggestions on how to effectively utilize the VISSL pipelines would be appreciated.
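Since train accuracy climbs while test accuracy stays near 0, one cheap thing to rule out first is a label-index mismatch between the two disk_folder splits: both torchvision's ImageFolder and VISSL's disk_folder source assign indices by sorted folder name, so a class folder present in one split but missing from the other shifts every subsequent index. The sketch below (not part of VISSL; compare_splits is a hypothetical helper, and the paths are the ones from the command in this issue) checks for that:

```python
import os

def class_to_idx(root):
    """Sorted class-folder -> index map, matching how torchvision's
    ImageFolder (and disk_folder-style sources) assign labels."""
    classes = sorted(d.name for d in os.scandir(root) if d.is_dir())
    return {name: i for i, name in enumerate(classes)}

def compare_splits(train_root, test_root):
    """Report label-index disagreements between two disk_folder splits."""
    train_map = class_to_idx(train_root)
    test_map = class_to_idx(test_root)
    only_train = sorted(set(train_map) - set(test_map))
    only_test = sorted(set(test_map) - set(train_map))
    # Classes present in both splits but mapped to different indices:
    shifted = sorted(c for c in train_map.keys() & test_map.keys()
                     if train_map[c] != test_map[c])
    return only_train, only_test, shifted

# Example, using the paths from the command in this issue:
# compare_splits("/home/images/train", "/home/images/test")
```

If any class appears in only one split, or any shared class maps to different indices, the test-time labels no longer line up with the head's outputs, which would produce exactly this near-zero test accuracy.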

Command:

python3 tools/run_distributed_engines.py \
  hydra.verbose=true \
  config=benchmark/fulltune/imagenet1k/train.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TRAIN.DATA_PATHS=["/home/images/train"] \
  config.DATA.TEST.DATA_SOURCES=[disk_folder] \
  config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
  config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TEST.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TEST.DATA_PATHS=["/home/images/test"] \
  config.OPTIMIZER.num_epochs=250 \
  config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
  config.OPTIMIZER.param_schedulers.lr.milestones=[1] \
  config.DISTRIBUTED.NUM_NODES=1 \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
  config.HOOKS.MEMORY_SUMMARY.PRINT_MEMORY_SUMMARY=false \
  config.CHECKPOINT.DIR="/home/new_exp/checkpoint_supervised_2" \
  config.MODEL.WEIGHTS_INIT.PARAMS_FILE="/home/resnet50-19c8e357.pth" \
  config.MODEL.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=""
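For reference, the MLP head change mentioned above would look roughly like the fragment below in the yaml, assuming a ResNet-50 trunk whose pooled features are 2048-dimensional (the exact key path and any intermediate dims depend on the config the run was started from):

```yaml
MODEL:
  HEAD:
    PARAMS: [
      ["mlp", {"dims": [2048, 1139]}],
    ]
```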

Environment:

Environment information was collected using:

wget -nc -q https://github.com/facebookresearch/vissl/raw/main/vissl/utils/collect_env.py && python collect_env.py

sys.platform linux
Python 3.6.9 (default, Jun 29 2022, 11:45:57) [GCC 8.4.0]
numpy 1.19.5
Pillow 8.4.0
vissl 0.1.6 @/home/vissl/vissl
GPU available True
GPU 0 Quadro GV100
CUDA_HOME /usr
torchvision 0.9.0+cu101 @/home/.local/lib/python3.6/site-packages/torchvision
hydra 1.0.7 @/home/.local/lib/python3.6/site-packages/hydra
classy_vision 0.7.0.dev @/home/.local/lib/python3.6/site-packages/classy_vision
tensorboard 2.9.1
apex 0.1 @/home/.local/lib/python3.6/site-packages/apex
cv2 4.6.0
PyTorch 1.8.0+cu101 @/home/.local/lib/python3.6/site-packages/torch
PyTorch debug build False


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

CPU info:


Architecture x86_64
CPU op-mode(s) 32-bit, 64-bit
Byte Order Little Endian
CPU(s) 12
On-line CPU(s) list 0-11
Thread(s) per core 2
Core(s) per socket 6
Socket(s) 1
NUMA node(s) 1
Vendor ID GenuineIntel
CPU family 6
Model 85
Model name Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
Stepping 4
CPU MHz 3999.959
CPU max MHz 4000.0000
CPU min MHz 1200.0000
BogoMIPS 6999.82
Virtualization VT-x
L1d cache 32K
L1i cache 32K
L2 cache 1024K
L3 cache 8448K
NUMA node0 CPU(s) 0-11
