[Share]How to run demo and training in a docker container #31

liupgd opened this issue Jun 28, 2021 · 0 comments

After a miserable experience getting training to run on this repo, I think I need to write it down and share it with others. My situation: I failed to run the model directly on my own Ubuntu server because I didn't have CUDA 10.0 installed. But running it in Docker is not so easy either.

Run the demo

  • Build the Docker image and start a container
docker build -t new-comod-gan .
docker run -itd -v /your_work_dir:/work  -v /your_data_dir:/data --name comod -p 7200-7220:7200-7220 --gpus all new-comod-gan /bin/bash
  • Install extra packages in your container
pip install opencv-python 
pip install tqdm
pip install scikit-learn
  • Prepare your test image dataset
python dataset_tools/create_from_images.py --val-image-dir ./your_test_images_dir --tfrecord-dir ./tfrecords
  • Add CUDA to the PATH in your container; you'd better add this line to your .bashrc too (a way to persist it is included in the sketch after this list)
export PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH"
  • Now you can run the demo with the pretrained model
python run_demo.py -c ./pretrained/co-mod-gan-places2-050000.pkl -d ./tfrecords
  • But... you didn't get a GUI from your remote container? VcXsrv may help if you are on Windows locally. My solution (a manual DISPLAY alternative is sketched right after this list):
    • Install VcXsrv on Windows.
    • Use VS Code to access the container, and install the Remote X11 extension in VS Code.
    • In VS Code Settings -> Remote, search for the Host option and set the remote host IP to your remote server's IP (not the container's IP).
    • Start VcXsrv on your local Windows machine first, then run the remote demo. The GUI will be displayed if there are no errors.
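
If you'd rather not rely on the Remote X11 extension, a manual alternative is to point DISPLAY at the VcXsrv server yourself. This is a minimal sketch, not tested on every setup: it assumes VcXsrv is started with access control disabled and listening on its default display :0, and your_windows_ip is a placeholder for your local machine's IP. The nvidia-smi call is just a sanity check that --gpus all took effect.

# Attach to the running container
docker exec -it comod /bin/bash

# Sanity check: the GPUs passed with --gpus all should be listed here
nvidia-smi

# Persist the CUDA path from the step above so new shells pick it up
echo 'export PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH"' >> ~/.bashrc

# Point X11 at the VcXsrv server on your local Windows machine
# (placeholder IP; :0.0 assumes VcXsrv's default display number)
export DISPLAY=your_windows_ip:0.0

# Re-run the demo; the window should now show up via VcXsrv
python run_demo.py -c ./pretrained/co-mod-gan-places2-050000.pkl -d ./tfrecords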

Run training

  • Prepare your own training dataset
    Here I prepared some images in ./imgs/png_samples/ as a quick training test.
python dataset_tools/create_from_images.py --train-image-dir ./imgs/png_samples/ --val-image-dir ./imgs/png_samples/ --tfrecord-dir ./train_dataset --resolution 512 --num-channels 3

Note:
1. Only 3 channels can be used. If you're using PNG files, do not set --num-channels to 4, or you'll get an error during training (one way to flatten 4-channel PNGs is sketched below).
2. --val-image-dir must be specified, or you'll get an error during training.
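
In case your PNGs carry an alpha channel, one way to flatten them to 3-channel RGB before building the tfrecords is ImageMagick. This is just a sketch under a couple of assumptions: ImageMagick is not part of the Dockerfile, so you'd have to install it in the container first (assuming a Debian/Ubuntu base image), and mogrify edits the files in place, so keep a backup.

# Install ImageMagick inside the container
apt-get update && apt-get install -y imagemagick

# Flatten the alpha channel against a white background, in place
mogrify -background white -alpha remove -alpha off ./imgs/png_samples/*.png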

  • Run your training
python run_training.py --data-dir=./  --dataset=train_dataset --metrics=ids10k --mirror-augment True --num-gpus=4
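
The command above assumes 4 GPUs. If your container sees a different number (for example, you passed something narrower than --gpus all to docker run), matching --num-gpus to the visible count should be all that's needed. A quick check followed by a hypothetical single-GPU run:

# List the GPUs visible inside the container
nvidia-smi -L

# Same training command, adjusted for one visible GPU
python run_training.py --data-dir=./ --dataset=train_dataset --metrics=ids10k --mirror-augment True --num-gpus=1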

For researchers in China, you may need a VPN or proxy, because the training process downloads an Inception model file. You can:

export https_proxy="https://your_vpn_ip:port"
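
Depending on how the download is made, the http_proxy variable may be needed as well. A sketch with placeholder values; keep whatever scheme (http:// or https://) your proxy actually expects:

# Route both plain-HTTP and HTTPS traffic through the proxy before training
export http_proxy="http://your_vpn_ip:port"
export https_proxy="http://your_vpn_ip:port"
python run_training.py --data-dir=./ --dataset=train_dataset --metrics=ids10k --mirror-augment True --num-gpus=4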