Error running style.py: Not found: ./bin/ptxas not found #267

Open
moldach opened this issue Mar 23, 2021 · 0 comments


moldach commented Mar 23, 2021

I get an error when training checkpoints with style.py, and the log output seems to point to "Not found: ./bin/ptxas not found" as the source of the error.

Do you have any idea what the issue here is?

Submission Script

#!/bin/bash
#$ -pwd

# Usage: bash fastTrainer.bash /images/ /outpath/ /testpath/
##
## An embarrassingly parallel script to train many style transfer networks on an HPC cluster.
## Access to the SLURM job scheduler and fast-style-transfer is required to run this program.
## The three mandatory paths must be given in the order shown above.

IMG=$(readlink -f "${1%/}")     # path_to_train_images
OUT_DIR=$(readlink -f "${2%/}")  # path_to_checkpoints
TEST=$(readlink -f "${3%/}")  # path_to_tests

mkdir -p "${OUT_DIR}/jobs"

JID=0   # job ID for SLURM job name

for f in "${IMG}"/*; do

  JID=$((JID + 1))

  cat > ${OUT_DIR}/jobs/style_${JID}.bash << EOT # write job information for each job
#!/bin/bash
#SBATCH --gres=gpu:1        # request GPU
#SBATCH --account=def-mtarailo
#SBATCH --cpus-per-task=10   # maximum CPU cores per GPU request
#SBATCH --time=12:00:00     # request 12 hours of walltime
#SBATCH --mem=10G           # request 10G (or 1G per core)
#SBATCH --job-name="fst_${JID}"
#SBATCH --output=${OUT_DIR}/jobs/%N-%j.out  # %N for node name, %j for jobID
#SBATCH --error=${OUT_DIR}/jobs/%N-%j.err  # %N for node name, %j for jobID

### JOB SCRIPT BELOW ###

# Load Modules
source activate tf-gpu
module load cuda/10.1

mkdir ${OUT_DIR}/${JID}
#mkdir ${TEST}/${JID}

python style.py --style "$f" \
  --checkpoint-dir ${OUT_DIR}/${JID} \
  --test examples/content/chicago.jpg \
  --test-dir ${OUT_DIR}/${JID} \
  --content-weight 1.5e1 \
  --checkpoint-iterations 1000 \
  --batch-size 20

EOT
  chmod 754 "${OUT_DIR}/jobs/style_${JID}.bash"
  sbatch "${OUT_DIR}/jobs/style_${JID}.bash"
done
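Because the heredoc delimiter EOT is unquoted, bash substitutes $f, ${OUT_DIR} and ${JID} into each job file at the moment it is written. By contrast, %N and %j contain no $ and pass through untouched, so SLURM can replace them in the output and error file names. As a rough illustration, the hypothetical Python helper below reproduces a subset of that heredoc so you can inspect the contents of one generated job file. It omits some SBATCH lines and the module setup.

```python
def render_job_script(style_path, out_dir, jid):
    """Hypothetical helper mirroring part of the heredoc in fastTrainer.bash.

    The f-strings play the role of the unquoted heredoc: style_path, out_dir
    and jid are substituted now, while %N and %j stay literal for SLURM.
    """
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH --gres=gpu:1",
        "#SBATCH --time=12:00:00",
        f'#SBATCH --job-name="fst_{jid}"',
        f"#SBATCH --output={out_dir}/jobs/%N-%j.out",
        f"#SBATCH --error={out_dir}/jobs/%N-%j.err",
        "",
        f"mkdir {out_dir}/{jid}",
        f"python style.py --style {style_path} \\",
        f"  --checkpoint-dir {out_dir}/{jid} \\",
        "  --test examples/content/chicago.jpg \\",
        f"  --test-dir {out_dir}/{jid} \\",
        "  --content-weight 1.5e1 \\",
        "  --checkpoint-iterations 1000 \\",
        "  --batch-size 20",
        "",
    ])
```

For instance, render_job_script("/images/wave.jpg", "/outpath", 3) yields a file whose job name is fst_3 and whose log paths still contain the literal %N-%j. The image and output paths here are placeholders.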

Error

Due to MODULEPATH changes, the following have been reloaded:
  1) openmpi/3.1.2

2021-03-22 21:11:45.184962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-22 21:11:45.240458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:1d:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-22 21:11:45.283971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-22 21:11:45.473571: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-22 21:11:45.634452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-22 21:11:45.716070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-22 21:11:45.890345: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-22 21:11:45.927906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-22 21:11:46.114498: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-22 21:11:46.116365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-22 21:11:46.116859: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-03-22 21:11:46.164009: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2021-03-22 21:11:46.165725: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564a7240ad60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-22 21:11:46.165751: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-22 21:11:46.168706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:1d:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-22 21:11:46.168744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-22 21:11:46.168760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-22 21:11:46.168775: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-22 21:11:46.168789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-22 21:11:46.168803: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-22 21:11:46.168817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-22 21:11:46.168831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-22 21:11:46.170436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-22 21:11:46.170479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-22 21:11:46.305358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-22 21:11:46.305405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2021-03-22 21:11:46.305424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2021-03-22 21:11:46.308686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15059 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2021-03-22 21:11:46.312111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564a72cf2f30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-22 21:11:46.312137: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2021-03-22 21:11:49.670398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:1d:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-22 21:11:49.670498: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-22 21:11:49.670522: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-22 21:11:49.670543: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-22 21:11:49.670561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-22 21:11:49.670577: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-22 21:11:49.670594: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-22 21:11:49.670613: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-22 21:11:49.672271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-22 21:11:49.672324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-22 21:11:49.672338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2021-03-22 21:11:49.672349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2021-03-22 21:11:49.674005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15059 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
WARNING:tensorflow:From /home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2021-03-22 21:12:02.532101: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-22 21:12:04.268532: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
2021-03-22 21:12:04.438014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Traceback (most recent call last):
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 669, in pil_try_read
    im.getdata()[0]
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/PIL/Image.py", line 1271, in getdata
    self.load()
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/PIL/ImageFile.py", line 260, in load
    "image file is truncated "
OSError: image file is truncated (20 bytes not processed)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "style.py", line 167, in <module>
    main()
  File "style.py", line 147, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src/optimize.py", line 105, in optimize
    X_batch[j] = get_img(img_p, (256,256,3)).astype(np.float32)
  File "src/utils.py", line 18, in get_img
    img = imageio.imread(src, pilmode='RGB') # misc.imresize(, (256, 256, 3))
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/core/format.py", line 170, in get_reader
    return self.Reader(self, request)
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/core/format.py", line 221, in __init__
    self._open(**self.request.kwargs.copy())
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 429, in _open
    return PillowFormat.Reader._open(self, pilmode=pilmode, as_gray=as_gray)
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 135, in _open
    pil_try_read(self._im)
  File "/home/moldach/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 680, in pil_try_read
    raise ValueError(error_message)
ValueError: Could not load "" 
Reason: "image file is truncated (20 bytes not processed)"
Please see documentation at: http://pillow.readthedocs.io/en/latest/installation.html#external-libraries
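The traceback ends in an OSError from Pillow ("image file is truncated (20 bytes not processed)"). It is raised while src/optimize.py reads a training batch through get_img in src/utils.py. The ptxas line earlier in the log is a W (warning) entry, after which TensorFlow reports "Relying on driver to perform ptx compilation."

The sketch below lists candidate bad files. It assumes the training set consists of JPEGs. Because the script passes no training path, that would be style.py's default training directory, assumed here to be data/train2014. Instead of Pillow, the sketch uses a standard-library heuristic: a JPEG starts with the FF D8 marker and normally ends with FF D9, so any file lacking either marker is flagged. The function names looks_truncated and find_truncated are hypothetical.

```python
import os

# JPEG start-of-image (SOI) and end-of-image (EOI) markers.
SOI = b"\xff\xd8"
EOI = b"\xff\xd9"


def looks_truncated(path, tail_bytes=64):
    """Heuristic check: True if the file does not look like a complete JPEG.

    Some valid JPEGs carry a little padding after EOI, so the EOI marker is
    searched for in the last `tail_bytes` bytes rather than only the final two.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        head = fh.read(2)
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()
    # Flag files that do not start like a JPEG or have no EOI near the end.
    return head != SOI or EOI not in tail


def find_truncated(train_dir):
    """Return sorted paths of .jpg/.jpeg files in train_dir that look truncated."""
    suspects = []
    for name in sorted(os.listdir(train_dir)):
        if name.lower().endswith((".jpg", ".jpeg")):
            path = os.path.join(train_dir, name)
            if looks_truncated(path):
                suspects.append(path)
    return suspects
```

For example, find_truncated("data/train2014") returns the paths worth inspecting before the jobs are resubmitted. Because this is only a marker check, it can miss some damaged files that a full decode would catch.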