GPU & TPU & IPU unavailable and failure to re-train the model on Windows #4

PPierre22 · 2021-10-12T13:34:47Z

Hi,

I'm having some difficulty reproducing your tutorial (on a Windows 10 system). In particular, I cannot train the model (R throws an error).

Overall, everything seems to work smoothly up until the "Train the model" subsection of your tutorial. There is, however, one error when I load the model:

> model = df_model()
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
PROJ: proj_create_from_database: Cannot find proj.db
Reading config file: C:\Users...\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\deepforest\data\deepforest_config.yml

Despite this error message, I can load the model, predict, and visualize results on the example *.png and *.tif data you provide, and everything seems to work just fine. However, when I train the model (using the data provided in your tutorial), I get what looks like a warning:

> model$create_trainer()
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

Then the following command prints the messages copied below, and nothing else happens; the progress bar stays at 0%:

model$trainer$fit(model)

  | Name  | Type      | Params
0 | model | RetinaNet | 32.1 M

31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)
C:\Users...\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\pytorch_lightning\trainer\data_loading.py:382: UserWarning: One of given dataloaders is None and it will be skipped.
rank_zero_warn("One of given dataloaders is None and it will be skipped.")
…oader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 8 which is the number of cpus on this machine) in the DataLoader init to improve performance.
f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Epoch 0: 0%| | 0/1 [00:00<?, ?it/s]

I have the impression that the program does not have access to my CPUs. I looked for the "num_workers" argument that the warning suggests increasing, but without success. The only related option I found is "workers" in model$config, and setting it to as.integer(8) changes nothing.

Any clue on what might be going on?

Thanks


henrykironde · 2021-10-19T19:13:58Z

Thanks for reporting this, @PPierre22. We are still trying to find out what could be causing this error.

ethanwhite · 2021-10-22T21:16:55Z

Thanks again for reporting, @PPierre22. We can replicate the behavior you are seeing, but we're not sure exactly what's going on at the moment. The CPU should be used if no GPUs are present, and changing the number of workers the way you did is the correct approach.

Everything works fine in Python on the same Windows machines I've tested on, and deepforestr runs fine on Linux, but we are seeing a possibly related issue on macOS (#5).

If you need to get something working the best short-term solution is probably to run things directly in Python (see https://deepforest.readthedocs.io/en/latest/) and then export anything you need to use in R.

We'll keep looking to see if we can fix things in the R package, but this is likely an issue with the combination of reticulate, PyTorch, and Windows, and so is probably upstream of us at the moment.

PPierre22 · 2021-10-22T22:00:04Z

Hello @ethanwhite and @henrykironde,
Thank you for your assistance. Please let me know if you end up fixing this issue in the R package.
Best

ethanwhite · 2023-12-28T13:56:53Z

I now have training running on Windows again with no changes to the code base (but some small ones to the instructions), which suggests that the upstream issues have been addressed.

@PPierre22 - if you still have the time and interest, would you mind doing a fresh install (to get the underlying DeepForest package updated) and then checking whether the following code runs?

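library(deepforestr)

# Create the model and load the prebuilt release weights
model = df_model()
model$use_release()

# Example annotations bundled with the package
annotations_file = get_data("testfile_deepforest.csv")

# Point the training config at the annotations and the folder holding their images
model$config$train$csv_file = annotations_file
model$config$train$root_dir = get_data(".")

# Build the trainer, then train (the trainer is stored in model$trainer)
model$create_trainer()
model$trainer$fit(model)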

If that works, I can walk you through how to do a proper retraining. For reasons I'm still investigating, it looks like some config options need to be set in the config file rather than from R.

ethanwhite · 2024-05-11T21:51:05Z

We now have Windows testing, including training, on GH Actions (see #10). Everything is working, so I'm closing this, but if anyone runs into this issue on their local system, feel free to reopen.

ethanwhite mentioned this issue Mar 5, 2023

Crash while training on Windows #9

Closed

ethanwhite closed this as completed May 11, 2024
