RStudio session aborted while trying to train CNN model #1372

Open
liu-zhiyang opened this issue Aug 15, 2023 · 2 comments

liu-zhiyang commented Aug 15, 2023

Hi,
I am trying to train a CNN model using keras in R. I followed the "Simple CNN on CIFAR10 dataset" example.
Everything seems to run fine except the training step. After I run the model training code, RStudio crashes.

> model %>% fit(
+   x_train, y_train,
+   batch_size = batch_size,
+   epochs = epochs,
+   validation_data = list(x_test, y_test),
+   shuffle = TRUE
+ )
Epoch 1/50
2023-08-15 15:59:47.289580: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8903

I noticed that GPU memory usage went to 100% before the RStudio session was terminated.
Do you have any idea what causes this, and how I can solve it?
Thank you very much!

Here is my session info:

> tensorflow::tf_config()
TensorFlow v2.10.1 (C:\PROGRA~3\MINICO~1\lib\site-packages\tensorflow\__init__.p)
Python v3.9 (C:/ProgramData/miniconda3/python.exe)
> tensorflow::tf_gpu_configured()
2023-08-15 15:53:14.462317: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-15 15:53:15.386498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 21348 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:73:00.0, compute capability: 8.9
TensorFlow built with CUDA:  TRUE 
2023-08-15 15:53:15.393138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 21348 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:73:00.0, compute capability: 8.9
GPU device name:  /device:GPU:0[1] TRUE
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8  LC_CTYPE=Chinese (Simplified)_China.utf8   
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C                               
[5] LC_TIME=Chinese (Simplified)_China.utf8    

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] keras_2.11.1

loaded via a namespace (and not attached):
 [1] R6_2.5.1          base64enc_0.1-3   Matrix_1.5-4.1    lattice_0.21-8    reticulate_1.31  
 [6] magrittr_2.0.3    png_0.1-8         generics_0.1.3    cli_3.6.1         tensorflow_2.11.0
[11] grid_4.3.1        withr_2.5.0       zeallot_0.1.0     tfruns_1.5.1      compiler_4.3.1   
[16] rstudioapi_0.15.0 tools_4.3.1       whisker_0.4.1     Rcpp_1.0.11       rlang_1.1.1      
[21] jsonlite_1.8.7    stringi_1.7.12  
@t-kalinowski
Member

  • Can you reproduce the crash outside of RStudio (i.e., running R in cmd.exe)? Running outside the IDE occasionally gives a fuller, more informative error message. With what's provided, this error could have a variety of causes, though my best guess is a driver or DLL version mismatch.

Native GPU support on Windows has been dropped in more recent versions of TensorFlow. 2.10 was the last release to support it (and TF 2.14 is around the corner now).
I would suggest migrating to Linux soon if possible. If you remain on Windows, I would encourage migrating the workflow to WSL, where GPU support continues to be officially supported.

This article may be helpful for using RStudio with WSL: https://support.posit.co/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2

@liu-zhiyang
Author

@t-kalinowski
Thanks for your advice. Running the same code in the R GUI crashes too. I downgraded TensorFlow to 2.6.0 and reinstalled cudatoolkit=11.2.2 and cudnn=8.1.0.77 in the conda environment (the original cudatoolkit=11.8 and cudnn=8.9.3.28 had been installed manually from .exe and .zip files). These changes worked. There is still a small problem, though: during CNN training, and even after training finishes, GPU memory usage stays at 100%.
I will also try RStudio under WSL.
Thanks again!
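
On the constant 100% GPU memory usage: by default TensorFlow reserves nearly all memory on every visible GPU when it first initializes the device, and keeps it until the R process exits, so this reading is likely expected behavior rather than a leak. A minimal sketch of one way to change it, assuming a fresh R session in which no model has been built and tensorflow::tf_gpu_configured() has not yet been called (memory growth cannot be changed once the GPU is initialized):

```r
library(tensorflow)

# List the GPUs TensorFlow can see. This only enumerates devices;
# it does not initialize them or reserve memory.
gpus <- tf$config$list_physical_devices("GPU")

# Ask TensorFlow to start with a small allocation and grow it on demand,
# instead of reserving almost the whole card up front.
# Assumption: this runs before anything else touches the GPU in this session.
for (gpu in gpus) {
  tf$config$experimental$set_memory_growth(gpu, TRUE)
}
```

With memory growth enabled, the usage shown by tools such as nvidia-smi should track what the model actually needs. Memory that has been allocated is still not returned to the system until the R session ends.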
