Running training in a loop (M1 chip) #1345

Open
cadyyuheng opened this issue Jul 25, 2022 · 3 comments
@cadyyuheng

Hello,

I'm trying to repeat my training and prediction in a loop 20 times. My code worked fine on an Intel-based MacBook. However, I recently switched to an M1-based MacBook, and the loop now runs into trouble: although I don't get any errors, the program never finishes the 5th repeat of the training loop. If I reduce the loop count to 3, the loop finishes without any issue. I wonder if some memory quota is being reached and whether there's a way to raise it. I'd really appreciate the help.
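For reference, the loop has roughly this shape (the model code is omitted; `build_model()`, `x_train`, etc. are hypothetical placeholders, not my actual reprex). Clearing the Keras backend session between iterations is one thing I can try, in case graph state is accumulating across repeats:

```r
library(keras)

for (i in 1:20) {
  print(paste0(i, "th repeat run"))

  model <- build_model()   # hypothetical constructor for my model
  model |> fit(x_train, y_train, epochs = 10, verbose = 0)
  preds <- model |> predict(x_test)

  # Clear the backend session so graph/memory state from this
  # iteration does not accumulate into the next one
  k_clear_session()
}
```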

> reticulate::py_config()
python:         /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/bin/python
libpython:      /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/lib/libpython3.8.dylib
pythonhome:     /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate:/Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate
version:        3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:16)  [Clang 12.0.1 ]
numpy:          /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/numpy
numpy_version:  1.22.4
> tensorflow::tf_config()
TensorFlow v2.9.2 ()
Python v3.8 (~/Library/r-miniconda-arm64/envs/r-reticulate/bin/python)
> reticulate::import("tensorflow")
Module(tensorflow)
> reticulate::py_last_error()
NULL
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9        sp_1.5-0           SeuratObject_4.1.0 Seurat_4.1.1      

loaded via a namespace (and not attached):
  [1] Rtsne_0.16            colorspace_2.0-3      ggsignif_0.6.3        deldir_1.0-6          ellipsis_0.3.2       
  [6] ggridges_0.5.3        rprojroot_2.0.3       base64enc_0.1-3       rstudioapi_0.13       spatstat.data_2.2-0  
 [11] farver_2.1.0          matchingR_1.3.3       ggpubr_0.4.0          leiden_0.4.2          listenv_0.8.0        
 [16] bit64_4.0.5           ggrepel_0.9.1         RSpectra_0.16-1       fansi_1.0.3           codetools_0.2-18     
 [21] splines_4.2.1         knitr_1.39            zeallot_0.1.0         polyclip_1.10-0       jsonlite_1.8.0       
 [26] broom_1.0.0           ica_1.0-3             cluster_2.1.3         tfruns_1.5.0          png_0.1-7            
 [31] rgeos_0.5-9           uwot_0.1.11           shiny_1.7.2           sctransform_0.3.3     spatstat.sparse_2.1-1
 [36] compiler_4.2.1        httr_1.4.3            backports_1.4.1       Matrix_1.4-1          fastmap_1.1.0        
 [41] lazyeval_0.2.2        cli_3.3.0             later_1.3.0           htmltools_0.5.2       tools_4.2.1          
 [46] igraph_1.3.2          gtable_0.3.0          glue_1.6.2            RANN_2.6.1            reshape2_1.4.4       
 [51] Rcpp_1.0.9            carData_3.0-5         scattermore_0.8       vctrs_0.4.1           nlme_3.1-157         
 [56] progressr_0.10.1      lmtest_0.9-40         spatstat.random_2.2-0 xfun_0.31             stringr_1.4.0        
 [61] globals_0.15.1        mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1       irlba_2.3.5          
 [66] rstatix_0.7.0         goftest_1.2-3         future_1.26.1         MASS_7.3-57           zoo_1.8-10           
 [71] scales_1.2.0          spatstat.core_2.4-4   promises_1.2.0.1      spatstat.utils_2.3-1  parallel_4.2.1       
 [76] RColorBrewer_1.1-3    yaml_2.3.5            reticulate_1.25-9000  pbapply_1.5-0         gridExtra_2.3        
 [81] ggplot2_3.3.6         keras_2.9.0.9000      rpart_4.1.16          stringi_1.7.6         tensorflow_2.9.0.9000
 [86] rlang_1.0.4           pkgconfig_2.0.3       matrixStats_0.62.0    pracma_2.3.8          evaluate_0.15        
 [91] lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4           tensor_1.5            labeling_0.4.2       
 [96] patchwork_1.1.1       htmlwidgets_1.5.4     bit_4.0.4             cowplot_1.1.1         tidyselect_1.1.2     
[101] here_1.0.1            parallelly_1.32.0     RcppAnnoy_0.0.19      plyr_1.8.7            magrittr_2.0.3       
[106] R6_2.5.1              generics_0.1.3        whisker_0.4           mgcv_1.8-40           pillar_1.8.0         
[111] fitdistrplus_1.1-8    survival_3.3-1        abind_1.4-5           tibble_3.1.7          future.apply_1.9.0   
[116] hdf5r_1.3.5           car_3.1-0             KernSmooth_2.23-20    utf8_1.2.2            spatstat.geom_2.4-0  
[121] plotly_4.10.0         rmarkdown_2.14        grid_4.2.1            data.table_1.14.2     digest_0.6.29        
[126] xtable_1.8-4          tidyr_1.2.0           httpuv_1.6.5          munsell_0.5.0         viridisLite_0.4.0    
[1] "1th repeat run"

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Metal device set to: Apple M1 Pro

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2022-07-25 13:25:02.019748: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-25 13:25:02.019862: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-25 13:25:02.214002: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-25 13:25:02.714319: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:25:03.477795: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:26:11.688963: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "2th repeat run"
2022-07-25 13:26:16.122695: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:26:16.885002: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:27:24.244544: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 3ms/step
171/171 [==============================] - 0s 2ms/step
[1] "3th repeat run"
2022-07-25 13:27:26.854203: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:27:27.810199: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:28:35.355860: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "4th repeat run"
2022-07-25 13:28:37.895150: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:28:38.820674: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:29:45.818672: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "5th repeat run"
2022-07-25 13:29:48.424173: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:29:49.824081: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
@t-kalinowski
Member

Hi, I haven't encountered this yet. The fact that your code worked fine on an Intel Mac suggests that it's likely an issue with the 'tensorflow-macos' and 'tensorflow-metal' packages provided by Apple.

If you can provide a reprex that I can reproduce on my side, I can take a look to see whether the issue is with the upstream package or with something on the R/reticulate side.

@cadyyuheng
Author

@t-kalinowski ,

Thank you so much for your reply. Just one more question: to test whether the CPU or the GPU of the M1 is causing the issue, is there a quick function to disable the M1 GPU for training? Would something like

Sys.setenv("CUDA_VISIBLE_DEVICES" = -1)  

work for M1?

@t-kalinowski
Member

As far as I know, visibility of the M1 GPU cannot be controlled through an environment variable. The way to hide it is through the TensorFlow config API:

tf$config$get_visible_devices("CPU") |> 
  tf$config$set_visible_devices()
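Note that this has to run before any TensorFlow op initializes the GPU, or `set_visible_devices()` will raise an error. A quick way to confirm it took effect (assuming the `tensorflow` R package is attached):

```r
library(tensorflow)

# Restrict TensorFlow to CPU devices only; must be called before
# any op creates the device context
tf$config$get_visible_devices("CPU") |>
  tf$config$set_visible_devices()

# Verify: the visible-device list should now contain only CPU entries
print(tf$config$get_visible_devices())
```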
