Running training in a loop (M1 chip) #1345

Open
cadyyuheng opened this issue Jul 25, 2022 · 3 comments
@cadyyuheng

Hello,

I'm trying to repeat my training and prediction in a loop 20 times. My code worked fine on an Intel-based MacBook. However, I recently switched to an M1-based MacBook, and the loop now runs into trouble: although I don't get any errors, the program never finishes the 5th repeat of the training loop. If I reduce the loop count to 3, the loop finishes without any issue. I wonder if some memory quota is being reached and whether there's a way to raise it. I'd really appreciate the help.
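For reference, the loop has roughly this shape (the model code is omitted; `build_model()`, `x_train`, etc. are hypothetical placeholders, not my actual reprex). Clearing the Keras backend session between iterations is one thing I can try, in case graph state is accumulating across repeats:

```r
library(keras)

for (i in 1:20) {
  print(paste0(i, "th repeat run"))

  model <- build_model()   # hypothetical constructor for my model
  model |> fit(x_train, y_train, epochs = 10, verbose = 0)
  preds <- model |> predict(x_test)

  # Clear the backend session so graph/memory state from this
  # iteration does not accumulate into the next one
  k_clear_session()
}
```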

> reticulate::py_config()
python:         /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/bin/python
libpython:      /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/lib/libpython3.8.dylib
pythonhome:     /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate:/Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate
version:        3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:16)  [Clang 12.0.1 ]
numpy:          /Users/yfy6677/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/numpy
numpy_version:  1.22.4
> tensorflow::tf_config()
TensorFlow v2.9.2 ()
Python v3.8 (~/Library/r-miniconda-arm64/envs/r-reticulate/bin/python)
> reticulate::import("tensorflow")
Module(tensorflow)
> reticulate::py_last_error()
NULL
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9        sp_1.5-0           SeuratObject_4.1.0 Seurat_4.1.1      

loaded via a namespace (and not attached):
  [1] Rtsne_0.16            colorspace_2.0-3      ggsignif_0.6.3        deldir_1.0-6          ellipsis_0.3.2       
  [6] ggridges_0.5.3        rprojroot_2.0.3       base64enc_0.1-3       rstudioapi_0.13       spatstat.data_2.2-0  
 [11] farver_2.1.0          matchingR_1.3.3       ggpubr_0.4.0          leiden_0.4.2          listenv_0.8.0        
 [16] bit64_4.0.5           ggrepel_0.9.1         RSpectra_0.16-1       fansi_1.0.3           codetools_0.2-18     
 [21] splines_4.2.1         knitr_1.39            zeallot_0.1.0         polyclip_1.10-0       jsonlite_1.8.0       
 [26] broom_1.0.0           ica_1.0-3             cluster_2.1.3         tfruns_1.5.0          png_0.1-7            
 [31] rgeos_0.5-9           uwot_0.1.11           shiny_1.7.2           sctransform_0.3.3     spatstat.sparse_2.1-1
 [36] compiler_4.2.1        httr_1.4.3            backports_1.4.1       Matrix_1.4-1          fastmap_1.1.0        
 [41] lazyeval_0.2.2        cli_3.3.0             later_1.3.0           htmltools_0.5.2       tools_4.2.1          
 [46] igraph_1.3.2          gtable_0.3.0          glue_1.6.2            RANN_2.6.1            reshape2_1.4.4       
 [51] Rcpp_1.0.9            carData_3.0-5         scattermore_0.8       vctrs_0.4.1           nlme_3.1-157         
 [56] progressr_0.10.1      lmtest_0.9-40         spatstat.random_2.2-0 xfun_0.31             stringr_1.4.0        
 [61] globals_0.15.1        mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1       irlba_2.3.5          
 [66] rstatix_0.7.0         goftest_1.2-3         future_1.26.1         MASS_7.3-57           zoo_1.8-10           
 [71] scales_1.2.0          spatstat.core_2.4-4   promises_1.2.0.1      spatstat.utils_2.3-1  parallel_4.2.1       
 [76] RColorBrewer_1.1-3    yaml_2.3.5            reticulate_1.25-9000  pbapply_1.5-0         gridExtra_2.3        
 [81] ggplot2_3.3.6         keras_2.9.0.9000      rpart_4.1.16          stringi_1.7.6         tensorflow_2.9.0.9000
 [86] rlang_1.0.4           pkgconfig_2.0.3       matrixStats_0.62.0    pracma_2.3.8          evaluate_0.15        
 [91] lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4           tensor_1.5            labeling_0.4.2       
 [96] patchwork_1.1.1       htmlwidgets_1.5.4     bit_4.0.4             cowplot_1.1.1         tidyselect_1.1.2     
[101] here_1.0.1            parallelly_1.32.0     RcppAnnoy_0.0.19      plyr_1.8.7            magrittr_2.0.3       
[106] R6_2.5.1              generics_0.1.3        whisker_0.4           mgcv_1.8-40           pillar_1.8.0         
[111] fitdistrplus_1.1-8    survival_3.3-1        abind_1.4-5           tibble_3.1.7          future.apply_1.9.0   
[116] hdf5r_1.3.5           car_3.1-0             KernSmooth_2.23-20    utf8_1.2.2            spatstat.geom_2.4-0  
[121] plotly_4.10.0         rmarkdown_2.14        grid_4.2.1            data.table_1.14.2     digest_0.6.29        
[126] xtable_1.8-4          tidyr_1.2.0           httpuv_1.6.5          munsell_0.5.0         viridisLite_0.4.0    
[1] "1th repeat run"

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Metal device set to: Apple M1 Pro

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2022-07-25 13:25:02.019748: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-25 13:25:02.019862: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-25 13:25:02.214002: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-25 13:25:02.714319: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:25:03.477795: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:26:11.688963: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "2th repeat run"
2022-07-25 13:26:16.122695: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:26:16.885002: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:27:24.244544: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 3ms/step
171/171 [==============================] - 0s 2ms/step
[1] "3th repeat run"
2022-07-25 13:27:26.854203: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:27:27.810199: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:28:35.355860: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "4th repeat run"
2022-07-25 13:28:37.895150: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:28:38.820674: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:29:45.818672: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
21/21 [==============================] - 0s 4ms/step
171/171 [==============================] - 0s 2ms/step
[1] "5th repeat run"
2022-07-25 13:29:48.424173: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-25 13:29:49.824081: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
@t-kalinowski
Member

Hi, I haven't encountered this yet. The fact that your code worked fine on an Intel Mac suggests that it's likely an issue with the 'tensorflow-macos' and 'tensorflow-metal' packages provided by Apple.

If you can provide a reprex that I can reproduce on my side, I can take a look to see whether the issue is with the upstream package or with something on the R/reticulate side.

@cadyyuheng
Author

@t-kalinowski ,

Thank you so much for your reply. Just one more question: to test whether the CPU or the GPU of the M1 is causing the issue, is there a quick function to disable the M1 GPU for training? Would something like

Sys.setenv("CUDA_VISIBLE_DEVICES" = -1)  

work for M1?

@t-kalinowski
Member

As far as I know, visibility of the M1 GPU cannot be controlled through an environment variable. The way to hide it is through the TensorFlow config API:

tf$config$get_visible_devices("CPU") |> 
  tf$config$set_visible_devices()
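Note that this has to run before any TensorFlow op initializes the GPU, or `set_visible_devices()` will raise an error. A quick way to confirm it took effect (assuming the `tensorflow` R package is attached):

```r
library(tensorflow)

# Restrict TensorFlow to CPU devices only; must be called before
# any op creates the device context
tf$config$get_visible_devices("CPU") |>
  tf$config$set_visible_devices()

# Verify: the visible-device list should now contain only CPU entries
print(tf$config$get_visible_devices())
```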
