Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RunHarmony on the weights branch errors with incompatible matrix dimensions #219

Open
deevdevil88 opened this issue Nov 9, 2023 · 16 comments

Comments

@deevdevil88
Copy link

deevdevil88 commented Nov 9, 2023

Hi,
I am trying to use the weighted harmony as implemented in this paper DOI: 10.1016/j.medj.2022.05.002 , so i was using the weights branch.
I installed the weights branch successfully and can run the `RunHarmony' function on a seurat object. However, after the first iteration i get the following error

the command im running, after insalling the weights branch is
seu_object <- harmony::RunHarmony(object = seu_object,group.by.vars = c("cart_id","tissue_simple"), reduction = "weighted_pca",dims.use = 1:50,tau = 0,max.iter.harmony = 10, weight.by = "tissue_simple", reduction.save = "weighed_pca_harmony")

Screenshot 2023-11-08 at 19 25 38

on traceback of the error it seems to come from the Harmony function within.
Screenshot 2023-11-08 at 19 31 01

the error persists with multiple parameter changes, includng using normal or wieghted PCA, so i dont think i am using the wrong covariates. Any help on this issue would be much appreciated. I tried to use the code that was on the github repo for the paper here : https://github.com/immunogenomics/FibroblastAtlas2022/blob/Med/Analyses_human/IntegratedClusters.ipynb , but i think the weighted harmony code on here isnt the easiset to implement.

best,
Devika

@deevdevil88 deevdevil88 changed the title RunHarmony on the weights branch errors RunHarmony on the weights branch errors with incompatible matrix dimensions Nov 9, 2023
@PietroAndrei
Copy link

Hello,

I am also having the same issue with Harmony when I try to integrate different Seurat objects through the Seurat v5 IntegrateLayers() function:

seurat[["RNA"]] = split(seurat[["RNA"]],f=seurat$slide)
seurat = seurat %>% NormalizeData() %>% 
FindVariableFeatures(nfeatures=1000) %>% 
ScaleData() %>% 
RunPCA(npcs=30)

seurat = IntegrateLayers(seurat,method=HarmonyIntegration,orig.reduction='pca',new.reduction='harmony')

Harmony 1/10
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error: element-wise pow(): incompatible matrix dimensions: 100x2 and 2x1



sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

...
...
...
...

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] harmony_1.0        Rcpp_1.0.11        dplyr_1.1.3        Seurat_5.0.0      
[5] SeuratObject_5.0.0 sp_2.0-0          

@pati-ni
Copy link
Collaborator

pati-ni commented Nov 24, 2023

Hi @PietroAndrei,

Two things.

  • Can you use the harmony API RunHarmony() and not the Seurat Wrapper.?
  • Please install our latest version from github v1.2

@PietroAndrei
Copy link

Hi @pati-ni,

I solved the issue by upgrading Harmony from v1.0 to v1.2, thanks!

It may have been a compatibility problem with Rcpp, as I tried to use another dataset, and Harmony v.1.0 worked only when run within an environment with Rcpp < 1.0.10.

Thanks again for your help,

Pietro

@pati-ni pati-ni closed this as completed Nov 25, 2023
@jjs847
Copy link

jjs847 commented Feb 6, 2024

Hi @pati-ni, could you re-open this issue as I am also having the same error on the weights branch when trying to run HarmonyMatrix with weights.

pastedImage

I have tried suggestions from other posts with a similar error (but not pertaining to the weights branch) such as downgrading Rcpp, converting the vars_use to factors or characters, and making sure there are no empty factor levels/NAs and am still getting the same error.
Updating to the harmony version as you suggested doesn't work since I need the HarmonyMatrix function with the weights argument from the weights branch.

@pati-ni pati-ni reopened this Feb 6, 2024
@pati-ni
Copy link
Collaborator

pati-ni commented Feb 6, 2024

Hi @jjs847,

does your code run when you install the normal branch? I did not know about the weights branch

@jjs847
Copy link

jjs847 commented Feb 6, 2024

Hi @pati-ni,

Yes, it runs to completion with harmony 1.0

@pati-ni
Copy link
Collaborator

pati-ni commented Feb 7, 2024

Hi @jjs847,

I have looked into the issue and I think I have addressed it. However, I can not reproduce the error with the existing weights branch due to the ambiguity of the pow call which won't let me compile.

Can you please install it from my branch to see if the issue is fixed?

devtools::install_github("pati-ni/harmony", ref="weights")

@deevdevil88
Copy link
Author

hi @pati-ni does this mean, the weights branch now works on your harmony repo and i can test it out again as well?

@pati-ni
Copy link
Collaborator

pati-ni commented Feb 9, 2024 via email

@jjs847
Copy link

jjs847 commented Feb 9, 2024

Hi @pati-ni,

sorry for the delayed response -- it seems like its working now, and I am not getting the incompatible matrix error, however, its running extremely slow. So far it's been running for 5 hours and it's only on the second harmony iteration. Previously, when I ran HarmonyMatrix from harmony 1.0 (just to test if I got a matrix error), it took less than 2 minutes to complete. Is the reason its taking so much longer now because of the weights? My data is not that large, around 25k cells.

Thanks for working on this.

@pati-ni
Copy link
Collaborator

pati-ni commented Feb 9, 2024

That is unreasonable slow. How many batches do you have?

So it is confusing the version was bumped to v1 accidentally on that branch. When you say that you run the branch in 2 mins, which branch do you refer to? Our current version is v1.2 and is significantly different than the weights branch (performance-wise).

Can you please provide some sessionInfo() outputs of the different harmony environments you are running the code on and some extra information for the dataset?

@pati-ni
Copy link
Collaborator

pati-ni commented Feb 10, 2024

hi @pati-ni does this mean, the weights branch now works on your harmony repo and i can test it out again as well?

@deevdevil88 yes, please go ahead! Although, keep in mind that @jjs847 is facing slow runtimes, which is puzzling since the code is very similar. Also, it would be good if we had your feedback whether this runs at your system

@jjs847
Copy link

jjs847 commented Feb 11, 2024

Hi @pati-ni, here is the session info for the two versions:

Harmony 1.2.0, took less than 1 minute to run
Screen Shot 2024-02-11 at 2 38 00 PM

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default
BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so
LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] harmony_1.2.0 Rcpp_1.0.12 ggplot2_3.4.4
[4] Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-3
[7] singlecellmethods_0.1.0 data.table_1.15.0 dplyr_1.1.4

loaded via a namespace (and not attached):
[1] Rtsne_0.17 colorspace_2.1-0 deldir_2.0-2
[4] ellipsis_0.3.2 ggridges_0.5.6 RcppHNSW_0.5.0
[7] rstudioapi_0.15.0 spatstat.data_3.0-4 leiden_0.4.3.1
[10] listenv_0.9.1 ggrepel_0.9.5 RSpectra_0.16-1
[13] fansi_1.0.6 codetools_0.2-19 splines_4.2.2
[16] knitr_1.45 polyclip_1.10-6 spam_2.10-0
[19] jsonlite_1.8.8 ica_1.0-3 cluster_2.1.6
[22] png_0.1-8 uwot_0.1.16 shiny_1.8.0
[25] sctransform_0.4.1 spatstat.sparse_3.0-3 compiler_4.2.2
[28] httr_1.4.7 Matrix_1.6-3 fastmap_1.1.1
[31] lazyeval_0.2.2 cli_3.6.2 later_1.3.2
[34] htmltools_0.5.7 tools_4.2.2 igraph_1.6.0
[37] dotCall64_1.1-1 gtable_0.3.4 glue_1.7.0
[40] RANN_2.6.1 reshape2_1.4.4 scattermore_1.2
[43] vctrs_0.6.5 spatstat.explore_3.2-5 nlme_3.1-164
[46] progressr_0.14.0 lmtest_0.9-40 spatstat.random_3.2-2
[49] xfun_0.41 stringr_1.5.1 globals_0.16.2
[52] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.4
[55] irlba_2.3.5.1 goftest_1.2-3 future_1.33.1
[58] MASS_7.3-60.0.1 zoo_1.8-12 scales_1.3.0
[61] promises_1.2.1 spatstat.utils_3.0-4 RColorBrewer_1.1-3
[64] yaml_2.3.8 reticulate_1.34.0 pbapply_1.7-2
[67] gridExtra_2.3 stringi_1.8.3 fastDummies_1.7.3
[70] rlang_1.1.3 pkgconfig_2.0.3 matrixStats_1.1.0
[73] evaluate_0.23 lattice_0.22-5 ROCR_1.0-11
[76] purrr_1.0.2 tensor_1.5 patchwork_1.2.0
[79] htmlwidgets_1.6.4 cowplot_1.1.3 tidyselect_1.2.0
[82] parallelly_1.36.0 RcppAnnoy_0.0.22 plyr_1.8.9
[85] magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[88] withr_3.0.0 pillar_1.9.0 fitdistrplus_1.1-11
[91] survival_3.5-7 abind_1.4-5 tibble_3.2.1
[94] future.apply_1.11.1 crayon_1.5.2 KernSmooth_2.23-22
[97] utf8_1.2.4 spatstat.geom_3.2-8 plotly_4.10.4
[100] rmarkdown_2.25 grid_4.2.2 digest_0.6.34
[103] xtable_1.8-4 tidyr_1.3.1 httpuv_1.6.14
[106] munsell_0.5.0 viridisLite_0.4.2

Harmony pat-ni/weights, canceled after running for 12 hours, got to 4th iteration but did not finish
(screen shot not from that run but just to show arguments used in function call)
Screen Shot 2024-02-11 at 2 40 48 PM

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default
BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so
LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8
[2] LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8
[8] LC_NAME=C
[9] LC_ADDRESS=C
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices
[5] utils datasets methods base

other attached packages:
[1] harmony_1.0
[2] Rcpp_1.0.12
[3] ggplot2_3.4.4
[4] Seurat_5.0.1
[5] SeuratObject_5.0.1
[6] sp_2.1-3
[7] singlecellmethods_0.1.0
[8] data.table_1.15.0
[9] dplyr_1.1.4

loaded via a namespace (and not attached):
[1] Rtsne_0.17
[2] colorspace_2.1-0
[3] deldir_2.0-2
[4] ellipsis_0.3.2
[5] ggridges_0.5.6
[6] RcppHNSW_0.5.0
[7] fs_1.6.3
[8] spatstat.data_3.0-4
[9] rstudioapi_0.15.0
[10] leiden_0.4.3.1
[11] listenv_0.9.1
[12] remotes_2.4.2.1
[13] ggrepel_0.9.5
[14] RSpectra_0.16-1
[15] fansi_1.0.6
[16] codetools_0.2-19
[17] splines_4.2.2
[18] cachem_1.0.8
[19] knitr_1.45
[20] polyclip_1.10-6
[21] pkgload_1.3.4
[22] spam_2.10-0
[23] jsonlite_1.8.8
[24] ica_1.0-3
[25] cluster_2.1.6
[26] png_0.1-8
[27] uwot_0.1.16
[28] spatstat.sparse_3.0-3
[29] shiny_1.8.0
[30] sctransform_0.4.1
[31] compiler_4.2.2
[32] httr_1.4.7
[33] Matrix_1.6-3
[34] fastmap_1.1.1
[35] lazyeval_0.2.2
[36] cli_3.6.2
[37] later_1.3.2
[38] htmltools_0.5.7
[39] tools_4.2.2
[40] igraph_1.6.0
[41] dotCall64_1.1-1
[42] gtable_0.3.4
[43] glue_1.7.0
[44] reshape2_1.4.4
[45] RANN_2.6.1
[46] scattermore_1.2
[47] vctrs_0.6.5
[48] nlme_3.1-164
[49] spatstat.explore_3.2-5
[50] progressr_0.14.0
[51] lmtest_0.9-40
[52] spatstat.random_3.2-2
[53] xfun_0.41
[54] stringr_1.5.1
[55] globals_0.16.2
[56] mime_0.12
[57] miniUI_0.1.1.1
[58] lifecycle_1.0.4
[59] irlba_2.3.5.1
[60] devtools_2.4.5
[61] goftest_1.2-3
[62] future_1.33.1
[63] MASS_7.3-60.0.1
[64] zoo_1.8-12
[65] scales_1.3.0
[66] spatstat.utils_3.0-4
[67] promises_1.2.1
[68] RColorBrewer_1.1-3
[69] curl_5.2.0
[70] yaml_2.3.8
[71] gridExtra_2.3
[72] memoise_2.0.1
[73] reticulate_1.34.0
[74] pbapply_1.7-2
[75] stringi_1.8.3
[76] fastDummies_1.7.3
[77] pkgbuild_1.4.3
[78] rlang_1.1.3
[79] pkgconfig_2.0.3
[80] matrixStats_1.1.0
[81] evaluate_0.23
[82] lattice_0.22-5
[83] tensor_1.5
[84] ROCR_1.0-11
[85] purrr_1.0.2
[86] patchwork_1.2.0
[87] htmlwidgets_1.6.4
[88] cowplot_1.1.3
[89] tidyselect_1.2.0
[90] parallelly_1.36.0
[91] RcppAnnoy_0.0.22
[92] plyr_1.8.9
[93] magrittr_2.0.3
[94] R6_2.5.1
[95] generics_0.1.3
[96] profvis_0.3.8
[97] withr_3.0.0
[98] pillar_1.9.0
[99] fitdistrplus_1.1-11
[100] abind_1.4-5
[101] survival_3.5-7
[102] tibble_3.2.1
[103] future.apply_1.11.1
[104] crayon_1.5.2
[105] KernSmooth_2.23-22
[106] utf8_1.2.4
[107] spatstat.geom_3.2-8
[108] plotly_4.10.4
[109] rmarkdown_2.25
[110] urlchecker_1.0.1
[111] usethis_2.2.2
[112] grid_4.2.2
[113] digest_0.6.34
[114] xtable_1.8-4
[115] tidyr_1.3.1
[116] httpuv_1.6.14
[117] munsell_0.5.0
[118] viridisLite_0.4.2
[119] sessioninfo_1.2.2

Thanks for your help with this!

@jjs847
Copy link

jjs847 commented Feb 16, 2024

@pati-ni just wondering if there are any updates?

@pati-ni
Copy link
Collaborator

pati-ni commented Feb 16, 2024

Hi @jjs847,

yes, sorry for the delay. I am trying to put together a dataset to test with the code.

Can you suggest some values for the weights? Do you set them within [0,1] range and the same value for the branch?

Also, I can see that you use the netlib blas/lapack. While I don't think this explains the slowness, it would be worth trying the OpenBlas version. If you install R from conda, then it comes with openblas. This makes a substantial difference performance-wise with this old version of harmony. However, before spawning R I would set export OMP_NUM_THREADS=1 to prevent OPENBLAS hogging all your CPU cores

Please let me know the necessary information so I can provide further guidance.

@jjs847
Copy link

jjs847 commented Feb 19, 2024

Hi @pati-ni,

no worries, thanks for your continued work on this!

I have a dataset with about 25k single-cell + single-nuclear data. My weights argument is a vector with an entry for each barcode of 0.53 for the single-cell data, and 8.56 for the single-nuclear data. Given your suggestion, I tried to re-run with weights between 0 and 1, and still only getting to the 4th iteration after 12 hours.

I will see if your second suggestion is possible on my HPC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants