Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MACS2 peaks beyond chromosome #1050

Open
t-and opened this issue Mar 17, 2022 · 5 comments
Open

MACS2 peaks beyond chromosome #1050

t-and opened this issue Mar 17, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@t-and
Copy link

t-and commented Mar 17, 2022

Hi @timoast,

I am running into an issue I have seen on here, but cannot seem to identify a solution. I am currently trying to analyze an snATAC dataset from zebrafish. While running AddMotifs, I get the error:

> library(BSgenome.Drerio.UCSC.danRer11)
> posit.freq.matrices <- getMatrixSet(x = JASPAR2020, 
+                                     opts = list(collection = "CORE",
+                                                 tax_group = "vertebrates", 
+                                                 all_versions = F))
> seqlevelsStyle(BSgenome.Drerio.UCSC.danRer11) <- 'Ensembl'
> iom.atac <- AddMotifs(object = iom.atac, genome = BSgenome.Drerio.UCSC.danRer11, pfm = posit.freq.matrices)
Error in loadFUN(x, seqname, ranges) : trying to load regions beyond the boundaries of non-circular sequence "chr23"

Based on issue #860, it looks like this is because I have peaks that extend beyond the chromosome.

> gr <- granges(iom.atac[['peaks']])
> end(BSgenome.Drerio.UCSC.danRer11$`23`)
[1] 46223584        1
> max(end(gr[seqnames(gr) == "23"]))
[1] 46223630

Interestingly, I was able to run AddMotifs without error on the peaks from my CellRanger pre-processed data - this error only occurred after I re-did the peak-calling with macs2.

I know you mentioned in #860 that the issue may be that the version of the genome is different between the data and the BSgenome object, however both the BSgenome and data use the same assembly, GRCz11. Are there any other potential causes for this issue, or is there any way around it?

Thank you for this great tool and your support,
Troy

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome_1.58.0             rtracklayer_1.50.0          Biostrings_2.58.0          
 [4] XVector_0.30.0              TFBSTools_1.28.0            JASPAR2020_0.99.10         
 [7] Signac_1.6.0                tidyr_1.2.0                 readr_2.1.2                
[10] magrittr_2.0.2              patchwork_1.1.1             cowplot_1.1.1              
[13] MAST_1.16.0                 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[16] Biobase_2.50.0              GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
[19] IRanges_2.24.1              S4Vectors_0.28.1            BiocGenerics_0.36.1        
[22] MatrixGenerics_1.2.1        matrixStats_0.61.0          sctransform_0.3.3          
[25] ggplot2_3.3.5               Matrix_1.3-4                dplyr_1.0.8                
[28] SeuratObject_4.0.4          Seurat_4.1.0               

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.24             R.utils_2.11.0             
  [4] tidyselect_1.1.2            poweRlaw_0.70.6             RSQLite_2.2.10             
  [7] AnnotationDbi_1.52.0        htmlwidgets_1.5.4           grid_4.0.5                 
 [10] docopt_0.7.1                BiocParallel_1.24.1         Rtsne_0.15                 
 [13] munsell_0.5.0               codetools_0.2-18            ica_1.0-2                  
 [16] future_1.24.0               miniUI_0.1.1.1              withr_2.5.0                
 [19] spatstat.random_2.1-0       colorspace_2.0-3            knitr_1.37                 
 [22] rstudioapi_0.13             ROCR_1.0-11                 tensor_1.5                 
 [25] listenv_0.8.0               slam_0.1-50                 GenomeInfoDbData_1.2.4     
 [28] polyclip_1.10-0             bit64_4.0.5                 farver_2.1.0               
 [31] parallelly_1.30.0           vctrs_0.3.8                 generics_0.1.2             
 [34] xfun_0.30                   lsa_0.73.2                  ggseqlogo_0.1              
 [37] R6_2.5.1                    bitops_1.0-7                spatstat.utils_2.3-0       
 [40] cachem_1.0.6                DelayedArray_0.16.3         assertthat_0.2.1           
 [43] promises_1.2.0.1            scales_1.1.1                gtable_0.3.0               
 [46] globals_0.14.0              goftest_1.2-3               seqLogo_1.56.0             
 [49] rlang_1.0.2                 RcppRoll_0.3.0              splines_4.0.5              
 [52] lazyeval_0.2.2              spatstat.geom_2.3-2         reshape2_1.4.4             
 [55] abind_1.4-5                 httpuv_1.6.5                tools_4.0.5                
 [58] ellipsis_0.3.2              spatstat.core_2.4-0         RColorBrewer_1.1-2         
 [61] ggridges_0.5.3              Rcpp_1.0.7                  plyr_1.8.6                 
 [64] zlibbioc_1.36.0             purrr_0.3.4                 RCurl_1.98-1.6             
 [67] rpart_4.1-15                deldir_1.0-6                pbapply_1.5-0              
 [70] zoo_1.8-9                   ggrepel_0.9.1               cluster_2.1.1              
 [73] motifmatchr_1.12.0          data.table_1.14.2           scattermore_0.8            
 [76] lmtest_0.9-39               RANN_2.6.1                  SnowballC_0.7.0            
 [79] fitdistrplus_1.1-6          hms_1.1.1                   mime_0.12                  
 [82] xtable_1.8-4                XML_3.99-0.9                sparsesvd_0.2              
 [85] gridExtra_2.3               compiler_4.0.5              tibble_3.1.6               
 [88] KernSmooth_2.23-18          crayon_1.5.0                R.oo_1.24.0                
 [91] htmltools_0.5.2             mgcv_1.8-34                 later_1.3.0                
 [94] tzdb_0.2.0                  DBI_1.1.2                   tweenr_1.0.2               
 [97] MASS_7.3-53.1               cli_3.2.0                   R.methodsS3_1.8.1          
[100] igraph_1.2.11               pkgconfig_2.0.3             GenomicAlignments_1.26.0   
[103] TFMPvalue_0.0.8             plotly_4.10.0               spatstat.sparse_2.1-0      
[106] annotate_1.68.0             DirichletMultinomial_1.32.0 stringr_1.4.0              
[109] digest_0.6.29               RcppAnnoy_0.0.19            pracma_2.3.8               
[112] CNEr_1.26.0                 spatstat.data_2.1-2         leiden_0.3.9               
[115] fastmatch_1.1-3             uwot_0.1.11                 shiny_1.7.1                
[118] Rsamtools_2.6.0             gtools_3.9.2                lifecycle_1.0.1            
[121] nlme_3.1-152                jsonlite_1.8.0              viridisLite_0.4.0          
[124] fansi_1.0.2                 pillar_1.7.0                lattice_0.20-41            
[127] KEGGREST_1.30.1             fastmap_1.1.0               httr_1.4.2                 
[130] survival_3.2-10             GO.db_3.12.1                glue_1.6.2                 
[133] qlcMatrix_0.9.7             png_0.1-7                   bit_4.0.4                  
[136] ggforce_0.3.3               stringi_1.7.6               blob_1.2.2                 
[139] caTools_1.18.2              memoise_2.0.1               irlba_2.3.5                
[142] future.apply_1.8.1    
@t-and
Copy link
Author

t-and commented Mar 17, 2022

In looking at the CellRanger peak calling, it looks like there is a peak called at the very end of chr23. Is it possible that the CallPeaks function might have extended this peak with the default ext.size = 200 such that it would extend beyond the boundary of the chromosome? If so, is there any way to remove this peak (and any others extending beyond their chromosome) so it does not interfere with downstream analysis?

@t-and
Copy link
Author

t-and commented Mar 18, 2022

So macs2 did in fact extend a peak at the end of the chromosome beyond the chromosome, and removing that single peak by taking the CallPeaks() output and setting that row to NULL worked to resolve the issue. This might be an enhancement suggestion, in case there is any way to include a limitation in the CallPeaks() function to clip any peaks that would extend beyond the annotated chromosome.

@timoast
Copy link
Collaborator

timoast commented Mar 18, 2022

@t-and thanks for looking into it, I agree it would be good to have an automatic way of preventing this but it's difficult to do since we don't currently require any information about the chromosome lengths when running CallPeaks(). We could add an optional parameter to supply this in the future and clip peaks that fall outside chromosome boundaries.

@timoast timoast added the enhancement New feature or request label Mar 18, 2022
@timoast timoast changed the title Peaks beyond chromosome MACS2 peaks beyond chromosome Mar 18, 2022
@t-and
Copy link
Author

t-and commented Mar 18, 2022

The optional parameter would definitely be helpful, since I'd think most users would likely have some sort of BSgenome or other genome attached to use as a reference for where the chromosome should end. Thanks for being so attentive to the issues posted on here! If it's okay, I am going to close this issue now that the actual error is resolved.

@t-and t-and closed this as completed Mar 18, 2022
@timoast
Copy link
Collaborator

timoast commented Mar 18, 2022

I'll just leave this open until we have something added to Signac to handle this properly

@timoast timoast reopened this Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Future
Development

No branches or pull requests

2 participants