Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] stretch() not stretching $blocks in GRanges #85

Open
AngelCampos opened this issue Sep 29, 2020 · 6 comments
Open

[BUG] stretch() not stretching $blocks in GRanges #85

AngelCampos opened this issue Sep 29, 2020 · 6 comments

Comments

@AngelCampos
Copy link

AngelCampos commented Sep 29, 2020

Hello Stuart.

I want to use the stretch() function to extend the 5' UTR and 3'UTR ends of a GRanges object as loaded by using the plyranges::read_bed() function, but stretch() is not extending the $blocks IRanges contained in the GRanges.

Following is a reproducible example using the in-package data.

stretch() not stretching gr$blocks

Setup

library(plyranges)
library(GenomicRanges)
library(magrittr)
require(rtracklayer)

Reproducible example

Loading a bed file using read_bed()

test_path <- system.file("tests", package = "rtracklayer")
bed_file <- file.path(test_path, "test.bed")
gr <- read_bed(bed_file)

Extending 3 prime end works fine for GRanges main start and end
coordinates.

newGR <- plyranges::stretch(plyranges::anchor_5p(gr), 10)

Total width is extended

width(gr)
## [1] 1167 1167 1167 1167 1167
width(newGR)
## [1] 1177 1177 1177 1177 1177

But blocks size remain the same

gr$blocks %>% lapply(width) %>% lapply(sum) %>% unlist
## [1]  600  750 1167 1167 1167
newGR$blocks %>% lapply(width) %>% lapply(sum) %>% unlist
## [1]  600  750 1167 1167 1167

Session info

options(width = 120)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.0 (2020-04-24)
##  os       CentOS Linux 7 (Core)       
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Asia/Jerusalem              
##  date     2020-09-29                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
##  Biobase                2.48.0   2020-04-27 [2] Bioconductor  
##  BiocGenerics         * 0.34.0   2020-04-27 [2] Bioconductor  
##  BiocParallel           1.22.0   2020-04-27 [2] Bioconductor  
##  Biostrings             2.56.0   2020-04-27 [2] Bioconductor  
##  bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
##  cli                    2.0.2    2020-02-28 [2] CRAN (R 4.0.0)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
##  DelayedArray           0.14.1   2020-07-14 [2] Bioconductor  
##  digest                 0.6.25   2020-02-23 [2] CRAN (R 4.0.0)
##  dplyr                  1.0.2    2020-08-18 [2] CRAN (R 4.0.0)
##  ellipsis               0.3.1    2020-05-15 [2] CRAN (R 4.0.0)
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.0.0)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
##  generics               0.0.2    2018-11-29 [2] CRAN (R 4.0.0)
##  GenomeInfoDb         * 1.24.2   2020-06-15 [2] Bioconductor  
##  GenomeInfoDbData       1.2.3    2020-05-03 [2] Bioconductor  
##  GenomicAlignments      1.24.0   2020-04-27 [2] Bioconductor  
##  GenomicRanges        * 1.40.0   2020-04-27 [2] Bioconductor  
##  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.0)
##  htmltools              0.5.0    2020-06-16 [2] CRAN (R 4.0.0)
##  IRanges              * 2.22.2   2020-05-21 [2] Bioconductor  
##  knitr                  1.29     2020-06-23 [2] CRAN (R 4.0.0)
##  lattice                0.20-41  2020-04-02 [2] CRAN (R 4.0.0)
##  lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
##  magrittr             * 1.5      2014-11-22 [2] CRAN (R 4.0.0)
##  Matrix                 1.2-18   2019-11-27 [2] CRAN (R 4.0.0)
##  matrixStats            0.57.0   2020-09-25 [2] CRAN (R 4.0.0)
##  pillar                 1.4.6    2020-07-10 [2] CRAN (R 4.0.0)
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.0.0)
##  plyranges            * 1.8.0    2020-04-27 [1] Bioconductor  
##  purrr                  0.3.4    2020-04-17 [2] CRAN (R 4.0.0)
##  R6                     2.4.1    2019-11-12 [2] CRAN (R 4.0.0)
##  RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
##  rlang                  0.4.7    2020-07-09 [2] CRAN (R 4.0.0)
##  rmarkdown              2.3      2020-06-18 [2] CRAN (R 4.0.0)
##  Rsamtools              2.4.0    2020-04-27 [2] Bioconductor  
##  rtracklayer          * 1.48.0   2020-04-27 [2] Bioconductor  
##  S4Vectors            * 0.26.1   2020-05-16 [2] Bioconductor  
##  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
##  stringi                1.5.3    2020-09-09 [1] CRAN (R 4.0.0)
##  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.0.0)
##  SummarizedExperiment   1.18.2   2020-07-09 [2] Bioconductor  
##  tibble                 3.0.3    2020-07-10 [2] CRAN (R 4.0.0)
##  tidyselect             1.1.0    2020-05-11 [2] CRAN (R 4.0.0)
##  vctrs                  0.3.4    2020-08-29 [1] CRAN (R 4.0.0)
##  withr                  2.3.0    2020-09-22 [2] CRAN (R 4.0.0)
##  xfun                   0.17     2020-09-09 [2] CRAN (R 4.0.0)
##  XML                    3.99-0.5 2020-07-23 [2] CRAN (R 4.0.0)
##  XVector                0.28.0   2020-04-27 [2] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.0.0)
##  zlibbioc               1.34.0   2020-04-27 [2] Bioconductor  
## 

--

Is there a function I'm missing that does what I am expecting (blocks size update)?
Thanks for the help
Best
Miguel

@sa-lee
Copy link
Collaborator

sa-lee commented Sep 30, 2020

Hi Miguel,

So stretch() has no knowledge that there's an additional column called blocks in your GRanges.

How are you expecting blocks to be updated? Are you expecting the total width of the block to be extended by 10?

@sa-lee
Copy link
Collaborator

sa-lee commented Sep 30, 2020

Here's a few ways you could do this:

suppressPackageStartupMessages(library(plyranges))

test_path <- system.file("tests", package = "rtracklayer")
bed_file <- file.path(test_path, "test.bed")
gr <- read_bed(bed_file)

newGR <- stretch(anchor_5p(gr), 10)

# add stretch amount to each element by the number of blocks, 
# naive if not completely divisible (off by one)
ans <- newGR %>% 
  mutate(blocks = resize(blocks, width(blocks) + 10 / lengths(blocks))) 

ans
#> GRanges object with 5 ranges and 5 metadata columns:
#>       seqnames              ranges strand |        name     score     itemRgb
#>          <Rle>           <IRanges>  <Rle> | <character> <numeric> <character>
#>   [1]     chr7 127471197-127472373      + |        Pos1         0     #FF0000
#>   [2]     chr7 127472364-127473540      + |        Pos2         2     #FF0000
#>   [3]     chr7 127473521-127474697      - |        Neg1         0     #FF0000
#>   [4]     chr9 127474698-127475874      + |        Pos3         5     #FF0000
#>   [5]     chr9 127475855-127477031      - |        Neg2         5     #0000FF
#>                     thick                  blocks
#>                 <IRanges>           <IRangesList>
#>   [1] 127471197-127472363 1-303,501-703,1068-1170
#>   [2] 127472364-127473530          1-255,668-1172
#>   [3] 127473531-127474697                  1-1177
#>   [4] 127474698-127475864                  1-1177
#>   [5] 127475865-127477031                  1-1177
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths
width(ans)
#> [1] 1177 1177 1177 1177 1177
sum(width(ans$blocks))
#> [1]  609  760 1177 1177 1177

even_block_strech <- function(blocks, extend) {
    part <- PartitioningByEnd(blocks)
    lens <- width(part)
    div <- extend %/% lens
    rem <- extend %% lens
    extends <- rep(div, times = lens)
    extends[end(part)] <- extends[end(part)] + rem
    # unlist and resize
    collapsed <- unlist(blocks)
    collapsed <- resize(collapsed,  width(collapsed) + extends)
    relist(collapsed, part)
}

ans <- newGR %>% 
  mutate(blocks = even_block_strech(blocks, 10))

ans
#> GRanges object with 5 ranges and 5 metadata columns:
#>       seqnames              ranges strand |        name     score     itemRgb
#>          <Rle>           <IRanges>  <Rle> | <character> <numeric> <character>
#>   [1]     chr7 127471197-127472373      + |        Pos1         0     #FF0000
#>   [2]     chr7 127472364-127473540      + |        Pos2         2     #FF0000
#>   [3]     chr7 127473521-127474697      - |        Neg1         0     #FF0000
#>   [4]     chr9 127474698-127475874      + |        Pos3         5     #FF0000
#>   [5]     chr9 127475855-127477031      - |        Neg2         5     #0000FF
#>                     thick                  blocks
#>                 <IRanges>           <IRangesList>
#>   [1] 127471197-127472363 1-303,501-703,1068-1171
#>   [2] 127472364-127473530          1-255,668-1172
#>   [3] 127473531-127474697                  1-1177
#>   [4] 127474698-127475864                  1-1177
#>   [5] 127475865-127477031                  1-1177
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths
width(ans)
#> [1] 1177 1177 1177 1177 1177
sum(width(ans$blocks))
#> [1]  610  760 1177 1177 1177

last_block_stretch <- function(blocks, extend) {
  part <- PartitioningByEnd(blocks)
  zeros <- rep_len(0L, sum(width(part)))
  zeros[end(part)] <- extend
  # unlist and resize
  collapsed <- unlist(blocks)
  collapsed <- resize(collapsed,  width(collapsed) + zeros)
  relist(collapsed, part)
}

ans <- newGR %>% 
  mutate(blocks = last_block_strech(blocks, 10))
#> Error in last_block_strech(blocks, 10): could not find function "last_block_strech"

ans
#> GRanges object with 5 ranges and 5 metadata columns:
#>       seqnames              ranges strand |        name     score     itemRgb
#>          <Rle>           <IRanges>  <Rle> | <character> <numeric> <character>
#>   [1]     chr7 127471197-127472373      + |        Pos1         0     #FF0000
#>   [2]     chr7 127472364-127473540      + |        Pos2         2     #FF0000
#>   [3]     chr7 127473521-127474697      - |        Neg1         0     #FF0000
#>   [4]     chr9 127474698-127475874      + |        Pos3         5     #FF0000
#>   [5]     chr9 127475855-127477031      - |        Neg2         5     #0000FF
#>                     thick                  blocks
#>                 <IRanges>           <IRangesList>
#>   [1] 127471197-127472363 1-303,501-703,1068-1171
#>   [2] 127472364-127473530          1-255,668-1172
#>   [3] 127473531-127474697                  1-1177
#>   [4] 127474698-127475864                  1-1177
#>   [5] 127475865-127477031                  1-1177
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths
width(ans)
#> [1] 1177 1177 1177 1177 1177
sum(width(ans$blocks))
#> [1]  610  760 1177 1177 1177

Created on 2020-09-30 by the reprex package (v0.3.0)

@sa-lee
Copy link
Collaborator

sa-lee commented Sep 30, 2020

Let me know if that's what you had in mind, and I'll close the issue.

@AngelCampos
Copy link
Author

AngelCampos commented Sep 30, 2020

Sorry I didn't explain the expected outcome.

I'm not expecting an even stretch, nor the last block without considering the GRange strand. I was expecting only the blocks containing the 3' UTR to be extended. For the case of positive-strand GRanges that would be the last block, while for negative-stranded it will be the first. The complement function would be to extend the 5' UTR, in which the opposite ending blocks are extended along with the GRanges.

If $blocks awareness is not expected to be added to stretch() in future plyranges versions, I think your answer will already help me to do what I want. It's ok if you close the issue 👍

Thanks for your speedy follow-up.

@sa-lee
Copy link
Collaborator

sa-lee commented Sep 30, 2020

Ok that makes way more sense! So I think if this is urgent, you could modify the last_block_stretch function I made above. Thinking about an interface to this, the easiest way would probably be to modify stretch to take a new arg called block which will be a bare variable referencing the list colum. This way we could still use the anchoring concept:

# extend ranges in gr and the block column by 5' UTR
gr %>%
  anchor_5p() %>%
  stretch(block = block, extend = 10)

# extend ranges in gr and the block column by 3' UTR
gr %>%
  anchor_3p() %>%
  stretch(block = block, extend = 10)

# only stretch the ranges in gr, default will be block = NULL
gr %>%
  stretch(extend = 10)

What do you think?

@AngelCampos
Copy link
Author

Sounds good, although not sure if it will be general enough, especially without considering the use of the anchor_3p() or anchor_5p() functions. If anchoring is not considered, even stretching of blocks will most probably give unexpected outcomes. Perhaps a specialized function will work better, in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants