Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment #76

Open
AntoniaChroni opened this issue Feb 28, 2023 · 17 comments

Comments

@AntoniaChroni
Copy link

Hey,

I am getting this error when I am using reduce function on a GRangesList (it was working fine a few days ago).

Error in getListElement(x, i, ...) :
GRanges objects don't support [[, as.list(), lapply(), or unlist() at
the moment

My script is:
peaks = GRangesList()
combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

And the output looks like this:

peaks
GRangesList object of length 11:
$sample1
GRanges object with 98860 ranges and 0 metadata columns:
seqnames ranges strand

[1] chr1 629770-630550 *
[2] chr1 633794-634259 *
[3] chr1 778203-779262 *
[4] chr1 817083-817625 *
[5] chr1 821110-821501 *
... ... ... ...
[98856] chrY 56868863-56871397 *
[98857] chrY 56872276-56872281 *
[98858] chrY 56873716-56874522 *
[98859] chrY 56877566-56877837 *
[98860] chrY 56879680-56880015 *


seqinfo: 24 sequences from an unspecified genome; no seqlengths

...
<10 more elements>

Any thoughts on what is happening?

Thanks!

@hpages
Copy link
Contributor

hpages commented Feb 28, 2023

Hi @AntoniaChroni,

My script is:
peaks = GRangesList()
combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

Is this really your script? It doesn't make much sense because with your code peaks is an empty GRangesList object with no sample1, or sample2, or sample3 list element. So it is expected to fail, but not with the same error as the one you got:

> peaks = GRangesList()
> combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘reduce’ for signature ‘"NULL"’

Please provide:

  1. A self-contained reproducible example that actually produces the error you got.
  2. Your sessionInfo().

Thanks,
H.

@AntoniaChroni
Copy link
Author

AntoniaChroni commented Feb 28, 2023

This is the whole script:

inputFiles <- c("./sample1/outs/fragments.tsv.gz",
"./sample2/outs/fragments.tsv.gz", "./sample3/outs/fragments.tsv.gz")

names(inputFiles) = c("sample1", "sample2", "sample3")

matrixFiles = gsub(inputFiles,pattern = "fragments.tsv.gz", replacement = "")
matrixFiles

barcodes = list()
peaks = GRangesList()

for(i in names(matrixFiles)){
barcodes[[i]] = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/barcodes.tsv")), col_names = F)[,1]
p = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/peaks.bed")), col_names = F)[,1:3]
colnames(p) = c("chr","start","end")
peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

@hpages
Copy link
Contributor

hpages commented Feb 28, 2023

Please provide a self-contained reproducible example. The code you share with us is not self-contained because we don't have access to all those files that you are using, so we can't run it. And if we can't run it we can't reproduce the problem. And if we can't reproduce the problem then we can't fix it. 🤷‍♂️

Also you're still not providing your sessionInfo().

@AntoniaChroni
Copy link
Author

is there an email that I could share this data with you?

@AntoniaChroni
Copy link
Author

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so
LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] RColorBrewer_1.1-3 ggrepel_0.9.3
[3] harmony_0.1.1 Rcpp_1.0.10
[5] cowplot_1.1.1 clustree_0.5.0
[7] ggraph_2.1.0 patchwork_1.1.2
[9] lubridate_1.9.2 forcats_1.0.0
[11] stringr_1.5.0 purrr_1.0.1
[13] tidyr_1.3.0 tibble_3.1.8
[15] tidyverse_2.0.0 EnsDb.Hsapiens.v75_2.99.0
[17] JASPAR2020_0.99.10 TFBSTools_1.36.0
[19] dplyr_1.1.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[21] BSgenome_1.66.3 rtracklayer_1.58.0
[23] Biostrings_2.66.0 XVector_0.38.0
[25] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.22.0
[27] AnnotationFilter_1.22.0 GenomicFeatures_1.50.4
[29] AnnotationDbi_1.60.0 Biobase_2.58.0
[31] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[33] IRanges_2.32.0 S4Vectors_0.36.1
[35] BiocGenerics_0.44.0 readr_2.1.4
[37] Matrix_1.5-3 reshape2_1.4.4
[39] ggplot2_3.4.1 SeuratObject_4.1.3
[41] Seurat_4.3.0 Signac_1.9.0

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 scattermore_0.8
[3] R.methodsS3_1.8.2 knitr_1.42
[5] bit64_4.0.5 irlba_2.3.5.1
[7] DelayedArray_0.24.0 R.utils_2.12.2
[9] data.table_1.14.8 rpart_4.1.19
[11] KEGGREST_1.38.0 RCurl_1.98-1.10
[13] generics_0.1.3 RSQLite_2.3.0
[15] RANN_2.6.1 future_1.31.0
[17] bit_4.0.5 tzdb_0.3.0
[19] spatstat.data_3.0-0 xml2_1.3.3
[21] httpuv_1.6.9 SummarizedExperiment_1.28.0
[23] assertthat_0.2.1 DirichletMultinomial_1.40.0
[25] viridis_0.6.2 xfun_0.37
[27] hms_1.1.2 promises_1.2.0.1
[29] fansi_1.0.4 restfulr_0.0.15
[31] progress_1.2.2 caTools_1.18.2
[33] dbplyr_2.3.0 igraph_1.4.1
[35] DBI_1.1.3 htmlwidgets_1.6.1
[37] spatstat.geom_3.0-6 ellipsis_0.3.2
[39] backports_1.4.1 annotate_1.76.0
[41] biomaRt_2.54.0 deldir_1.0-6
[43] MatrixGenerics_1.10.0 vctrs_0.5.2
[45] ROCR_1.0-11 abind_1.4-5
[47] cachem_1.0.7 withr_2.5.0
[49] ggforce_0.4.1 progressr_0.13.0
[51] vroom_1.6.1 checkmate_2.1.0
[53] sctransform_0.3.5 GenomicAlignments_1.34.0
[55] prettyunits_1.1.1 goftest_1.2-3
[57] cluster_2.1.4 lazyeval_0.2.2
[59] seqLogo_1.64.0 crayon_1.5.2
[61] spatstat.explore_3.0-6 pkgconfig_2.0.3
[63] tweenr_2.0.2 nlme_3.1-162
[65] ProtGenerics_1.30.0 nnet_7.3-18
[67] rlang_1.0.6 globals_0.16.2
[69] lifecycle_1.0.3 miniUI_0.1.1.1
[71] filelock_1.0.2 BiocFileCache_2.6.1
[73] dichromat_2.0-0.1 polyclip_1.10-4
[75] matrixStats_0.63.0 lmtest_0.9-40
[77] zoo_1.8-11 base64enc_0.1-3
[79] ggridges_0.5.4 png_0.1-8
[81] viridisLite_0.4.1 rjson_0.2.21
[83] bitops_1.0-7 R.oo_1.25.0
[85] KernSmooth_2.23-20 blob_1.2.3
[87] parallelly_1.34.0 spatstat.random_3.1-3
[89] jpeg_0.1-10 CNEr_1.34.0
[91] scales_1.2.1 memoise_2.0.1
[93] magrittr_2.0.3 plyr_1.8.8
[95] ica_1.0-3 zlibbioc_1.44.0
[97] compiler_4.2.2 BiocIO_1.8.0
[99] fitdistrplus_1.1-8 Rsamtools_2.14.0
[101] cli_3.6.0 listenv_0.9.0
[103] pbapply_1.7-0 htmlTable_2.4.1
[105] Formula_1.2-5 MASS_7.3-58.2
[107] tidyselect_1.2.0 stringi_1.7.12
[109] yaml_2.3.7 latticeExtra_0.6-30
[111] grid_4.2.2 VariantAnnotation_1.44.1
[113] fastmatch_1.1-3 tools_4.2.2
[115] timechange_0.2.0 future.apply_1.10.0
[117] rstudioapi_0.14 TFMPvalue_0.0.9
[119] foreign_0.8-84 gridExtra_2.3
[121] farver_2.1.1 Rtsne_0.16
[123] digest_0.6.31 shiny_1.7.4
[125] pracma_2.4.2 later_1.3.0
[127] RcppAnnoy_0.0.20 httr_1.4.5
[129] biovizBase_1.46.0 colorspace_2.1-0
[131] XML_3.99-0.13 tensor_1.5
[133] reticulate_1.28 splines_4.2.2
[135] uwot_0.1.14 RcppRoll_0.3.0
[137] spatstat.utils_3.0-1 graphlayouts_0.8.4
[139] sp_1.6-0 plotly_4.10.1
[141] xtable_1.8-4 jsonlite_1.8.4
[143] poweRlaw_0.70.6 tidygraph_1.2.3
[145] R6_2.5.1 Hmisc_4.8-0
[147] pillar_1.8.1 htmltools_0.5.4
[149] mime_0.12 glue_1.6.2
[151] fastmap_1.1.1 BiocParallel_1.32.5
[153] codetools_0.2-19 utf8_1.2.3
[155] lattice_0.20-45 spatstat.sparse_3.0-0
[157] curl_5.0.0 leiden_0.4.3
[159] gtools_3.9.4 GO.db_3.16.0
[161] interp_1.1-3 survival_3.5-3
[163] munsell_0.5.0 GenomeInfoDbData_1.2.9
[165] gtable_0.3.1

@hpages
Copy link
Contributor

hpages commented Feb 28, 2023

I don't know what files exactly you want to send me by email but your code above uses 9 possibly big files. Indiscriminately sending me all that stuff by email and let me figure out what I really need to reproduce the problem is not the way to go. You need to figure out what's strictly needed i.e. it's on you to try to reduce the size of the requirements as much as possible.

This usually starts by trying to simplify your example as much as possible to isolate the code that reproduces the problem, and get rid of anything else. For example, we don't care about the 3 fragments.tsv.gz or 3 barcodes.tsv files, because they are not used at all to construct GRangesList object peaks. The only thing you need in order to construct this object are the peaks.bed files. So the first thing you want to do is simplify your reproducible example like this:

bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample2/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample3/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2", "sample3")

peaks = GRangesList()

for(i in names(bedFiles)){
    p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
    colnames(p) = c("chr","start","end")
    peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

BTW you also want to make the effort to indent your code properly to make it easier for others to read and understand it.

Also now that the code is smaller, we can actually start to focus on it and pay attention to details. FWIW I notice that the col_names argument (which is an argument of the readr::read_tsv() function) is used in the call to as.data.frame() instead of the call to readr::read_tsv(). This is because of how the parentheses are placed:

as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)

when they should probably be:

as.data.frame(readr::read_tsv(file = bedFiles[i], col_names = F))

So before you share the peaks.bed data with us, can you run this code and confirm that it still reproduces the problem?Also can you share here what you get when you dispay peaks$sample1, peaks$sample2, and peaks$sample3? Thanks!

@AntoniaChroni
Copy link
Author

So, I have done all these and still have the same error. The code was working fine before.
BTW when I do:
lapply(peaks,length)
$sample1
[1] 98860

$sample2
[1] 76862

That returns a list of peaks per sample.

I am sorry for the confusion with the code and session info, but I was not aware of it.

@hpages
Copy link
Contributor

hpages commented Feb 28, 2023

What about sharing what you get when you display peaks$sample1, peaks$sample2, and peaks$sample3?

Also what do you think of the misplaced parentheses in as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F). You happy with that? It does what you want? Have you tried to re-run the code after fixing the parentheses? Does it make a difference?

@AntoniaChroni
Copy link
Author

AntoniaChroni commented Feb 28, 2023

changing the parentheses is returning NULL for additional samples. If I keep the parentheses as they are:

peaks$sample1

GRanges object with 98860 ranges and 0 metadata columns:

      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1     629770-630550      *
  [2]     chr1     633794-634259      *
  [3]     chr1     778203-779262      *
  [4]     chr1     817083-817625      *
  [5]     chr1     821110-821501      *
  ...      ...               ...    ...

[98856]     chrY 56868863-56871397      *
[98857]     chrY 56872276-56872281      *

[98858]     chrY 56873716-56874522      *

[98859]   chrY 56877566-56877837      *

[98860]     chrY 56879680-56880015      *

-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

peaks$sample2

GRanges object with 76862 ranges and 0 metadata columns:

      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1     629766-630143      *
  [2]     chr1     633792-634267      *
  [3]     chr1     778232-779166      *
  [4]     chr1     817123-817561      *
  [5]     chr1     820927-821538      *
  ...      ...               ...    ...


[76858]     chrY 56866489-56866827      *
[76859]     chrY 56868795-56872323      *
[76860]     chrY 56872875-56874168      *
[76861]     chrY 56877550-56877852      *
[76862]     chrY 56879622-56880373      *

-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

@hpages
Copy link
Contributor

hpages commented Mar 1, 2023

What about peaks$sample3? Why aren't you showing it? Note that it's the 3rd time I ask you to show it. You're making it really difficult to help you.

@hpages
Copy link
Contributor

hpages commented Mar 1, 2023

Also please show what you get when doing:

c(peaks$sample1, peaks$sample2, peaks$sample3)

BTW it wouldn't hurt that you take some time to learn the basics of how to properly format your posts here on GitHub e.g. any code or output produced by the code should be placed between triple backticks lines (```).

@AntoniaChroni
Copy link
Author

I wanted to keep it simple, so I only included 2 samples when following your suggestions when using reduce function.

@hpages
Copy link
Contributor

hpages commented Mar 1, 2023

And you still get the error when doing reduce(c(peaks$sample1, peaks$sample2))? Can you please do that now and show what you get?

@AntoniaChroni
Copy link
Author

AntoniaChroni commented Mar 1, 2023

reduce(c(peaks$sample1, peaks$sample2))

Error in getListElement(x, i, ...) :

 GRanges objects don't support [[, as.list(), lapply(), or unlist() at
 the moment

@hpages
Copy link
Contributor

hpages commented Mar 1, 2023

Formatting is still not good. I wrote "between triple backticks lines". This means that you must put a line with triple backticks before and after the code or the output of the code (so that's 2 lines with triple backticks). Also you can press Preview to see how things get formatted before you actually post. It's free so do not hesitate to use it.

Ok so now we've simplified your original reproducible example to this:

bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample2/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2")

peaks = GRangesList()

for(i in names(bedFiles)){
    p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
    colnames(p) = c("chr","start","end")
    peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2))

Again, can you show what you get when doing c(peaks$sample1, peaks$sample2), as previously asked already?

@AntoniaChroni
Copy link
Author

hey, this hasn't been resolved yet. I woudl appreciate your help on this.
So, this is what I am getting:

c(peaks$sample1, peaks$sample2)

GRanges object with 175722 ranges and 0 metadata columns:
           seqnames            ranges strand
              <Rle>         <IRanges>  <Rle>
       [1]     chr1     629770-630550      *
       [2]     chr1     633794-634259      *
       [3]     chr1     778203-779262      *
       [4]     chr1     817083-817625      *
       [5]     chr1     821110-821501      *
       ...      ...               ...    ...
  [175718]     chrY 56866489-56866827      *
  [175719]     chrY 56868795-56872323      *
  [175720]     chrY 56872875-56874168      *
  [175721]     chrY 56877550-56877852      *
  [175722]     chrY 56879622-56880373      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

@hpages
Copy link
Contributor

hpages commented Mar 23, 2023

At this point I will ask you to make your 2 bed files (sample1/outs/filtered_peak_bc_matrix/peaks.bed and sample2/outs/filtered_peak_bc_matrix/peaks.bed) available somewhere and to provide the links in this issue, so anybody can access and download them. You could try to reduce their size (e.g. by keeping only a few hundred lines), that would help.

However you absolutely want to make sure that you can still reproduce the issue using the files that you're going to share with us, and with BioC 3.16. The idea is that me and others can also reproduce the issue by just running the code that I posted in my last comment above using these files. So make sure you can do that too. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants