Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with download button #148

Open
serachoi1230 opened this issue Feb 20, 2023 · 12 comments
Open

Issues with download button #148

serachoi1230 opened this issue Feb 20, 2023 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@serachoi1230
Copy link
Collaborator

I think there are issues with the download button for upset plots? I get this error:

--- Running overlap_upset_plot() ---
overlap_upset_plot(): Done in 40.8s.
Quitting from lines 338-351 (EpiCompare.Rmd) 
Error in paste(unlist(l), collapse = if (is.null(newline)) "" else newline) : 
  result would exceed 2^31-1 bytes
@serachoi1230 serachoi1230 added the bug Something isn't working label Feb 20, 2023
@bschilder
Copy link
Collaborator

Copied from email correspondence with @serachoi1230

Where did you see the error? (locally, GHA, Bioc servers)

Thank you so much. I saw this error locally. So first I got this error, to do with "download button", for upset plot: #148.
Just as a hot fix, I commented out the "download button" function and everything ran until the tss plot step.

At tss plot step, I got the same error as upset plot, again to do with "download button", so I commented out the "download button" function once again. Then I got the error while generating the html report.

@bschilder
Copy link
Collaborator

Could you please fill out the full Bug Report template @serachoi1230 ? I'm afraid I don't have enough information (what code was run, which versions of deps you have installed, machine type, etc) to replicate the issue.

@bschilder
Copy link
Collaborator

I can't seem to replicate the issue locally, and the GHA checks aren't encountering it. So I this this might have to do with your local setup (e.g. outdated R or system dependencies).

Screenshot 2023-02-24 at 01 20 59

@serachoi1230
Copy link
Collaborator Author

Thanks @bschilder ! I think it's to do with using large list of peak files? I just tried to update my R, and I still ran into the same issue. If it's a problem with my local machine, I won't be able to generate the report. Would would please try generate the report and push to EpiCompare/report? Here is the code:

### ATAC-seq vs. DNase-seq vs. CHiP-seq ###
## ATAC-seq peakfile
atac1_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/ATAC/ENCFF558BLC.bed", as="GRanges")
atac1_hg38_unique <- unique(atac1_hg38)
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(atac1_hg38_unique)$qValue <-GenomicRanges::mcols(atac1_hg38_unique)$V9

## Dnase-seq peakfile
dna1_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/Dnase/ENCFF274YGF.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(dna1_hg38)$qValue <-GenomicRanges::mcols(dna1_hg38)$V9

dna2_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/Dnase/ENCFF185XRG.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(dna2_hg38)$qValue <- GenomicRanges::mcols(dna2_hg38)$V9

## ChIP-seq peakfile
chip_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/ENCODE/ENCODE_H3K27ac_hg38_ENCFF038DDS.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(chip_hg38)$qValue <- GenomicRanges::mcols(chip_hg38)$V9

## Peaklist
peaklist <- list("ATAC_ENCFF558BLC" = atac1_hg38_unique,
                 "Dnase_ENCFF274YGF" = dna1_hg38,
                 "ChIP_ENCFF038DDS" = chip_hg38)

## Reference
reference <- list("Dnase_ENCFF185XRG_reference"=dna2_hg38)

## Blacklist
data("hg38_blacklist")

## Run Epicompare ##
EpiCompare(peakfiles = peaklist,
           genome_build = "hg38",
           genome_build_output = "hg38",
           blacklist = hg38_blacklist,
           reference = reference,
           picard_files = NULL,
           upset_plot = T,
           stat_plot = T,
           save_output = F,
           chromHMM_plot = T,
           chromHMM_annotation = "K562",
           chipseeker_plot = T,
           enrichment_plot = T,
           tss_plot = T,
           precision_recall_plot =T,
           corr_plot = T,
           interact = T,
           output_dir = "/Users/serachoi/Desktop")

Then we can at least reply to reviewers!

@bschilder
Copy link
Collaborator

bschilder commented Feb 24, 2023

Hey @serachoi1230 , thanks for sharing the code. But it looks like the file paths are hard-coded to a folder on your laptop that I don't have access to. The reprex should include all steps necessary to download the files from a public resource.

In the meantime, I might be able to use PeakyFinders to find a import the relevant files.

@serachoi1230
Copy link
Collaborator Author

@bschilder , oh shoot my bad, sorry! For each file I manually downloaded them into my laptop and ran this:

encode <- ChIPseeker::readPeakFile("path", as = "GRanges")

Here are links to ENCODE for each peak file:

Thank you so much, let me know if you need anything else!

@bschilder
Copy link
Collaborator

Thanks for the info @serachoi1230 .

Here's how you would do this programmatically:

#### Download files ####
ids <- list(atac1="ENCFF558BLC",
            dna1="ENCFF274YGF",
            dna2="ENCFF185XRG",
            chip1="ENCFF038DDS")

files <- lapply(ids, function(id){
  URL <- paste0(
    "https://www.encodeproject.org/files/",id,
    "/@@download/",id,".bed.gz"
  )
  f <- file.path(tempdir(),basename(URL))
  if(!file.exists(f)){
    utils::download.file(URL, f) 
  }
  return(f)
})

Working on running it now

@bschilder
Copy link
Collaborator

Ok, so I was able to replicate the error you described. I think you're right, it has something to do with the size of the data, as I'm able to run it all the way through if I limit all peakfiles to just chromosome 1.

We can use the chr1-only report as an example for now (I'll upload this version to the Releases), but I will continue looking into the original error. I suspect it has something to do with the storage overhead introduced by the download buttons (I may have to reconfigure them to make them more efficient).

In the meantime, I think we can go ahead and resubmit.

I'm including the full reprex script as a new file in the package:
inst/examples/atac_dnase_chip_example.R

### ATAC-seq vs. DNase-seq vs. CHiP-seq ###

#### Download files ####
ids <- list(atac1="ENCFF558BLC",
            dna1="ENCFF274YGF",
            dna2="ENCFF185XRG",
            chip1="ENCFF038DDS")

files <- lapply(ids, function(id){
  URL <- paste0(
    "https://www.encodeproject.org/files/",id,
    "/@@download/",id,".bed.gz"
  )
  f <- file.path(tempdir(),basename(URL))
  if(!file.exists(f)){
    utils::download.file(URL, f) 
  }
  return(f)
})

#### ATAC-seq peakfile ####
atac1_hg38 <- ChIPseeker::readPeakFile(files$atac1)
atac1_hg38 <- unique(atac1_hg38)
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(atac1_hg38)$qValue <-GenomicRanges::mcols(atac1_hg38)$V9

#### DNase-seq peakfile ####
dna1_hg38 <-  ChIPseeker::readPeakFile(files$dna1)
GenomicRanges::mcols(dna1_hg38)$qValue <-GenomicRanges::mcols(dna1_hg38)$V9

dna2_hg38 <-ChIPseeker::readPeakFile(files$dna2)
GenomicRanges::mcols(dna2_hg38)$qValue <- GenomicRanges::mcols(dna2_hg38)$V9

#### ChIP-seq peakfile ####
chip_hg38 <- ChIPseeker::readPeakFile(files$chip1)
GenomicRanges::mcols(chip_hg38)$qValue <- GenomicRanges::mcols(chip_hg38)$V9

## Peaklist
peaklist <- list("ATAC_ENCFF558BLC" = atac1_hg38,
                 "Dnase_ENCFF274YGF" = dna1_hg38,
                 "ChIP_ENCFF038DDS" = chip_hg38)
peaklist_chr1 <- EpiCompare:::remove_nonstandard_chrom(grlist = peaklist, 
                                                       keep_chr = "chr1")
## Reference
reference <- list("Dnase_ENCFF185XRG_reference"=dna2_hg38)
reference_chr1 <- EpiCompare:::remove_nonstandard_chrom(grlist = reference,
                                                        keep_chr = "chr1")

#### Run Epicompare ####
html_out <- EpiCompare::EpiCompare(peakfiles = peaklist_chr1,
                                   reference = reference_chr1, 
                                   genome_build = "hg38",
                                   genome_build_output = "hg38",  
                                   output_dir = tempdir(), 
                                   run_all = TRUE)

@bschilder
Copy link
Collaborator

bschilder commented Feb 24, 2023

Also added a new EpiCompare::EpiCompare arg named add_download_button and set the default to FALSE to avoid this issue until I can resolve it.

These changes have now all been pushed to GitHub (EpiCompare v1.3.4)

@serachoi1230
Copy link
Collaborator Author

Thank you so much @bschilder, this is really great!

Where did you upload the example report?

@bschilder
Copy link
Collaborator

Thank you so much @bschilder, this is really great!

Where did you upload the example report?

The file is in Releases (link above). But it's not rendered on the website yet bc GHA is failing for unrelated reasons. But go ahead and resubmit the paper now and I'll fix the site manually over the weekend so it renders in time for the reviewers

@bschilder
Copy link
Collaborator

I've now pushed the updated website with the new example report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants