Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supported for hg38? #4

Open
dmontielg opened this issue Mar 17, 2022 · 6 comments
Open

Supported for hg38? #4

dmontielg opened this issue Mar 17, 2022 · 6 comments

Comments

@dmontielg
Copy link

Hi,

Is scan2 already supported for hg38?
I received the following error in the scan2 config command:

Checking reference genome..
Checking dbSNP VCF..
Checking BAMs..
Checking phasing reference panel..
Traceback (most recent call last):
File "/path/diego/miniconda3/envs/scan2/bin/scan2", line 860, in
args.executor(args)
File "/path/diego/miniconda3/envs/scan2/bin/scan2", line 464, in do_validate
a.validate()
File "/path/diego/miniconda3/envs/scan2/bin/scan2", line 308, in validate
self.check_phaser()
File "/path/diego/miniconda3/envs/scan2/bin/scan2", line 278, in check_phaser
if self.cfg['eagle_panel'] is None:
KeyError: 'eagle_panel'

Thank you

@jluquette
Copy link
Member

Hi @dmontielg,

Sorry for the late reply. SCAN2 is indeed now ready for hg38; details have been added to the installation instructions in the main README.

@GruA
Copy link

GruA commented Jun 28, 2023

Hi,

I tried to run scan2 for hg38 and it was not succesfull. I got this error:
GenomeInfoDb::getChromInfoFromUCSC("hg38")

Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
 !anyNA(m32) is not TRUE

There was a new hg38 patch released by USCS at the beginning of this year (https://groups.google.com/a/soe.ucsc.edu/g/genome-announce/c/SytF4qkgpMw) what was source of the problem. GenomeInfoDb described it in details and fixed this in their 1.34.8 release (see Bioconductor/GenomeInfoDb#82 (comment) , 091b5d2 commit).

When installing scan2 through miniconda3 (see https://hub.docker.com/r/gru4/scan2/tags), it refers to GenomeInfoDb 1.30 and I wasn't able to push it to use newer version of GenomeInfoDb. It might need newer R version.

I made a replacement of genome.string.to.seqinfo.object function in parklab/r-scan2/R/scan2.R commit 2f4a566 parklab/r-scan2@2f4a566 (I hope this commit reflects conda r-scan2 package, dates are about right). Working in gru4/scan2workinghg38 (dockerhub) with my custom changes I was able to overcome GenomeInfoDb issue, but another error appears... from next package that refers to hg38.

In ${initialized_directory}/call_mutations/integrate_tables.log file, I found:

Loading required package: BSgenome.Hsapiens.UCSC.hg38
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'getSeq': object 'BSgenome.Hsapiens.UCSC.hg38' not found
Calls: make.integrated.table ... resolve.list -> signalConditionsASAP -> signalConditions
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called <E2><80><98>BSgenome.Hsapiens.UCSC.hg38<E2><80><99>
Execution halted

Would you be able to help or suggest a quick fix?

Thanks!
GruA

@jluquette
Copy link
Member

jluquette commented Jun 28, 2023

Hi GruA,

Thanks for the report. We do know about this issue with UCSC updating its hg38 version breaking the Bioconductor package. I'd be interested to see your fix; would you mind posting the diff?

For previous users that encountered this problem, the easiest fix was to just manually install an updated GenomeInfoDb (i.e., not managed by conda). E.g.:

wget https://bioconductor.org/packages/3.16/bioc/src/contrib/GenomeInfoDb_1.34.9.tar.gz
R CMD INSTALL GenomeInfoDb_1.34.9.tar.gz

It would be nice if this could be done within conda, but unfortunately Bioconductor is a large collection of interdependent packages and is, even more unfortunately, tied to specific R versions. There will be a new version of SCAN2 published soon that will address this by bumping up the R and Bioconductor versions, but of course this will break again the next time UCSC patches hg38.

For your specific case, however, it just seems like you did not follow the instructions for installing hg38-specific packages here: https://github.com/parklab/SCAN2#human-reference-version-hg38. The first step is

conda install -c conda-forge -c bioconda bioconductor-bsgenome.hsapiens.ucsc.hg38

If you did install that package and are receiving the above error, please let me know.

@GruA
Copy link

GruA commented Jun 29, 2023

Hi jluquette,

Thank you for the quick answer.
I'm glad to hear about new version of SCAN2 coming! Are you able to estimate when it might happen?

I had a problem with installing bioconductor-bsgenome.hsapiens.ucsc.hg38 through conda. Installing it (version that matches R 4.1.3) manually as you suggested for GenomeInfoDb seems to work:

wget https://bioconductor.org/packages/3.14/data/annotation/src/contrib/BSgenome.Hsapiens.UCSC.hg38_1.4.4.tar.gz
R CMD INSTALL BSgenome.Hsapiens.UCSC.hg38_1.4.4.tar.gz

this library is loaded without an issue but SCAN2 stops at the same stage.
The very end of the ${initialized_directory}/call_mutations/integrate_tables.log file is now:

Loading required package: rtracklayer
Loading required package: BSgenome.Hsapiens.UCSC.hg38
Loading required package: SigProfilerMatrixGeneratorR
Error: SystemExit
Execution halted 

I coudn't find more detailed info about the cause of this error.
I perform my test run on the small portion of the chromosome 22, can this behavior be caused by lack of single cell specific variants detected? I checked ${initialized_directory}/call_mutations/mmq60.tab.gz file and there are 34 variants with GT='0/1' for single cell samples and GT='0/0' for bulk sample. This makes me think that the error is due to something else, but I'm not 100% sure.

My fix for GenomeInfoDb is quick and ugly. I saved a file with info about current patch, the script is reading this file and creating a seqinfo object from it, so nothing universal. If you are still interested, I can share the diff.

Thanks!
GruA

@jluquette
Copy link
Member

Hi GruA,

The new SCAN2 version is completely finished. Sadly, the only problem right now is getting conda to build it!

Yes, I once considered distributing copies of the SeqInfo objects as part of SCAN2 to avoid this issue as well. It's feasible since we only support 3 genomes. That seemed like a bad hack at the time, but maybe it's preferable to constantly rebuilding SCAN2 packages when UCSC updates its genomes (and having confused users each time UCSC quietly breaks SCAN2).

Did you continue to follow the hg38 instructions after installing the UCSC hg38 package? The next step is to install GRCh38 resources for SigProfilerMatrixGenerator. The SystemExit error here is likely coming from SigProfilerMatrixGenerator, which is a python tool.

@GruA
Copy link

GruA commented Jul 19, 2023

Hi jluquette,

I saw that the new version of scan2 is now available as a conda package, hurray! I was able to run it with hg38 without an issue. Thank you for making scan2 working again. I hope that the UCSC will not release any new patch soon... fingers crossed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants