Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process POST_PROCESS_VCF failes when order of chromosomes in result.DP8.vcf differs from that in the GRCm38/Annotation/Variation/Mus_musculus.vcf #21

Open
dylanht opened this issue Mar 3, 2021 · 0 comments

Comments

@dylanht
Copy link

dylanht commented Mar 3, 2021

To fix, I added a call to vcf-sort in the middle of the POST_PROCESS_VCF script - I tried installing and using bcftools, but it requires a header with the "contig" section which is not present in these intermediate files, and vcftools is already included in the container. Will submit PR for review.

Error from vcftools on process failure is:

Comparing sites in VCF files...
  Error: Cannot determine chromosomal ordering of files, both files must contain the same chromosomes to use the diff functions.
  Found 10 in file 1 and 1 in file 2.

Looking in the working directory associated with the failing task, POST_PROCESS_VCF produces the file result.DP8.vcf with chromosomes ordered as grep -v "#" result.DP8.vcf | cut -f 1 | uniq | tr "\n" " ":

grep -v "#" result.DP8.vcf | cut -f 1 | uniq | tr "\n" " "
# 10 11 12 13 14 15 16 17 18 19 1 2 3 4 5 6 7 8 9 MT X Y
singularity exec callings-nf_gatk4.sif vcf-sort result.DP8.vcf > result.DP8.vcf.sorted
# unix command printed on execution is "sort -k1,1d -k2,2n"
grep -v "#" result.DP8.vcf.sorted | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y
grep -v "#" filtered.recode.vcf | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y

Possible confounder here is that we are trying to use the Boyle lab's https://github.com/Boyle-Lab/Blacklist for mm10, but using Ensembl build of GRCm38 available from iGenomes - I wrote in a profile into the config as such:

singularity {
    singularity.enabled = true
    singularity.cacheDir = './singularity_cache'
    process {
        container = 'quay.io/nextflow/callings-nf:gatk4'
        executor = 'slurm'
        queue = 'our_queue'
        memory = 16.GB
        errorStrategy = 'finish'
        withLabel: mem_large { memory = 48.GB }
        withLabel: mem_xlarge { memory = 64.GB }
            params {
                genome  = "iGenomes/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/genome.fa"
                reads = "reads/*_{1,2}.fastq.gz"
                variants  = "iGenomes/Mus_musculus/Ensembl/GRCm38/Annotation/Variation/Mus_musculus.vcf"
                denylist  = "iGenomes/Blacklist/lists/mm10-blacklist.v2.bed"
                results    = "./results"
            }
    }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant