
Experimental additions to Assembly Checker

@ebivariation-bot ebivariation-bot released this 16 Aug 12:22
9d958b7

Note that everything except these new features is as stable as in the previous release, v0.9.1. Using the latest version is recommended.

This release adds two new experimental features to the assembly checker.

Neither feature was present in v0.9.1, and their behaviour might change in the future.

1) Checking a VCF against a FASTA file when the two use different chromosome naming systems.

For instance, your VCF uses chromosome numbers:

#CHROM	POS...
1	100 ...

but you have a FASTA with chromosome accessions:

>CM000001.3 chromosome 1
ATCG...

Now you can use the -a parameter to provide the path to a file with the mapping. The file structure expected is that of NCBI's assembly reports such as ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/285/GCA_000002285.2_CanFam3.1/GCA_000002285.2_CanFam3.1_assembly_report.txt

For each chromosome in the VCF, the assembly checker will look in the FASTA for any of its synonyms listed under the columns "Sequence-Name", "GenBank-Accn", "RefSeq-Accn" and "UCSC-style-name".
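
To illustrate the idea, here is a minimal Python sketch of how such a synonym lookup could work. It is not the assembly checker's actual implementation; the function names `load_synonyms` and `find_in_fasta` are hypothetical. It assumes the report is tab-separated, that the column names sit on the last line starting with "#", and that missing values are written as "na", as in NCBI's assembly reports.

```python
# Columns of the assembly report that are treated as synonyms of each other.
SYNONYM_COLUMNS = ("Sequence-Name", "GenBank-Accn", "RefSeq-Accn", "UCSC-style-name")


def load_synonyms(report_lines):
    """Map every name of a sequence to the set of all names of that sequence."""
    header = None
    synonyms = {}
    for line in report_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the last comment line before the data rows
            # holds the tab-separated column names.
            header = line.lstrip("# ").split("\t")
            continue
        if header is None:
            raise ValueError("data row found before the column header")
        row = dict(zip(header, line.split("\t")))
        # "na" marks a missing value, so it is never used as a name.
        names = {row[col] for col in SYNONYM_COLUMNS if row.get(col, "na") != "na"}
        for name in names:
            synonyms[name] = names
    return synonyms


def find_in_fasta(vcf_contig, fasta_names, synonyms):
    """Return the FASTA sequence name matching the VCF contig, or None."""
    candidates = synonyms.get(vcf_contig, {vcf_contig})
    for name in fasta_names:
        if name in candidates:
            return name
    return None
```

With a report row that lists "1" under "Sequence-Name" and "CM000001.3" under "GenBank-Accn", `find_in_fasta("1", ["CM000001.3"], synonyms)` would return "CM000001.3".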

2) Remote sequence retrieval.

If no FASTA file is provided, EBI-ENA will be queried to download the sequence of each chromosome used in the VCF, so that every reference allele can still be checked.
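
Whether the sequence comes from a local FASTA or from a remote download, the check on each reference allele is conceptually the same. The following Python sketch shows that comparison only; it does not perform the download and is not the assembly checker's actual code. The function name `reference_allele_matches` is hypothetical, and the case-insensitive comparison is an assumption. The one firm fact used is that VCF positions are 1-based.

```python
def reference_allele_matches(sequence, pos, ref):
    """Check a VCF REF allele against a chromosome sequence.

    sequence: the chromosome sequence as a string
    pos:      the 1-based POS value from the VCF record
    ref:      the REF allele from the VCF record
    """
    start = pos - 1          # convert the 1-based VCF position to a 0-based index
    end = start + len(ref)
    if start < 0 or end > len(sequence):
        # The allele would fall outside the sequence, so it cannot match.
        return False
    # Assumption: compare case-insensitively, since FASTA files often
    # use lower case for soft-masked regions.
    return sequence[start:end].upper() == ref.upper()
```

For the example sequence "ATCG", a record with POS 2 and REF "TC" would pass this check, while POS 2 and REF "GG" would not.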