Length of fastafile vs length of chr in BAM header #614

Open
pedrosenna opened this issue Feb 15, 2024 · 0 comments

Hello everyone,

I am trying to estimate the global SFS from a list of BAM files containing Illumina short reads mapped to a reference genome. The analysis runs fine when I set this reference in both the -anc and -ref arguments. See the command line below:

angsd -GL 2 -doMajorMinor 1 -doMaf 2 -doSaf 1 -doCounts 1 -bam bamfile.ls -anc ../reference/Astyanax_mexicanus.fna -ref ../reference/Astyanax_mexicanus.fna -out SFS/teste/cardinal-total-140224 -nThreads 4 -minMapQ 20 -minQ 20 -remove_bads 1 -baq 1 -C 50 -uniqueOnly 1 -only_proper_pairs 1 -minMaf 0.01 -setMinDepth 5 -setMaxDepth 100

However, when I try to use another genome (Acestrorhynchus altus) in the -anc argument:

angsd -GL 2 -doMajorMinor 1 -doMaf 2 -doSaf 1 -doCounts 1 -bam bamfile.ls -anc ../reference/Acestrorhynchus_altus.fna -ref ../reference/Astyanax_mexicanus.fna -out SFS/teste/cardinal-total-140224 -nThreads 4 -minMapQ 20 -minQ 20 -remove_bads 1 -baq 1 -C 50 -uniqueOnly 1 -only_proper_pairs 1 -minMaf 0.01 -setMinDepth 5 -setMaxDepth 100

I get the following error:

[loadChr] Error loading fasta info from chr:'NC_064408.1'
-> Problem with length of fastafile vs length of chr in BAM header
-> Chromosome name: 'NC_064408.1' length from BAM header:134019835 length from fai file:0
Trying to access fasta efter end of chromsome+200:NC_064408.1/NC_064408.1 pos=45199 ref_len=0
Trying to access fasta efter end of chromsome+200:NC_064408.1/NC_064408.1 pos=44542 ref_len=0
Trying to access fasta efter end of chromsome+200:NC_064408.1/NC_064408.1 pos=44542 ref_len=0

I have re-created the .fai file for both species, but the problem persists. Both species have the same number of chromosomes, but the number of unplaced scaffolds differs greatly (85 vs 17924). Could this be the cause? Do the -anc and -ref genomes need to be aligned to each other before this estimation?

Edit: I have tried setting -checkBamHeaders 0, but I'm still getting the same error.
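
A possible way to narrow this down: "length from fai file:0" may simply mean that the -anc FASTA has no sequence named NC_064408.1, which would be expected if the two assemblies use different sequence names. The sketch below is only an illustration, assuming the standard tab-separated .fai layout written by samtools faidx (name in column 1, length in column 2). The helper names read_fai and compare_fai are hypothetical, and the toy ancestral entry (chrA and its numbers) is made up; only NC_064408.1 and its length come from the error above.

```python
import csv
import io


def read_fai(handle):
    """Return {sequence_name: length} from an open .fai file handle."""
    lengths = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            lengths[row[0]] = int(row[1])
    return lengths


def compare_fai(ref_lengths, anc_lengths):
    """List reference sequences that are missing from, or differ in length in, the ancestral index."""
    missing = [name for name in ref_lengths if name not in anc_lengths]
    mismatched = [
        name
        for name in ref_lengths
        if name in anc_lengths and anc_lengths[name] != ref_lengths[name]
    ]
    return missing, mismatched


# Toy stand-ins for the two .fai files. The offset and line-width columns are
# invented, and "chrA" with its length is a hypothetical ancestral sequence name.
ref_fai = io.StringIO("NC_064408.1\t134019835\t100\t80\t81\n")
anc_fai = io.StringIO("chrA\t120000000\t90\t80\t81\n")

ref_lengths = read_fai(ref_fai)
anc_lengths = read_fai(anc_fai)
missing, mismatched = compare_fai(ref_lengths, anc_lengths)

print("missing from ancestral index:", missing)
print("present but different length:", mismatched)
```

With real data, the two io.StringIO objects would be replaced by open() calls on the .fai files sitting next to the two .fna files. If every reference chromosome shows up as missing, the two FASTA files are probably not in the same coordinate system.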
