Skip to content

Commit

Permalink
[fix] Add VCF spanning deletion check
Browse files Browse the repository at this point in the history
Issue: VCF v4.2 introduced the use of asterisk notation (*) in ALTs to represent spanning deletions. If there are alternative alleles at a site, the ‘*’ allele is reserved to indicate that an allele allele is missing due to an upstream deletion. This was not considered in angsd VCF reading and led to segfaults.
Fix: Add hasSpanningDeletion function to check if there is any spanning deletion and skip the sites with spanning deletion.
Fixes #557

Add development-related files to gitignore:
-vgcore files from valgrind
-commit_message.txt commit message template
  • Loading branch information
isinaltinkaya committed Apr 7, 2023
1 parent 7a5e0db commit 09d2b8a
Show file tree
Hide file tree
Showing 2 changed files with 27 additions and 2 deletions.
4 changes: 4 additions & 0 deletions .gitignore
Expand Up @@ -48,3 +48,7 @@ misc/splitgl
misc/supersim
misc/thetaStat
version.h

# Development-related files
vgcore*
commit_message.txt
25 changes: 23 additions & 2 deletions vcfReader.cpp
Expand Up @@ -230,6 +230,26 @@ int isindel(const bcf_hdr_t *h, const bcf1_t *v){
}


/// @brief check if any allele is spanning deletion
/// @param rec - bcf1_t record for one VCF line
/// @return 1 : if any allele is deletion, 0 otherwise
/// @details
/// As of VCF 4.2, '*' is used to represent a spanning deletion if
/// there are other alleles present at site.
int hasSpanningDeletion(const bcf1_t *rec){
if(rec->d.allele==NULL){
return 0;
}
for(int i=0;i<rec->n_allele;i++){
if(strncmp(rec->d.allele[i],"*",1)==0){
// fprintf(stderr,"\n[INFO]\t-> Skipping site at position %d. Reason: Found spanning deletion (%s).\n", (int)rec->pos+1, rec->d.allele[i]);
return 1;
}
}
return 0;
}


//type=0 -> PL
//type=1 -> GL
//type=2 -> GP
Expand All @@ -238,8 +258,9 @@ int dumpcounterverbose[2] = {0,0};
int vcfReader::parseline(bcf1_t *rec,htsstuff *hs,funkyPars *r,int &balcon,int type){
aio::doAssert(type>=0&&type<=2,1,AT,"");
int n;
if(isindel(hs->hdr,rec)!=0)
if(isindel(hs->hdr,rec)!=0 || hasSpanningDeletion(rec)!=0){
return 0;
}

r->major[balcon] = refToChar[rec->d.allele[0][0]];
r->minor[balcon] = refToChar[rec->d.allele[1][0]];
Expand Down Expand Up @@ -502,7 +523,7 @@ funkyPars *vcfReader::fetch(int chunkSize) {

n++;
//skip nonsnips
if(isindel(hs->hdr,rec)){
if(isindel(hs->hdr,rec)!=0 || hasSpanningDeletion(rec)){
if(onlyprint>0){
fprintf(stderr,"\t Skipping due to non snp pos:%d (this message will be silenced after 10 sites)\n",(int)rec->pos+1);
onlyprint--;
Expand Down

0 comments on commit 09d2b8a

Please sign in to comment.