Empty 't_depth' 't_ref_count' 't_alt_count' from VCF with format GT:GL:GOF:GQ:NR:NV #351

Open
ChristianRohde opened this issue Feb 12, 2024 · 1 comment

Comments

@ChristianRohde

ChristianRohde commented Feb 12, 2024

Hi,

I have an issue similar to #332. I noticed that problem earlier and fixed the non-matching names in my previous data.

Now it seems that vcf2maf does not handle VCF files with the FORMAT GT:GL:GOF:GQ:NR:NV: the t_depth, t_ref_count and t_alt_count columns come out empty.

I should point out that I received these VCFv4.0 files from a colleague, and they already include a very rich annotation. I am therefore running vcf2maf with the --inhibit-vep parameter and hope that it will pick up the data from my file. These are the FORMAT definitions from my VCF header:

##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype log10-likelihoods for AA,AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites">
##FORMAT=<ID=GOF,Number=.,Type=Float,Description="Goodness of fit value">
##FORMAT=<ID=GQ,Number=.,Type=Integer,Description="Genotype quality as phred score">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=NR,Number=.,Type=Integer,Description="Number of reads covering variant location in this sample">
##FORMAT=<ID=NV,Number=.,Type=Integer,Description="Number of reads containing variant in this sample">

Here are the column header line and the first record:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P10
1       17539   .       C       A       1713    alleleBias      BRF=0.54;FR=0.5;HP=2;HapScore=1;MGOF=10;MMLQ=37;MQ=40.94;NF=28;NR=32;PP=1713;QD=29.1295;SC=TGTCTGATGCCCTGGGTCCCC;SbPval=0.4;Source=Platypus;TC=382;TCF=163;TCR=219;TR=60;WE=17547;WS=17528;GT_Classification=HETERO;SEGMENTAL_DUPLICATION;MAPABILITY=0.125;GOOD_MAP;VARIANT_CONFIDENCE=8;CSQ=A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene||4/9|ENST00000423562.1:n.488+67G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene||5/11|ENST00000438504.2:n.604+63G>T|||||||||-1||SNV|HGNC|38034|YES|||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||3869|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||3130|1||SNV|HGNC|37102|YES|||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||5/10|ENST00000488147.1:n.574+67G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|||||||||||3127|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENS
G00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene|||||||||||3130|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene||5/12|ENST00000538476.1:n.815+63G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene||4/8|ENST00000541675.1:n.540-35G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|653635|Transcript|NR_024540.1|transcribed_pseudogene||5/10|NR_024540.1:n.587+67G>T|||||||||-1||SNV|EntrezGene|38034|YES|||||||||||||RefSeq||C|C|OK||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|100287102|Transcript|NR_046018.2|transcribed_pseudogene|||||||||||3130|1||SNV|EntrezGene|37102|YES|||||||||||||RefSeq||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|upstream_gene_variant|MODIFIER|MIR6859-1|102466751|Transcript|NR_106918.1|miRNA|||||||||||103|-1||SNV|EntrezGene||YES|||||||||||||RefSeq||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||     GT:GL:GOF:GQ:NR:NV      1/0:-58.46,0,-299.7:10:99:382:60
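
To make the mapping concrete, here is a minimal Python sketch (standard library only) that pairs the FORMAT keys with the sample values from the record above and derives the three counts the MAF columns expect. It assumes a bi-allelic site, so NR and NV each hold a single integer, and it treats every read that does not contain the variant as a reference read. The helper name `counts_from_sample` is hypothetical and is not part of vcf2maf.

```python
def counts_from_sample(format_field, sample_field):
    """Derive depth, ref and alt read counts from Platypus-style NR/NV values.

    Assumes a bi-allelic site, so NR and NV each hold one integer.
    """
    # Pair each FORMAT key with the value at the same position in the sample column.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    depth = int(values["NR"])      # reads covering the variant location
    alt_count = int(values["NV"])  # reads containing the variant
    return {
        "t_depth": depth,
        "t_ref_count": depth - alt_count,  # assumption: non-variant reads are ref
        "t_alt_count": alt_count,
    }


counts = counts_from_sample(
    "GT:GL:GOF:GQ:NR:NV",
    "1/0:-58.46,0,-299.7:10:99:382:60",
)
print(counts)
# {'t_depth': 382, 't_ref_count': 322, 't_alt_count': 60}
```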


In #332 you mention that vcf2maf needs the AD field. Can I somehow tweak the field vcf2maf reads so that it works with my current VCF files? Unfortunately, I did not spot any parameter like this in the help. But this should be possible, right?

Best,
Christian

@ChristianRohde
Author

Hi,

In the end, I ran vcf2maf with the --retain-fmt NR,NV parameter. This gave me t_NR and t_NV columns in my exported MAF files.

Next, I read the files into R using maftools::read.maf(local_MAF_file) and combined them one after the other with maftools::merge_mafs(). I then exported the merged object with maftools::write.mafSummary() and loaded the result with data.table::fread(). There I could rename the columns t_NR and t_NV to t_depth and t_alt_count. From these values I can easily calculate VAF (t_alt_count / t_depth) and t_ref_count (t_depth - t_alt_count). Finally, I read this table back into MAF format using maftools::read.maf(table).
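
For anyone who prefers to do the renaming step outside R, the same idea could be sketched in Python with the standard library only. This is an illustration under assumptions, not what I actually ran: the function name `add_count_columns` and the extra column name `t_vaf` are hypothetical, the input is assumed to be tab-delimited with the t_NR and t_NV columns produced by --retain-fmt NR,NV, and the example row is a made-up minimal MAF based on the record above.

```python
import csv


def add_count_columns(maf_text):
    """Rename t_NR/t_NV to t_depth/t_alt_count and add t_ref_count and t_vaf."""
    # Skip blank lines and comment lines such as a leading "#version" header.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        depth = int(row.pop("t_NR"))
        alt = int(row.pop("t_NV"))
        row["t_depth"] = depth              # overwrites an empty t_depth if present
        row["t_alt_count"] = alt
        row["t_ref_count"] = depth - alt    # assumption: non-variant reads are ref
        row["t_vaf"] = round(alt / depth, 4) if depth else None
        rows.append(row)
    return rows


# Made-up minimal MAF content for illustration only.
example = (
    "Hugo_Symbol\tChromosome\tStart_Position\tt_NR\tt_NV\n"
    "WASH7P\t1\t17539\t382\t60\n"
)
rows = add_count_columns(example)
for row in rows:
    print(row)
```

Rows with zero depth get `None` as VAF instead of raising a division error, which may or may not be what you want downstream.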

It sounds a bit complicated but works well.

Thank you,
Christian
