New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing a missing value for a FORMAT field #172
Comments
You can write |
Thank you for a fast reply. Unfortunately your suggestion did not work. What I did: import sys vcf_path = '/home/users/pjh/tmp/TEST.bcf.gz' vcf = cyvcf2.VCF(vcf_path) NA = 0x7F800001 Output: 1 94533 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=38,64|6,3;DP=116;ECNT=1;GERMQ=93;MBQ=37,37;MFRL=396,385;MMQ=23,40;MPOS=30;NALOD=1.5;NLOD=9.01;POPAF=6;ROQ=65;TLOD=19.97 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:72,9:0.12:81:39,5:33,4:24,48,6,3 0/0:30,0:0.031:30:16,0:14,0:14,16,0,0 1 94533 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=38,64|6,3;DP=116;ECNT=1;GERMQ=93;MBQ=37,37;MFRL=396,385;MMQ=23,40;MPOS=30;NALOD=1.5;NLOD=9.01;POPAF=6;ROQ=65;TLOD=19.97 GT:AD:AF:DP:F1R2:F2R1:SB:NEW 0/1:72,9:0.12:81:39,5:33,4:24,48,6,3:1.1 0/0:30,0:0.031:30:16,0:14,0:14,16,0,0:2.1391e+09 |
you can use:
that shows "nan" instead of "." and I'm not sure why yet. |
On the topic of missing values for FORMAT fields:
|
Is there any way forward on this? I can confirm that currently setting a float format field with np.nans results in 'nan' being written to the output file, and that setting the field with I'd like setting values to np.nan to result in '.' being written, as '.' is currently read in as np.nan. If that's not possible, then perhaps we could pass a masked array to set_format() and the masked values would take on the missing value? That could also be a consistent solution across datatypes. But I'll take whichever is faster/easier to implement. |
@brentp thoughts? |
Hi, I'm not sure hwo to fix this. I'd be happy to accept a PR to resolve it. |
Hi,
Please understand that I do not have any experience with C or htslib.
I would like to write missing values ('.') to FORMAT fields.
I incidentally found that when the FORMAT field is of type 'Integer', entering a numpy array including -2147483648 results in missing value ('.').
But I could not find such a way with 'Float' type FORMAT field. I tried with numpy.nan, but the corresponding FORMAT field of output vcf file showed 'nan' instead of '.'.
Is there any way to write '.' in a FORMAT field?
Thanks
The text was updated successfully, but these errors were encountered: