-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vcf] replace VCF reading with TreeTime's read_vcf
#1357
Labels
enhancement
New feature or request
Comments
jameshadfield
added a commit
that referenced
this issue
Dec 20, 2023
The strain names are available from TreeTime's function of the same name. This function is preferable to (a) reduce duplicated logic and (b) because it performs more thorough validation of the VCF file. Aside: Augur's `read_vcf` function (which has moved around quite a lot) was first introduced by 278a30e in May 2018. This returned a tuple of `(strain_names, strain_names.copy())` and this return signature has been kept during refactors despite us having since stopped using the second argument, except for in tests! Closes #1357
2 tasks
jameshadfield
added a commit
that referenced
this issue
Jan 24, 2024
The strain names are available from TreeTime's function of the same name. This function is preferable to (a) reduce duplicated logic and (b) because it performs more thorough validation of the VCF file. Aside: Augur's `read_vcf` function (which has moved around quite a lot) was first introduced by 278a30e in May 2018. This returned a tuple of `(strain_names, strain_names.copy())` and this return signature has been kept during refactors despite us having since stopped using the second argument, except for in tests! Closes #1357
jameshadfield
added a commit
that referenced
this issue
Jan 25, 2024
The strain names are available from TreeTime's function of the same name. This function is preferable to (a) reduce duplicated logic and (b) because it performs more thorough validation of the VCF file. Aside: Augur's `read_vcf` function (which has moved around quite a lot) was first introduced by 278a30e in May 2018. This returned a tuple of `(strain_names, strain_names.copy())` and this return signature has been kept during refactors despite us having since stopped using the second argument, except for in tests! Closes #1357
(I'm going to re-open this issue and generalise it.) |
jameshadfield
changed the title
[vcf] replace Augur's
[vcf] replace VCF reading with TreeTime's Jan 25, 2024
read_vcf
with TreeTime'sread_vcf
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Augur uses a variety of functions to read VCF files, including TreeTime's
read_vcf
which now includes a lot of validation and helpful error messages. We should shift all our VCF reading to use this function.augur mask
Augur mask uses a function
get_chrom_name(vcf_file)
to read (part of) the VCF file. See #507 for more context.augur sequence-traits
defines its own
read_in_translate_vcf
function, but see this comment for context.augur.io.vcf
Update: removed in #1366
augur.io.vcf
includesread_vcf
which is (only) used byaugur index
. It's a bit of a misnomer as it actually only returns the strain names (from the VCF header). The treetime functionread_vcf
is a more thorough function as it (now) checks the validity of the VCF file, however it doesn't actually return a list of the strain names. It could very easily be extended to do so, and then I think we should replace Augur's function with TreeTime's.(TreeTime's function will be slower because it parses the entire VCF, but (a) I think this difference will be negligible and (b) this will flag up errors in the VCF data lines which I think is appropriate.)
The text was updated successfully, but these errors were encountered: