Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vcf] replace VCF reading with TreeTime's read_vcf #1357

Open
jameshadfield opened this issue Dec 13, 2023 · 1 comment · Fixed by #1366
Open

[vcf] replace VCF reading with TreeTime's read_vcf #1357

jameshadfield opened this issue Dec 13, 2023 · 1 comment · Fixed by #1366
Labels
enhancement New feature or request

Comments

@jameshadfield
Copy link
Member

jameshadfield commented Dec 13, 2023

Augur uses a variety of functions to read VCF files, including TreeTime's read_vcf which now includes a lot of validation and helpful error messages. We should shift all our VCF reading to use this function.

augur mask

Augur mask uses a function get_chrom_name(vcf_file) to read (part of) the VCF file. See #507 for more context.

augur sequence-traits

defines its own read_in_translate_vcf function, but see this comment for context.

augur.io.vcf

Update: removed in #1366

augur.io.vcf includes read_vcf which is (only) used by augur index. It's a bit of a misnomer as it actually only returns the strain names (from the VCF header). The treetime function read_vcf is a more thorough function as it (now) checks the validity of the VCF file, however it doesn't actually return a list of the strain names. It could very easily be extended to do so, and then I think we should replace Augur's function with TreeTime's.

(TreeTime's function will be slower because it parses the entire VCF, but (a) I think this difference will be negligible and (b) this will flag up errors in the VCF data lines which I think is appropriate.)

@jameshadfield jameshadfield added the enhancement New feature or request label Dec 13, 2023
jameshadfield added a commit that referenced this issue Dec 20, 2023
The strain names are available from TreeTime's function of the same
name. This function is preferable to (a) reduce duplicated logic and
(b) because it performs more thorough validation of the VCF file.

Aside: Augur's `read_vcf` function (which has moved around quite a lot)
was first introduced by 278a30e in May
2018. This returned a tuple of `(strain_names, strain_names.copy())` and
this return signature has been kept during refactors despite us having
since stopped using the second argument, except for in tests!

Closes #1357
jameshadfield added a commit that referenced this issue Jan 24, 2024
The strain names are available from TreeTime's function of the same
name. This function is preferable to (a) reduce duplicated logic and
(b) because it performs more thorough validation of the VCF file.

Aside: Augur's `read_vcf` function (which has moved around quite a lot)
was first introduced by 278a30e in May
2018. This returned a tuple of `(strain_names, strain_names.copy())` and
this return signature has been kept during refactors despite us having
since stopped using the second argument, except for in tests!

Closes #1357
jameshadfield added a commit that referenced this issue Jan 25, 2024
The strain names are available from TreeTime's function of the same
name. This function is preferable to (a) reduce duplicated logic and
(b) because it performs more thorough validation of the VCF file.

Aside: Augur's `read_vcf` function (which has moved around quite a lot)
was first introduced by 278a30e in May
2018. This returned a tuple of `(strain_names, strain_names.copy())` and
this return signature has been kept during refactors despite us having
since stopped using the second argument, except for in tests!

Closes #1357
@jameshadfield
Copy link
Member Author

(I'm going to re-open this issue and generalise it.)

@jameshadfield jameshadfield reopened this Jan 25, 2024
@jameshadfield jameshadfield changed the title [vcf] replace Augur's read_vcf with TreeTime's [vcf] replace VCF reading with TreeTime's read_vcf Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant