Panic on omitted trailing FORMAT fields #407

Open
fennerm opened this issue Aug 17, 2023 · 2 comments

fennerm commented Aug 17, 2023

I'm running into a panic when attempting to parse a VCF with rust_htslib::bcf. I can't share the real VCF but here's a minimal example:

##fileformat=VCFv4.3
##contig=<ID=chr1,length=10000>
##INFO=<ID=FOO,Number=1,Type=Integer,Description="Some field">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=ABC,Number=1,Type=String,Description="Some string field">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1
chr1	1234	.	t	a	.	.	FOO=1	GT:ABC	.

The problem is that the sample column only has one "." but there are two fields defined in the FORMAT column. Per the VCF spec I think this is valid:

Trailing fields can be dropped, with the exception of the GT field, which should always be present if specified in the FORMAT field.
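
To make that rule concrete, here is a minimal sketch of how a parser could pair FORMAT keys with sample values while treating omitted trailing fields as missing ("."). It uses only the standard library; `expand_sample_fields` is a hypothetical helper written for illustration and is not part of rust-htslib or htslib.

```rust
/// Pair each FORMAT key with its value from one sample column.
/// Trailing fields that the sample omits are filled with the VCF
/// missing value ".".
///
/// Hypothetical helper for illustration only; not rust-htslib's API.
fn expand_sample_fields<'a>(format: &'a str, sample: &'a str) -> Vec<(&'a str, &'a str)> {
    // Values that are actually present in the sample column.
    let mut values = sample.split(':');
    format
        .split(':')
        // Once the sample runs out of values, fall back to ".".
        .map(|key| (key, values.next().unwrap_or(".")))
        .collect()
}
```

Under this reading, `expand_sample_fields("GT:ABC", ".")` pairs both `GT` and `ABC` with ".", which is how the record in the minimal example above would be interpreted.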

Panic message:

thread panicked at 'chunk size must be non-zero', /Users/fennerm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-htslib-0.44.1/src/bcf/record.rs:1490:18

Relevant line in code:
https://github.com/rust-bio/rust-htslib/blob/master/src/bcf/record.rs#L1490

let val = record.format_shared_buffer(b"ABC", &mut buffer).string()

fennerm commented Aug 17, 2023

Will try to debug a bit and submit a PR

fennerm commented Aug 18, 2023

Took an initial look but couldn't figure out the root cause. Jotting down my notes:

  • The error only occurs if the FORMAT field has Type=String - Float/Integer work fine.
  • bcf_get_format_values and self.values_per_sample() are both 0 if the field has Type=String
    • this causes the panic in .chunks(), which rejects a chunk size of 0
  • bcf_get_format_values and self.values_per_sample() are both 1 if the field has Type=Integer
  • Unsure whether that inconsistency is expected
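
The notes above can be reproduced without htslib. The standard library's `chunks` panics with "chunk size must be non-zero" when given 0, so a per-sample length of 0 has to be handled before splitting. The sketch below only mirrors the shape of the problem: `split_per_sample` is a hypothetical helper, not the code in record.rs, and returning an empty result is just one possible choice, assumed here for illustration.

```rust
/// Split a flat byte buffer into one slice per sample.
///
/// Hypothetical helper for illustration only; it is not rust-htslib's
/// implementation.
fn split_per_sample(data: &[u8], values_per_sample: usize) -> Vec<&[u8]> {
    if values_per_sample == 0 {
        // `data.chunks(0)` would panic with "chunk size must be non-zero",
        // so report "no values" instead of splitting.
        return Vec::new();
    }
    data.chunks(values_per_sample).collect()
}
```

Whether the String case should yield an empty result, an error, or one missing value per sample is a design decision for the maintainers; the guard only shows where the zero length needs to be caught.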
