Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fixed fn to_hashmap so it can avoid an error when @CO tag appeared #363

Merged
merged 7 commits into from Aug 24, 2022

Conversation

shohei-kojima
Copy link
Contributor

When a header contains @CO line(s), it may issue an error at the line 88 of 4d99455 (let cap = TAG_RE.captures(part).unwrap();) because @CO line may not have values of "([A-Za-z][A-Za-z0-9]):([ -~]+)".

E.g.
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram

Header lines:

samtools view -H NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram | tail -n 5

@CO     $known_indels_file(s) = ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/Mills_and_1000G_gold_standard.indels.b38.primary_assembly.vcf.gz
@CO     $known_indels_file(s) = ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz
@CO     FASTQ=ERR194147_1.fastq.gz
@CO     FASTQ=ERR194147_2.fastq.gz
@CO     FASTQ=ERR194147.fastq.gz

Error:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/kooojiii/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.39.5/src/bam/header.rs:88:49
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:48:5
   3: rust_htslib::bam::header::Header::to_hashmap

@veldsla
Copy link
Contributor

veldsla commented Aug 19, 2022

to_hashmap is indeed wrong. It's also undocumented. The current return type will not be suitable for the CO lines so I guess skipping them would be a valid workaround for this. Maybe add the documentation that this function returns an incomplete map?

It could also be convenient to add a comments() function to complete the functionality?

@shohei-kojima
Copy link
Contributor Author

Added the documentations and get_comments() function.

@veldsla
Copy link
Contributor

veldsla commented Aug 22, 2022

get_comments can avoid the unwrap by using String::from_utf8_lossy. It can also be implemented without allocations (if valid utf-8) by returning an Iterator

Also I think the get_ prefix is commonly used for functions consuming self so the function signature could look something like this:

fn comments(&self) -> impl Iterator<Item=Cow<str>>

The function body is only a few lines. Do you want to have a go?

@shohei-kojima
Copy link
Contributor Author

Thank you for revising it!
I agree, returning an Iterator is better. I changed it so.

@veldsla
Copy link
Contributor

veldsla commented Aug 22, 2022

Isn't the header is already split on newlines in the struct? This would make the flat_map and split unnecessary. I think you only need the filter and map!

@shohei-kojima
Copy link
Contributor Author

Unfortunately, no. When I generated a Header struct from from_template function, records in the Header struct contained non-split headers. So I needed to split it.
Like this:
[ [ "@hd\tVN:1.5\tGO:none\tSO:coordinate\n@PG\tID:MarkDuplicates\n...." ] ]

@veldsla
Copy link
Contributor

veldsla commented Aug 22, 2022

You are correct. Only pushed records are appended to the Vec

@veldsla
Copy link
Contributor

veldsla commented Aug 22, 2022

So you need to run rustfmt and apparently the PR title needs to be formatted for the release tool. The clippy warnings appear to be for other code.

@shohei-kojima shohei-kojima changed the title fixed fn to_hashmap so it can avoid an error when @CO tag appeared fix: fixed fn to_hashmap so it can avoid an error when @CO tag appeared Aug 22, 2022
@coveralls
Copy link

coveralls commented Aug 23, 2022

Pull Request Test Coverage Report for Build 2910319725

  • 1 of 3 (33.33%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.01%) to 94.668%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/bam/header.rs 1 3 33.33%
Totals Coverage Status
Change from base Build 2710471876: -0.01%
Covered Lines: 11647
Relevant Lines: 12303

💛 - Coveralls

@shohei-kojima
Copy link
Contributor Author

I added #[allow(clippy::read_zero_byte_vec)] to avoid the clippy error, but I am not sure if this is OK. Please review this.

Copy link
Contributor

@veldsla veldsla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Not sure if the tbx::mod change is within the scope of this PR, but it does please clippy.

Now to find someone who can actually merge it 😄 @rust-bio/mergers ?

@pmarks
Copy link
Contributor

pmarks commented Aug 24, 2022

This is a breaking change because existing users of to_hashmap will no longer get comments in the output (assuming they were reading BAM files that didn't hit this bug). So I think we should change the title of the PR to feat!: before merging and that will do the right thing. See e.g. #306 for an example - that PR led to a "Breaking Change" entry in the changelog. @veldsla @shohei-kojima sound good?

@veldsla
Copy link
Contributor

veldsla commented Aug 24, 2022

You never got comments in the hashmap. It used to panic when a comment was present.

@pmarks
Copy link
Contributor

pmarks commented Aug 24, 2022

Ah right, the regex requires the colon. Cool.

@pmarks pmarks merged commit c24a7f6 into rust-bio:master Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants