Passing an INFO tag as a FORMAT tag failed silently instead of triggering an error #350

Open
essut opened this issue Apr 19, 2022 · 1 comment


essut commented Apr 19, 2022

Hi, I am trying to solve issue rust-bio/rust-bio-tools#52, where the user triggered a code path that should not be possible: a FORMAT tag having a flag tag type.

I was able to reproduce this by passing an INFO tag as a FORMAT tag (rbt vcf-to-txt --fmt T < tests/test.vcf), which surprised me: I expected rust-htslib to panic, since the tag does not exist as a FORMAT tag.

I then wrote my own code, using test.vcf as the input:

use rust_htslib::bcf::{Read, Reader};

fn main() {
    let mut bcf = Reader::from_path("test.vcf").unwrap_or_else(|e| panic!("{}", e));

    for record in bcf.records() {
        let record = record.unwrap();
        let tag = "T".as_bytes();

        println!("{:?}", record.header().info_type(tag));
        println!("{:#?}", record.info(tag));

        println!("{:?}", record.header().format_type(tag));
        println!("{:#?}", record.format(tag));
    }

    println!("{:#?}", bcf.header().header_records());
}

If "T" is interpreted as an INFO tag, all is well:

Ok((Integer, AltAlleles))
Info {
    record: Record {
        inner: 0x000055ac1a09e340,
        header: HeaderView {
            inner: 0x000055ac1a09cc20,
        },
    },
    tag: [
        84,
    ],
    buffer: Buffer {
        inner: 0x0000000000000000,
        len: 0,
    },
}

However, I am surprised that interpreting "T" as a FORMAT tag does not generate any errors:

Ok((Flag, Fixed(0)))
Format {
    record: Record {
        inner: 0x000055ac1a09e340,
        header: HeaderView {
            inner: 0x000055ac1a09cc20,
        },
    },
    tag: [
        84,
    ],
    inner: 0x0000000000000000,
    buffer: Buffer {
        inner: 0x0000000000000000,
        len: 0,
    },
}

This happens even though the header information identifies "T" as an INFO tag:

[
    Generic {
        key: "fileformat",
        value: "VCFv4.3",
    },
    Filter {
        key: "FILTER",
        values: {
            "ID": "PASS",
            "Description": "\"All filters passed\"",
            "IDX": "0",
        },
    },
    Contig {
        key: "contig",
        values: {
            "ID": "1",
            "IDX": "0",
        },
    },
    Format {
        key: "FORMAT",
        values: {
            "ID": "S",
            "Number": "1",
            "Type": "String",
            "Description": "\"Text\"",
            "IDX": "1",
        },
    },
    Format {
        key: "FORMAT",
        values: {
            "ID": "GT",
            "Number": "1",
            "Type": "String",
            "Description": "\"Genotype\"",
            "IDX": "2",
        },
    },
    Info {
        key: "INFO",
        values: {
            "ID": "T",
            "Number": "A",
            "Type": "Integer",
            "Description": "\"Text\"",
            "IDX": "3",
        },
    },
    Info {
        key: "INFO",
        values: {
            "ID": "SOMATIC",
            "Number": "0",
            "Type": "Flag",
            "Description": "\"Somatic variant\"",
            "IDX": "4",
        },
    },
]

I am not sure whether this is expected behaviour. If it is not, it would help to fix it here rather than rely on downstream tools to catch the error. I am also not sure whether this has been discussed before, so apologies if this is a duplicate.
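
One way a caller could guard against this is to check which header section declares a tag before reading it as FORMAT. The following is only a sketch of that idea and not rust-htslib code: TagKind, declared_kinds, and require_format_tag are hypothetical names, and the header is modelled as plain ##INFO/##FORMAT text lines (with ID as the first field) rather than a HeaderView.

```rust
use std::collections::{HashMap, HashSet};

/// Header section in which a tag is declared (hypothetical model, not a rust-htslib type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TagKind {
    Info,
    Format,
}

/// Collect, for every tag ID, the sections (INFO/FORMAT) that declare it.
/// Assumes the simple `##KEY=<ID=...,...>` layout with ID as the first field.
fn declared_kinds(header: &str) -> HashMap<String, HashSet<TagKind>> {
    let mut kinds: HashMap<String, HashSet<TagKind>> = HashMap::new();
    for line in header.lines() {
        let (kind, rest) = if let Some(rest) = line.strip_prefix("##INFO=<ID=") {
            (TagKind::Info, rest)
        } else if let Some(rest) = line.strip_prefix("##FORMAT=<ID=") {
            (TagKind::Format, rest)
        } else {
            continue; // not an INFO or FORMAT declaration
        };
        // The ID runs up to the next comma (or the closing bracket).
        let id = rest.split(|c: char| c == ',' || c == '>').next().unwrap_or("");
        kinds.entry(id.to_string()).or_default().insert(kind);
    }
    kinds
}

/// Return an error instead of silently succeeding when `tag` is not a FORMAT tag.
fn require_format_tag(
    kinds: &HashMap<String, HashSet<TagKind>>,
    tag: &str,
) -> Result<(), String> {
    match kinds.get(tag) {
        Some(set) if set.contains(&TagKind::Format) => Ok(()),
        Some(_) => Err(format!("tag {} is declared as INFO, not FORMAT", tag)),
        None => Err(format!("tag {} is not declared in the header", tag)),
    }
}

fn main() {
    // Header lines mirroring the declarations listed in the header records above.
    let header = "##fileformat=VCFv4.3\n\
                  ##FORMAT=<ID=S,Number=1,Type=String,Description=\"Text\">\n\
                  ##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n\
                  ##INFO=<ID=T,Number=A,Type=Integer,Description=\"Text\">\n\
                  ##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description=\"Somatic variant\">";
    let kinds = declared_kinds(header);
    println!("{:?}", require_format_tag(&kinds, "GT"));
    println!("{:?}", require_format_tag(&kinds, "T"));
}
```

Because a tag ID may legitimately be declared in both sections, the sketch keeps a set of kinds per ID instead of a single kind.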

Versions:
rust-bio-tools 0.39.0
rust-htslib 0.38.2


Meizuamy commented May 9, 2023

Use std::str::from_utf8 to convert the BufferBacked value to &str:

use rust_htslib::bcf::{Read, Reader};

fn main() {
    let mut bcf = Reader::from_path("test.vcf").unwrap_or_else(|e| panic!("{}", e));

    for record in bcf.records() {
        let record = record.unwrap();
        let tag = "T".as_bytes();

        println!("{:?}", record.header().info_type(tag));
        println!("{:#?}", std::str::from_utf8(record.info(tag).string().expect("No T tag in info!").expect("This record T tag has no value!")));

        println!("{:?}", record.header().format_type(tag));
        println!("{:#?}", std::str::from_utf8(record.format(tag).string().expect("No T tag in format!")));
    }

    println!("{:#?}", bcf.header().header_records());
}
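
The conversion step on its own can be tried without a VCF file. This is a minimal sketch, assuming the string values arrive as a list of byte slices; to_strs is a hypothetical helper and not part of rust-htslib.

```rust
use std::str::{from_utf8, Utf8Error};

/// Convert each raw byte slice to `&str`, stopping at the first invalid UTF-8 value.
fn to_strs<'a>(values: &[&'a [u8]]) -> Result<Vec<&'a str>, Utf8Error> {
    values.iter().map(|v| from_utf8(*v)).collect()
}

fn main() {
    // Stand-in for string values read from a record (assumed shape).
    let raw: Vec<&[u8]> = vec!["Text".as_bytes(), "0/1".as_bytes()];
    match to_strs(&raw) {
        Ok(strs) => println!("{:?}", strs),
        Err(e) => eprintln!("invalid UTF-8 in tag value: {}", e),
    }
}
```

Returning a Result keeps invalid bytes visible to the caller instead of panicking inside the helper.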
