Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge VCF headers #343

Open
wdesouza opened this issue Nov 26, 2021 · 2 comments
Open

Merge VCF headers #343

wdesouza opened this issue Nov 26, 2021 · 2 comments

Comments

@wdesouza
Copy link

Is it possible to merge VCF headers to combine records from different files? I'am trying to do something like this

let mut out_header = Header::new();
for record in reader.header().header_records() {
    let header_line = todo!("convert record to string?");
    out_header.push_record(header_line);
}
@ebioman
Copy link

ebioman commented Dec 13, 2021

Hi stupid question but do you want to combine multiple or just 2 files ?
If the latter you could initialize the out_header instead of new with template for 1st file:

let mut out_header = Header::from_template(vcfA.header());

Then you can iterate over header from file 2 and push into 1st, no ?

UPDATE:

Otherwise I would do something like that:

use rustc_hash::FxHashMap;

    let mut vcf_contigs : FxHashMap<String,linear_map::LinearMap<String,String>> = FxHashMap::default();

    for entries in vcf_a.header().header_records() {
        match entries {
            rust_htslib::bcf::header::HeaderRecord::Contig{key,mut values} => {
                values.remove("IDX").expect("ERROR:could not remove ID entry!");
                vcf_contigs.insert(values.get("ID").expect("ERROR: could not get ID of contig!").to_string(), values);
            },
            _ => println!("Not interesting"),
        };
    };

    for entries in vcf_b.header().header_records() {
        match entries {
            rust_htslib::bcf::header::HeaderRecord::Contig{key,mut values} => {
                values.remove("IDX").expect("ERROR:could not remove ID entry!");
                let id =  values.get("ID").expect("ERROR: could not get ID of contig!").to_string();
                let length = values.get("length").expect("ERROR: could not get length of contig!").to_string(); 
                if let Some(existant) = vcf_contigs.get(&id){
                    if existant.get("length").expect("ERROR: could not get length of contig!").to_string() != length {
                        panic!("ERROR: length of similar contig IDs differed!");
                    }else{
                        continue
                    }

                }else{
                    vcf_contigs.insert(values.get("ID").expect("ERROR: could not get ID of contig!").to_string(), values);
                }
            },
            _ => println!("Not interesting"),
        };
    };

    for (_,value) in vcf_contigs.iter() {
        let id = value.get("ID").expect("ERROR: could not get ID of contig!").to_string();
        let length = value.get("length").expect("ERROR: could not get ID of contig!").to_string();
        let assembly = value.get("assembly").expect("ERROR: could not get ID of contig!").to_string();
        vcf_header.push_record(format!("##contig=<ID={},length={},assembly={}>", id, length, assembly).as_bytes());
    }

This is only to match e.g. the header contig entries and to assure they are compatible.
Be aware, I am a rust noob though and there might be better and for sure more elegant ways to do that ;)

@ebioman
Copy link

ebioman commented Dec 13, 2021

I encounter though now a weird problem doing that.

  • VCF1_reader has an associated HeaderView
  • VCF2_reader has an associated HeaderView

Now I am generating a new header in this process for the new comparison vcf writer.
If I visit now entries from FileA and do simply a vcf.write(&record) into the comparison writer this works well and I get the new header and the entry from FileA.
Based on the comparison I would like to add now a new "INFO" field, e.g. "seen=2" if 2 samples contain a similar entry.
But for that I would have to do now a record.push_info_string which is now weird because it has to be defined as well in the
header of the FileA, but how can I actually modify this now ???
I can obviously add this info in my new header for the comparison but this is not sufficient and the push above will fail into the header of File1.

Is there a way similar to :

pub fn header(&self) -> &HeaderView

Return associated header.

a way to modify the associated header or change the associated header ?

UPDATE:

Obviously one way to do it is then to use vcf.empty_record() and populate it with the record from above.
Unfortunately there is no vcf.record_from() or something similar as far as I can see

UPDATE:
Actually I looked into the wrong place, by doing:

vcf.translate(record);

It translates a record into the new writer and one can then push additional information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants