Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem creating new BAM records (SAM=Segfault,BAM=silent failure) #339

Open
ebioman opened this issue Nov 2, 2021 · 8 comments
Open

Problem creating new BAM records (SAM=Segfault,BAM=silent failure) #339

ebioman opened this issue Nov 2, 2021 · 8 comments

Comments

@ebioman
Copy link

ebioman commented Nov 2, 2021

Hi
I am currently trying to create uBAM (unmapped BAM) files and running into problems creating non-mapped entries from
any kind of given sequence (e.g. FASTA, FASTQ,...).
Essentially I can create a new BAM default record and BAM::Writer will write it but then I get either:

  1. for SAM output a segfault
  2. for BAM output no error at all, but the generated BAM is truncated

To understand better, lets take a BAM input and generate a BAM output file

use rust_htslib::{bam, bam::Read};

fn ubam2ubam (input: &str, output: &str) {

  let mut reader = bam::Reader::from_path(input).expect("ERROR: could not open input file");
  let header       = bam::Header::from_template(reader.header());
  let mut writer  = bam::Writer::from_path(output, &header, bam::Format::Sam ).expect("ERROR: could not open output file");
  
  for r in reader.records() {
          let record = r.expect("ERROR: could not read BAM entry!");
          // now generating an empty record
          let mut out_record = bam::Record::default();
          // populating with some features from input BAM
          out_record.set(
              record.qname(), //qname
              None, //Cigar
              &record.seq().as_bytes(), //sequence 
              record.qual() //qual
          );
          writer.write(&out_record).expect("ERROR: could not write out record!");
      }
}

This compiles but fails then with a Segmentation fault (core dumped) for Sam.
If I write the original record instead of the new empty one everything works fine.
I guess there is some combination of feature or rather missing feature which breaks it but I cant find which one.

@jch-13
Copy link
Contributor

jch-13 commented Nov 2, 2021

Hello! Can you please try to explicitly unset contig and position right before writing the Record to the output file?

out_record.set_tid(-1);
out_record.set_pos(-1);
out_record.set_mpos(-1);
out_record.set_mtid(-1);
// Perhaps also
out_record.set_unmapped();

Does the error still occur?

@ebioman
Copy link
Author

ebioman commented Nov 2, 2021

@jch-13 thanks, I tried that already before - but unfortunately does not fix the situation.
A few things I tried unsuccessfully so far:

  • out_record.set_tid(-1);
  • out_record.set_flags(4);
  • out_record.set_pos(-1);
  • let mapq = "*".as_bytes();
    out_record.set_mapq(mapq[0]);
  • out_record.set_mpos(-1);
  • out_record.set_insert_size(0);

My suspicion is currently either the CIGAR or the header somehow.
For Cigar I tried the following as there is no record.set_Cigar function:

let cigar_str = CigarString::try_from("".as_bytes()).expect("ERROR: could not create empty CIGAR string!");
out_record.set(
              record.qname(), //qname
              Some(&cigar_str), //Cigar
              &record.seq().as_bytes(), //sequence 
              record.qual() //qual
          );

But it does not fix the issue neither. For the header I am wondering is it mandatory to set maybe set_header for each entry?

@jch-13
Copy link
Contributor

jch-13 commented Nov 2, 2021

I can't reproduce it here (the code is working just fine when I add the lines above), so the culprit might indeed be the header of the input BAM file.

@ebioman
Copy link
Author

ebioman commented Nov 2, 2021

Thanks, the following fixed it:

out_record.set_mtid(-1);

Which I did not try and which you suggested later in the edit. This was not on my radar at all as I am working with single reads and not paired ones.

I would though currently hesitate to close the ticket as I think that this would maybe need a fix in the bam::Record::default(); or some kind of other structure to assure there is no silent failure in Bam case and that the Sam failure is easier to understand ?

@jch-13
Copy link
Contributor

jch-13 commented Nov 2, 2021

Ah yes, sorry, I should have made the edit more clear. And I think you are right; this is more of a work-around than a proper fix.

I wonder if MaybeUninit::zeroed().assume_init() in here:

inner: unsafe { MaybeUninit::zeroed().assume_init() },
should be replaced by *htslib::bam_init1().

Edit: No, it turns out that this does not help. This seems to be htslib's default behaviour, so I'm not sure whether we should default to creating unmapped records in rust-htslib or just document it carefully?

@ebioman
Copy link
Author

ebioman commented Nov 3, 2021

Personally I would advocate that the default of generating a new record should be a valid un-mapped single read first, with no pair defined. That would be safe, meaning one could generate a default new header with a default new record and it would generate a valid BAM file.
If we'd go rather for paired reads as default (maybe people prefer that) at least it should then generate a valid default record for the 2nd read by default, too.
The way it is currently, that a new default record ends in an error is obviously not favorable as it is not easily understandable for the user where the error stems from.

@jch-13
Copy link
Contributor

jch-13 commented Nov 3, 2021

Looking at the headers over at https://github.com/samtools/htslib/blob/9045785cef02e7cd97b0b4caecbdcaf6217a986a/htslib/sam.h#L1007-L1036, it seems htslib's bam_set1() method sets all necessary fields, whereas rust-htslib's bam::record::set() only sets the variable length data. So in rust-htslib, we don't actually have a method to create a valid BAM record where all necessary fields are populated.
I believe it would make sense if bam::record::new() took all necessary fields as arguments. The Default impl could take fewer arguments, if any (by calling new() with reasonable defaults). What do you think?

@ebioman
Copy link
Author

ebioman commented Nov 11, 2021

Hello @jch-13 , sorry for the late reply.
I think this makes definitely sense in my opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants