New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Write fasta with fix line width #490
base: master
Are you sure you want to change the base?
Conversation
- I would add tests, see end of file - Use chunks iterator - Split write into write_header and write_sequence
Update fasta.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! This is certainly a good idea. I think though that the width should be a global setting. I would add
pub fn set_linewrap(&mut self, width: Option<usize>)
to the Writer struct.
By default, linewrap would be None. The reason is that there is no need to set the linewrap for each record to be written, it should be a global setting.
I couldn't find a way to add my commits directly to this PR so I added my commit to openpaul's branch: openpaul#2. |
@alvanuffelen your PR looks good! @openpaul would you be willing to accept it, so that this one can be merged as well? |
set linewrap to struct Writer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when looking at this again, I wonder if it might not make more sense to just keep the original write and write_record methods, and let them behave differently when the linewidth is set to something, instead of having separate methods.
src/io/fasta.rs
Outdated
pub fn set_linewrap(&mut self, width: usize) { | ||
self.linewrap = Some(width); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn set_linewrap(&mut self, width: usize) { | |
self.linewrap = Some(width); | |
} | |
pub fn set_linewrap(&mut self, linewrap: Option<usize>) { | |
self.linewrap = linewrap; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now part of the PR. Let me know if that was done as expected.
This PR was on the mind again this week, so here is an attempt to make sure it can soon be closed. This new version should incorporate the suggestions by allowing the linewrap as an optional parameter. All tests passed locally for me. Waiting for the CI to tell me if any linting issues are there. |
Writing fasta files often is best with line breaks. Currently
write_record
does not break lines.I am not sure if there is already functionality for this, if so, please just close this issue.
If not, I propose these changes which introduces a new function
write_record_width
, which basically just adds a line break after every nth char.This would make the output consistent with reference databases, such as NCBI RefSeq or the hg38 genome.
One could have a default line width, currently it is a exposed parameter. It could be 60 chars (thats what SPAdes outputs) but also 70 (NCBI) or 50 (hg38).
Tests are adjusted for this.