Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Write fasta with fix line width #490

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

openpaul
Copy link

@openpaul openpaul commented Apr 30, 2022

Writing fasta files often is best with line breaks. Currently write_record does not break lines.

I am not sure if there is already functionality for this, if so, please just close this issue.
If not, I propose these changes which introduces a new function write_record_width, which basically just adds a line break after every nth char.

This would make the output consistent with reference databases, such as NCBI RefSeq or the hg38 genome.

One could have a default line width, currently it is a exposed parameter. It could be 60 chars (thats what SPAdes outputs) but also 70 (NCBI) or 50 (hg38).

Tests are adjusted for this.

@openpaul openpaul changed the title Write fasta with fix line width Feat: Write fasta with fix line width May 3, 2022
@openpaul openpaul changed the title Feat: Write fasta with fix line width feat: Write fasta with fix line width May 3, 2022
@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 88.408% when pulling 1a4051b on openpaul:write_width into 5fdb236 on rust-bio:master.

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! This is certainly a good idea. I think though that the width should be a global setting. I would add

pub fn set_linewrap(&mut self, width: Option<usize>)

to the Writer struct.

By default, linewrap would be None. The reason is that there is no need to set the linewrap for each record to be written, it should be a global setting.

@alvanuffelen
Copy link

I couldn't find a way to add my commits directly to this PR so I added my commit to openpaul's branch: openpaul#2.
Any feedback is welcome (both on how to add commits to this PR if possible and on the new additions)!

@johanneskoester
Copy link
Contributor

@alvanuffelen your PR looks good! @openpaul would you be willing to accept it, so that this one can be merged as well?

set linewrap to struct Writer
Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, when looking at this again, I wonder if it might not make more sense to just keep the original write and write_record methods, and let them behave differently when the linewidth is set to something, instead of having separate methods.

src/io/fasta.rs Outdated
Comment on lines 866 to 868
pub fn set_linewrap(&mut self, width: usize) {
self.linewrap = Some(width);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn set_linewrap(&mut self, width: usize) {
self.linewrap = Some(width);
}
pub fn set_linewrap(&mut self, linewrap: Option<usize>) {
self.linewrap = linewrap;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now part of the PR. Let me know if that was done as expected.

@openpaul
Copy link
Author

openpaul commented May 4, 2024

This PR was on the mind again this week, so here is an attempt to make sure it can soon be closed.

This new version should incorporate the suggestions by allowing the linewrap as an optional parameter.

All tests passed locally for me. Waiting for the CI to tell me if any linting issues are there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants