Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genbank parser needs heavy refactor or a rewrite #434

Open
carreter opened this issue Dec 20, 2023 · 8 comments
Open

Genbank parser needs heavy refactor or a rewrite #434

carreter opened this issue Dec 20, 2023 · 8 comments
Labels
enhancement New feature or request hard A major or complex undertaking medium priority The default priority for a new issue. stale

Comments

@carreter
Copy link
Collaborator

Describe the desired feature/enhancement

The io/genbank package needs a significant refactor or to be re-written entirely.

Is your feature request related to a problem?

There are several outstanding issues with the parser that will be difficult to address in its current state: #383 #352 #351 #342

Describe alternatives you've considered (optional)

Main alternative is to hack over the code as-is to fix bugs, but I think this will take more time than a rewrite.

@carreter carreter added enhancement New feature or request medium priority The default priority for a new issue. hard A major or complex undertaking labels Dec 20, 2023
@carreter
Copy link
Collaborator Author

Dropping a link to the GenBank spec here: https://www.ncbi.nlm.nih.gov/genbank/release/current/

@TwFlem
Copy link
Contributor

TwFlem commented Jan 11, 2024

That's a neat looking format 🤔

Happy to take a whack at a rewrite.

A few questions:

  1. Do we want to incrementally rewrite this? For instance, starting with just parsing a genbank file? Would a good place to start be with just the info in the header? This way, we can discuss what error handling looks like, how we want things organized, setting the tone for how we want to parse the rest of the info, etc.
  2. Do we want to start with the latest version of the spec? Or do we want to build out the backwards compatibility for spec breaking changes along the way?

@carreter
Copy link
Collaborator Author

  1. I actually just kinda went at it, but would be happy to take a step back and replan - I've definitely noticed some weaknesses in my approach. I'll clean up the code and get it in committable state for just the header!
  2. I was targeting the latest version of the spec in my approach, but unsure what the best approach is of the two options you present.

@TwFlem
Copy link
Contributor

TwFlem commented Jan 11, 2024

re 1.

Oh nice! I don't want to step on your toes- so no pressure. Maybe there's room to divy up different parts of the spec after establishing how the header will be parsed? If so, I'd be happy to help with that!

Also, if it is inappropriate to divy up the work, I'd be happy to help build out 400 instead. That or any other issue that would help out Poly that is higher priority and or high difficulty.

re 2.

I think targeting the latest version of the spec is definitely the better option. Adding backwards compat after the fact seems straight forward in this case. It doesn't look like there are too many breaking changes and they seem pretty separate from everything else.

@carreter
Copy link
Collaborator Author

Not stepping on any toes :) I'll try and get that code up today, I'm away from the PC I had the code on.

Feel free to take a stab at any issues you would like!!!

@carreter carreter mentioned this issue Jan 11, 2024
6 tasks
@carreter
Copy link
Collaborator Author

Ok, draft PR is up that has the types + most of the layout of the parsers logic. @TwFlem let me know what you think!

@TwFlem
Copy link
Contributor

TwFlem commented Jan 12, 2024

@carreter That looks super good! That pattern will go along with way the genbank parser along with the other parsers as well. The error type looks super good as well. It looks flexible enough so we can get nice and specific about what it wrong. The way things are being done just looks good in general.

The only real value added suggestions are about testing.

The others are more speculation.

Copy link

This issue has had no activity in the past 2 months. Marking as stale.

@github-actions github-actions bot added the stale label Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request hard A major or complex undertaking medium priority The default priority for a new issue. stale
Projects
None yet
Development

No branches or pull requests

2 participants