add filter for short contigs? #6

Open
nick-youngblut opened this issue Dec 16, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@nick-youngblut

graphbin2 doesn't seem to scale very well to large assemblies with a large number of contigs. A big fraction of the contigs generated by metaSPAdes are usually short, and there's no contig length cutoff in SPAdes. Would it be possible to add a contig length cutoff to graphbin2 (e.g., skip all contigs <1 kb) in order to speed up the algorithm, or does the algorithm require all contigs in order to function properly?
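For illustration, a length cutoff like the one proposed could be sketched as a standalone pre-filter on the contigs FASTA. This is only a minimal sketch, not part of graphbin2: the function name `filter_contigs` and the `min_length` parameter are hypothetical, and it touches the FASTA file only. The assembly graph and the initial binning result would still refer to the removed contigs, and whether graphbin2 behaves properly in that case is exactly the open question here.

```python
import io


def filter_contigs(fasta_in, fasta_out, min_length=1000):
    """Copy FASTA records whose sequence is at least min_length bases long.

    fasta_in and fasta_out are open text handles (hypothetical interface).
    Returns a (kept, skipped) tuple of record counts.
    """
    kept = 0
    skipped = 0
    header = None
    seq_lines = []

    def flush():
        # Write the record collected so far if it passes the cutoff.
        nonlocal kept, skipped
        if header is None:
            return
        if sum(len(s) for s in seq_lines) >= min_length:
            fasta_out.write(header + "\n")
            fasta_out.write("\n".join(seq_lines) + "\n")
            kept += 1
        else:
            skipped += 1

    for line in fasta_in:
        line = line.strip()
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            header, seq_lines = line, []
        elif line:
            seq_lines.append(line)
    flush()  # the last record has no following header to trigger a flush
    return kept, skipped


if __name__ == "__main__":
    # Toy input; real contig files would be opened with open(path).
    src = io.StringIO(">contig_1\nACGTACGT\n>contig_2\nAC\n")
    dst = io.StringIO()
    print(filter_contigs(src, dst, min_length=4))
    print(dst.getvalue(), end="")
```

The records are streamed one at a time, so memory use stays proportional to the longest contig rather than the whole assembly.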

@nick-youngblut
Author

I believe that I created a method to pre-filter out all short contigs and speed up graphbin2. In order to get the code running effectively, I had to make extensive changes, so a PR doesn't make much sense. Some of the things I changed in the code that I found beneficial for reading & running graphbin2:

  • Used an argparse command => subcommand structure for calling graphbin2_SPAdes.py (or graphbin2_SGA.py) instead of using os.system to call the code. This change greatly helps with debugging exceptions, which an os.system call of a script does not allow
  • Used the logging package for status output instead of print(), given that on at least some machines the tqdm stderr output is written before the print stdout output, which causes confusion when reading the log
  • Used the "my string {}".format(integer) method for formatting strings
  • When possible, caught specific exceptions (e.g., except ValueError) instead of general exceptions (i.e., a bare except)
  • Generally tried to format the code according to PEP 8
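A rough sketch of how the changes listed above might fit together is below. It is not the actual modified code: the program name `graphbin2_sketch`, the `--min-length` option, and the `run_spades`/`run_sga` functions are hypothetical stand-ins. The handler set up by `logging.basicConfig` writes to stderr by default, which is the same stream tqdm uses, so status lines and progress bars stay in order.

```python
import argparse
import logging

# basicConfig attaches a stderr handler by default.
logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO)
log = logging.getLogger("graphbin2_sketch")


def check_min_length(value):
    # Hypothetical validation; raises a specific exception type.
    if value < 0:
        raise ValueError("min length must be >= 0, got {}".format(value))
    return value


def run_spades(args):
    # Placeholder for the SPAdes code path, called in-process.
    log.info("SPAdes mode, min contig length {}".format(
        check_min_length(args.min_length)))


def run_sga(args):
    # Placeholder for the SGA code path, called in-process.
    log.info("SGA mode, min contig length {}".format(
        check_min_length(args.min_length)))


def build_parser():
    parser = argparse.ArgumentParser(prog="graphbin2_sketch")
    sub = parser.add_subparsers(dest="assembler", required=True)
    for name, func in (("SPAdes", run_spades), ("SGA", run_sga)):
        p = sub.add_parser(name)
        p.add_argument("--min-length", type=int, default=0)
        p.set_defaults(func=func)  # dispatch target for this subcommand
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        args.func(args)
    except ValueError as err:  # specific exception instead of a bare except
        log.error("invalid input: {}".format(err))
        return 1
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Because each subcommand is an ordinary function call rather than a separate process started through os.system, an exception raised inside it can be caught or inspected in a debugger from the calling code.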

@Vini2
Collaborator

Vini2 commented Jan 27, 2021

Hello @nick-youngblut,

Thank you for the question. GraphBin2 was originally designed to recover as many short contigs as possible. Hence, we did not introduce a filter for short contigs. However, I understand that this can be a scaling issue with very large datasets. I'm glad you were able to modify the code as you needed. Thank you for sharing the details of the things you changed. I will add a fix providing the option to filter out contigs in the future.

Thank you!

@Vini2 Vini2 added the enhancement New feature or request label Jan 27, 2021