cutN penalty to identify all Ns? #187

Open
nikostr opened this issue Jan 27, 2022 · 0 comments

nikostr commented Jan 27, 2022

I'd like to use cutN to identify every region of Ns in my sequence. If I'm reading the code correctly, a region of Ns is ended once its score becomes negative, where score = (number of Ns) - penalty * (number of non-Ns). A penalty of zero gives a single region that starts at the first N and runs to the end of the sequence. A small penalty causes neighboring regions of Ns to be merged, with the non-N sequence between them discarded. To get exact regions of Ns, the penalty therefore has to be larger than any contiguous run of Ns that precedes a non-N. Am I understanding this correctly?

Would it make sense to have a way of explicitly extracting all contiguous regions of Ns? This could perhaps be done by reserving particular penalty values (e.g. 0 or 1000000000), or by adding a flag to support this behavior.
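To make the request concrete, here is a minimal Python sketch of the output I have in mind: every maximal run of Ns, never merged across non-N bases. It is not cutN's implementation and uses no penalty at all. The function name `n_runs`, the `min_len` parameter, and the 0-based half-open coordinates are assumptions chosen for illustration.

```python
import re


def n_runs(seq, min_len=1):
    """Return (start, end) intervals of maximal runs of N in seq.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Runs shorter than min_len are skipped; runs are never merged.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            runs.append((match.start(), match.end()))
    return runs


if __name__ == "__main__":
    # Two separate runs of Ns, split by "GT"; they must stay separate.
    for start, end in n_runs("ACNNNGTNAC"):
        print(start, end)
```

A scan like this has no score to tune, so its result does not depend on how long the runs of Ns are relative to a penalty.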
