Better selection of sequences if more than --seqs-per-taxon #30

Open
iimog opened this issue Jul 10, 2019 · 0 comments
Labels
enhancement New feature or request

Comments

@iimog
Member

iimog commented Jul 10, 2019

Currently, if a taxon has more than `--seqs-per-taxon` sequences (default 3; #28 suggests 9), only the longest ones are kept, and ties are broken arbitrarily. Andreas Kolter suggested two improvements. First, use voucher information from NCBI to avoid taking multiple sequences from the same specimen. Second, shuffle the NCBI IDs to get sequences from more diverse studies, because sequences from the same study often get similar IDs.
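A minimal Python sketch of how the suggestion could work, not the project's actual code. It assumes the sequences of one taxon are already available as records carrying the NCBI accession, the sequence, and an optional specimen voucher (for example taken from the GenBank `/specimen_voucher` qualifier). The `Candidate` class and the `select_sequences` function are hypothetical names. Shuffling before a stable sort by length keeps "longest first" but breaks length ties randomly instead of in ID order.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    """One NCBI sequence for a taxon (hypothetical structure)."""
    accession: str          # NCBI accession / ID
    sequence: str
    voucher: Optional[str]  # specimen voucher; None if NCBI has none


def select_sequences(candidates: List[Candidate], seqs_per_taxon: int = 3,
                     seed: Optional[int] = None) -> List[Candidate]:
    """Keep at most seqs_per_taxon sequences for one taxon.

    1. Shuffle, so that ties in length are not resolved in ID order
       (neighbouring IDs often come from the same study).
    2. Stable sort by length, longest first; the shuffle decides ties.
    3. Skip a sequence if its voucher (specimen) is already selected.
    """
    rng = random.Random(seed)
    pool = list(candidates)      # copy; the caller's list is untouched
    rng.shuffle(pool)
    # list.sort is stable even with reverse=True, so equal lengths keep
    # the shuffled order.
    pool.sort(key=lambda c: len(c.sequence), reverse=True)

    selected: List[Candidate] = []
    seen_vouchers: Set[str] = set()
    for cand in pool:
        if len(selected) >= seqs_per_taxon:
            break
        if cand.voucher is not None:
            if cand.voucher in seen_vouchers:
                continue  # same specimen as an already selected sequence
            seen_vouchers.add(cand.voucher)
        selected.append(cand)
    return selected


if __name__ == "__main__":
    # Made-up example: ACC1 and ACC2 come from the same specimen "V1".
    taxon_seqs = [
        Candidate("ACC1", "ACGTACGT", "V1"),
        Candidate("ACC2", "ACGTACGT", "V1"),
        Candidate("ACC3", "ACGTAC", None),
        Candidate("ACC4", "ACGTACGT", "V2"),
    ]
    for c in select_sequences(taxon_seqs, seqs_per_taxon=3, seed=42):
        print(c.accession, len(c.sequence), c.voucher)
```

With this made-up input, exactly one of ACC1/ACC2 is kept, together with ACC4 and the shorter ACC3. The seed decides which of the two same-specimen sequences wins. Sequences without a voucher are never treated as duplicates of each other here. Whether to fall back to same-specimen sequences when fewer than `--seqs-per-taxon` distinct vouchers exist is left open in this sketch.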

@iimog iimog added the enhancement New feature or request label Jul 10, 2019