Filter of ranks without order #97

Open
2 tasks done
alvanuffelen opened this issue Apr 25, 2024 · 5 comments

Comments

@alvanuffelen

alvanuffelen commented Apr 25, 2024

Prerequisites

  • make sure you are using the latest version by running taxonkit version
  • read the usage

Describe your issue

In the documentation, it mentions:

  1. Ranks without order should be assigned a prefix symbol "!" for each rank.

This means !no rank and !clade are defined as rank without order.

The documentation also states:

  5. TaxIDs with no rank are kept by default!!!
     They can be optionally discarded by -N/--discard-noranks.

The following TaxID has the rank clade:
1783270 cellular organisms;Bacteria;FCB group FCB group clade

As expected, the TaxID is not filtered out by the following command:
echo 1783270 | ./taxonkit filter -L species

However, why is it filtered out when using -H?
echo 1783270 | ./taxonkit filter -H species

Based on point 5 in the documentation, TaxIDs with no rank are kept by default, so I would expect this one to be kept with both -L and -H.
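
The expectation described above can be written down as a small Python sketch. This is a toy model of the behaviour the report expects, not taxonkit's implementation: `RANK_ORDER`, `UNORDERED`, and `keep` are hypothetical names, the rank list is a simplified assumption, and the comparison against the cutoff is assumed to be strict.

```python
# Toy model of the expected behaviour; NOT taxonkit's actual code.
# Assumption: a simplified rank order, highest rank first.
RANK_ORDER = ["superkingdom", "phylum", "class", "order",
              "family", "genus", "species", "subspecies"]

# Ranks without order (prefixed with "!" in the documentation).
UNORDERED = {"no rank", "clade"}


def keep(rank, cutoff, mode, discard_noranks=False):
    """Decide whether a TaxID with the given rank passes the filter.

    mode is "L" (lower than cutoff, like -L) or "H" (higher than cutoff, like -H).
    discard_noranks mimics -N/--discard-noranks.
    """
    if rank in UNORDERED:
        # Expected: kept by default for both -L and -H, dropped only with -N.
        return not discard_noranks
    r = RANK_ORDER.index(rank)
    c = RANK_ORDER.index(cutoff)
    # A larger index means a lower rank in this list.
    return r > c if mode == "L" else r < c


# TaxID 1783270 has rank "clade", so both calls are expected to keep it.
print(keep("clade", "species", "L"))
print(keep("clade", "species", "H"))
```

Under this model the unordered-rank check runs before any comparison with the cutoff, which is why the choice between -L and -H should not affect a clade.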

@shenwei356
Owner

Thanks for reporting this. It's fixed.

@alvanuffelen
Author

Thank you!

Would it also be possible to implement the -n feature in combination with -H?
echo 2605619 | taxonkit filter -H genus
The line above prints the TaxID because it has no rank.
I would like to run
echo 2605619 | taxonkit filter -H genus -n
so that the TaxID gets filtered out (not printed), because its closest higher node is 'species', which is still lower than genus.

Additionally, the help page could be clearer:

-n, --save-predictable-norank do not discard some special ranks without order when using -L, where rank of the closest higher node is still lower than rank cutoff

The TaxID is kept not only 'when the rank of the closest higher node is lower than rank cutoff' but also when that rank is equal to the cutoff.
E.g.: echo 2605619 | taxonkit filter -L species
This gets printed because the closest higher rank is 'species', which is equal to the cutoff.

@shenwei356
Owner

echo 2605619 | taxonkit filter -H genus -N does filter out the taxid.


You're right. I'll update the doc.

-n, --save-predictable-norank do not discard some special ranks without order when using -L, where rank of the closest higher node is equal to or lower than the rank cutoff
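
The corrected wording can be illustrated with a short Python sketch of the "predictable" test. It is only a model of the sentence above, not taxonkit's code: `RANK_ORDER` is an assumed, shortened rank order and `is_predictable_lower` is a hypothetical helper name.

```python
# Toy model of the corrected help text; NOT taxonkit's actual code.
# Assumption: a shortened rank order, highest rank first.
RANK_ORDER = ["family", "genus", "species", "subspecies"]


def is_predictable_lower(closest_higher_rank, cutoff):
    """Return True when an unranked node can be saved under -L.

    closest_higher_rank is the rank of the nearest ranked ancestor.
    The node qualifies when that rank is equal to or lower than the cutoff.
    """
    # A larger index means a lower rank in this list.
    return RANK_ORDER.index(closest_higher_rank) >= RANK_ORDER.index(cutoff)


# 2605619: closest higher node is "species", cutoff is "species" (equal).
print(is_predictable_lower("species", "species"))
# Closest higher node "genus" is above the cutoff "species": not predictable.
print(is_predictable_lower("genus", "species"))
```

The `>=` comparison is the point of the documentation fix: with a strict `>`, the equal-rank case would be missed.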

@alvanuffelen
Author

Indeed, -N will discard all ranks without order.
But let's say I have the TaxIDs 93506 (positioned above genus) and 2605619 (positioned below genus), both with no rank.
There is no way to retain only the TaxID that sits above genus.
echo -e "93506\n2605619" | taxonkit filter -H genus -N will remove both.
echo -e "93506\n2605619" | taxonkit filter -H genus will retain both.

It would be useful to have something like:
echo -e "93506\n2605619" | taxonkit filter -H genus -n, which would remove 2605619 but keep 93506.

@shenwei356
Owner

Oh, I remember now. I've considered this before but did not implement it because they are different for -L and -H.

I understand what you mean. But I think we should add another flag, --discard-predictable-norank, which only discards those no-rank TaxIDs (such as 2605619) that cannot be higher than the threshold.

--discard-predictable-norank should be incompatible with -N and -n.

  -N, --discard-noranks           discard all ranks without order, type "taxonkit filter --help" for details
  -n, --save-predictable-norank   do not discard some special ranks without order when using -L, where
                                  rank of the closest higher node is still lower than rank cutoff

  -Z, --discard-predictable-norank
echo -e "93506\n2605619" | taxonkit filter -H genus -Z
93506
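
A rough Python sketch of how the proposed flag might behave with -H is given below. It models the proposal only and is not taxonkit's implementation. `RANK_ORDER`, `CLOSEST_HIGHER_RANK`, and `filter_higher_than` are hypothetical names. The closest higher rank of 2605619 ('species') comes from this thread; the value used for 93506 ('family') is a placeholder assumption, since the thread only says that it sits above genus.

```python
# Toy model of the proposed -Z/--discard-predictable-norank; NOT taxonkit's code.
# Assumption: a shortened rank order, highest rank first.
RANK_ORDER = ["family", "genus", "species"]

# Rank of the closest higher ranked node for each unranked TaxID.
# 2605619 -> "species" is stated in the thread.
# 93506 -> "family" is a PLACEHOLDER assumption (only "above genus" is known).
CLOSEST_HIGHER_RANK = {93506: "family", 2605619: "species"}


def filter_higher_than(taxids, cutoff, discard_predictable=False):
    """Filter unranked TaxIDs as -H would, optionally with the proposed -Z."""
    c = RANK_ORDER.index(cutoff)
    kept = []
    for taxid in taxids:
        # The node cannot be higher than the cutoff if its closest ranked
        # ancestor is already equal to or lower than the cutoff.
        cannot_be_higher = RANK_ORDER.index(CLOSEST_HIGHER_RANK[taxid]) >= c
        if discard_predictable and cannot_be_higher:
            continue
        kept.append(taxid)  # unranked nodes are kept by default
    return kept


print(filter_higher_than([93506, 2605619], "genus"))
print(filter_higher_than([93506, 2605619], "genus", discard_predictable=True))
```

In this model 93506 survives because a node below a family-level ancestor may still be above genus, so its position cannot be ruled out from the ancestor alone.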
