RDP training using custom trainset #20

Open
najoshi opened this issue Nov 1, 2017 · 0 comments


I am trying to use the "train" subcommand of classifier.jar to build a new database from custom data. My taxonomy data is in QIIME format, and I want to convert it to RDP format, but I can't find any good documentation of the RDP taxonomy format. I can certainly write the conversion code myself, but I need to know the details of the format. My QIIME-formatted file has lines that look like this:

AcrMAj74N231 k_Animalia; p_Nematoda; c_Chromadorea; o_Rhabditida; f_Cephalobidae; g_Acrobeles; s_maeneeneus
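
For reference, here is a minimal sketch of how a line like the one above could be split into a sequence ID and a list of (rank, name) pairs. The function name `parse_qiime_line` and the `RANK_NAMES` mapping are hypothetical, and the sketch assumes the ID is separated from the lineage by whitespace and that each rank prefix is a single letter followed by one or more underscores.

```python
import re

# Hypothetical mapping from the single-letter prefixes to rank names.
RANK_NAMES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}

def parse_qiime_line(line):
    """Split one taxonomy line into (seq_id, [(rank, name), ...])."""
    # Assumption: the sequence ID and the lineage are separated by whitespace.
    seq_id, lineage = line.strip().split(None, 1)
    ranks = []
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        # Accept one or more underscores after the prefix (k_ or k__).
        match = re.fullmatch(r"([a-z])_+(.*)", field)
        if match is None:
            raise ValueError(f"unrecognised lineage field: {field!r}")
        prefix, name = match.groups()
        if not name:
            continue  # skip empty (unassigned) ranks
        ranks.append((RANK_NAMES[prefix], name))
    return seq_id, ranks

example = ("AcrMAj74N231 k_Animalia; p_Nematoda; c_Chromadorea; "
           "o_Rhabditida; f_Cephalobidae; g_Acrobeles; s_maeneeneus")
seq_id, ranks = parse_qiime_line(example)
```

With the example line, `seq_id` is `"AcrMAj74N231"` and `ranks` holds seven pairs, from `("kingdom", "Animalia")` down to `("species", "maeneeneus")`.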

And RDP needs lines that look like this (I got this from https://sourceforge.net/projects/rdp-classifier/):

7*Acidimicrobiaceae*6*7*family

The only thing I've found is that the lines are in this format:

taxid*taxon name*parent taxid*depth*rank

However, the sample files I have raise some problems. For example, the taxids do not seem to be correct for the taxon: the taxid for Acidimicrobiaceae is 84994, not 7. I also have no idea what "depth" means. How do I calculate it?
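
To make the question concrete, here is a sketch of the conversion under one possible reading of the format. It assumes, without confirmation from the RDP documentation, that taxids are arbitrary identifiers local to the file (which would explain 7 rather than 84994), that "depth" is the number of levels below a root node, and that the file starts with a root line `0*Root*-1*0*rootrank`. The sample line `7*Acidimicrobiaceae*6*7*family` is at least consistent with this reading. The function name `build_rdp_taxonomy` is hypothetical.

```python
def build_rdp_taxonomy(lineages):
    """Return taxonomy lines in taxid*name*parent taxid*depth*rank form.

    lineages: iterable of lists of (rank, name) pairs, ordered from the
    highest rank to the lowest.
    """
    # Assumption: a root node with taxid 0, parent -1 and depth 0.
    lines = ["0*Root*-1*0*rootrank"]
    taxids = {}   # (parent taxid, name, rank) -> assigned taxid
    next_id = 1   # assumption: taxids are sequential and file-local
    for lineage in lineages:
        parent = 0
        # Assumption: depth counts levels below the root.
        for depth, (rank, name) in enumerate(lineage, start=1):
            key = (parent, name, rank)
            if key not in taxids:
                taxids[key] = next_id
                lines.append(f"{next_id}*{name}*{parent}*{depth}*{rank}")
                next_id += 1
            parent = taxids[key]
    return lines

demo = build_rdp_taxonomy([
    [("kingdom", "Animalia"), ("phylum", "Nematoda")],
    [("kingdom", "Animalia"), ("phylum", "Arthropoda")],
])
```

Keying each taxon on its parent as well as its name keeps identically named taxa under different parents apart. Whether RDP accepts rank names such as "kingdom" is also an open question here.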

Any help would be highly appreciated. Thanks!

  • Nik.