RDS is always 0.0 #31

Open
barbaracania opened this issue Mar 25, 2022 · 7 comments
Labels
help wanted Extra attention is needed

Comments

@barbaracania

Hi!
I am trying to use Platon 1.6 installed with BioConda to identify plasmid contigs. By running the following command:

platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8 --characterize

I got the following result (I am showing the first few lines):

| ID | Length | Coverage | # ORFs | RDS | Circular | Inc Type(s) | # Replication | # Mobilization | # OriT | # Conjugation | # AMRs | # rRNAs | # Plasmid Hits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NODE_1_length_66028_cov_26.537579 | 66028 | 26.5 | 50 | 0.0 | no | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_63294_cov_26.832935 | 63294 | 26.8 | 48 | 0.0 | no | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_63165_cov_26.834275 | 63165 | 26.8 | 48 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_51546_cov_2.360878 | 51546 | 2.4 | 74 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_2_length_32011_cov_1.484036 | 32011 | 1.5 | 39 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_3_length_19747_cov_141.934964 | 19747 | 141.9 | 3 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |

After running the same command without --characterize, the first two contigs are classified as chromosomal and the rest as plasmids. I am not sure whether this is a bug or whether I am misunderstanding how the RDS calculation or the classification criteria work, but the RDS value for all my contigs (over a thousand of them) is always 0.0. Moreover, rRNA genes were detected on the last contig shown and its number of ORFs is very low, yet it was still classified as a plasmid. Lastly, when I used the sensitivity mode, I got the same results as with the accuracy mode, but with the specificity mode, all my contigs were classified as chromosomal. Is this expected behavior?

barbaracania added the "bug" label on Mar 25, 2022
@oschwengers
Owner

Hi @barbaracania,
thanks for reaching out. There are a couple of things going on here, so I'll try to address them in order:

  1. For some reason, the first 4 contigs are denoted as NODE_1. Have you merged contigs from different assemblies, potentially from different strains/species? If so, this could severely impair Prodigal's gene prediction, which in turn would hamper the detection of Platon's marker protein sequences (MPS).
  2. --characterize leads to a full characterization of all contigs and therefore deactivates any filtering. Hence, this option can be used to gain information on any contig, no matter whether it's chromosome or plasmid borne.
  3. The last contig indeed has 2 rRNAs detected. However, in --characterize mode, Platon doesn't classify contigs but characterizes all of them.
  4. It depends on the data; sometimes sensitivity and accuracy mode provide the same results. Also, in specificity mode Platon applies very strict classification rules to the RDS, and since your RDS values are below the specificity threshold, it refuses to classify any of your contigs as plasmid. So yes, this is expected.

Could you provide some information on your data: Metagenome or isolate? Merged assemblies?
Best regards!

oschwengers added the "help wanted" label and removed the "bug" label on Mar 26, 2022
@barbaracania
Author

Thank you for your answer. My data is metagenomic, but the samples were treated with a plasmid-safe DNase, so it should contain mostly plasmid reads. I ran SPAdes on it with the --metaplasmid option, and afterwards I only modified the contig names by removing all the information after the coverage, as otherwise Platon was not able to read the coverage correctly from them. Without the modification, the names look like this: >NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular. The data was not modified in any other way. Since it is suggested that contigs produced by metaplasmidSPAdes should still be confirmed as plasmids by additional means, I thought of including Platon in my pipeline for this purpose.
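For readers who want to reproduce the renaming step described above, here is a minimal Python sketch. It assumes the header layout quoted in this comment (NODE_<n>_length_<len>_cov_<cov> followed by extra fields); the function names `trim_header` and `trim_fasta_headers` are hypothetical and not part of Platon or SPAdes.

```python
import re

# Assumed metaplasmidSPAdes header layout, as quoted in the comment above:
#   >NODE_<n>_length_<len>_cov_<cov>_cutoff_<c>_type_<t>
# Everything after the coverage value is dropped.
HEADER_RE = re.compile(r"^(NODE_\d+_length_\d+_cov_\d+(?:\.\d+)?)")


def trim_header(header: str) -> str:
    """Truncate a FASTA header after the coverage value.

    Headers that do not match the assumed layout are returned unchanged.
    """
    has_marker = header.startswith(">")
    name = header[1:] if has_marker else header
    match = HEADER_RE.match(name)
    trimmed = match.group(1) if match else name
    return ">" + trimmed if has_marker else trimmed


def trim_fasta_headers(lines):
    """Yield FASTA lines with header lines trimmed and sequence lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield trim_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    example = [
        ">NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular",
        "ACGTACGT",
    ]
    for out in trim_fasta_headers(example):
        print(out)
```

After trimming, it is worth checking that the contig names are still unique, since several contigs in this dataset share the NODE_1 prefix and differ only in length and coverage.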

Just to make this clear, I understand that using the --characterize option for Platon gives only info about contigs. I used it only to get an idea about my data and also to show it to you. When I was testing the three different modes, I was not using this option. For example, when I used

platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8

my contigs.tsv file starts like this:

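| ID | Length | Coverage | # ORFs | RDS | Circular | Inc Type(s) | # Replication | # Mobilization | # OriT | # Conjugation | # AMRs | # rRNAs | # Plasmid Hits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NODE_1_length_63165_cov_26.834275 | 63165 | 26.8 | 48 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_51546_cov_2.360878 | 51546 | 2.4 | 74 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_2_length_32011_cov_1.484036 | 32011 | 1.5 | 39 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_3_length_19747_cov_141.934964 | 19747 | 141.9 | 3 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |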

My contigs.chromosome.fasta contains only the first two contigs from my previous post, which Platon did not identify as circular, and contigs.plasmid.fasta has everything else, including the contig on which the rRNA genes were found. When I try the sensitivity mode, I get the same results as with the accuracy mode, but the specificity mode gives me empty contigs.tsv and contigs.plasmid.fasta files, while all the contigs end up in contigs.chromosome.fasta. From what I understood, the accuracy mode should take all the contig characteristics into consideration when deciding whether a contig comes from a plasmid or a chromosome, while the other two modes rely only on the RDS values. Since all my RDS values are 0.0, I am confused as to why I am getting the results described above...
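With over a thousand contigs, checking the RDS column by eye is impractical. The sketch below summarizes a contigs.tsv in Python, assuming the file is tab-separated and uses exactly the column names shown in the header above; `summarize_rds` is a hypothetical helper, not part of Platon, and the sample rows are copied from this comment.

```python
import csv
import io


def summarize_rds(tsv_text: str) -> dict:
    """Summarize RDS, circularity and rRNA counts from a contigs.tsv-style table.

    Assumes tab-separated text with the columns "ID", "RDS", "Circular"
    and "# rRNAs", as in the header shown in this thread.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    rds_values = [float(row["RDS"]) for row in rows]
    return {
        "contigs": len(rows),
        "nonzero_rds": sum(1 for value in rds_values if value != 0.0),
        "circular": sum(1 for row in rows if row["Circular"] == "yes"),
        "with_rrnas": [row["ID"] for row in rows if int(row["# rRNAs"]) > 0],
    }


# Two rows taken from the table above, rebuilt with tab separators.
HEADER = ["ID", "Length", "Coverage", "# ORFs", "RDS", "Circular", "Inc Type(s)",
          "# Replication", "# Mobilization", "# OriT", "# Conjugation",
          "# AMRs", "# rRNAs", "# Plasmid Hits"]
ROWS = [
    ["NODE_1_length_63165_cov_26.834275", "63165", "26.8", "48", "0.0", "yes",
     "0", "0", "0", "0", "0", "0", "0", "0"],
    ["NODE_3_length_19747_cov_141.934964", "19747", "141.9", "3", "0.0", "yes",
     "0", "0", "0", "0", "0", "0", "2", "0"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

if __name__ == "__main__":
    print(summarize_rds(SAMPLE))
```

For a real run, the file contents can be passed in with `summarize_rds(open("contigs.tsv").read())`, where the path is whatever output prefix was used.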

@oschwengers
Owner

Hi,
could you repeat your analysis by using the --meta option? This is currently not yet available in the latest official release v1.6 but available in the main branch. You can install it into your environment via:
git clone https://github.com/oschwengers/platon.git
python -m pip install --no-deps --ignore-installed platon/
Without further information I cannot figure out what is causing this behaviour, but Prodigal will certainly not work perfectly without the meta option set as it thinks it's a single genome.
Another reason could be that Platon simply cannot detect any marker proteins within your metagenome contigs. To check this, I'd need the <prefix>.json file.

@barbaracania
Author

Hi,
Thank you very much for trying to help me with my issue! I tried the --meta option, but the results seem to be all the same. Here is the .json file produced with the command platon contigs.fasta --db ~/Databases/db --output platon_accu_meta --meta --mode accuracy --threads 8
contigs.json.zip

@oschwengers
Owner

Hi,
indeed, not a single marker protein could be detected on your contigs, which is odd/interesting and hasn't occurred so far, at least not for an entire dataset. However, we do not have much experience with metagenome data yet.

So in principle, there are 2 different reasons that I can think of:

  1. Platon's marker protein sequences are actually not encoded on these contigs. In this case, Platon's database wouldn't cover the protein space encoded in your data. We're currently compiling an updated DB which could help here.
  2. There could be an error occurring. To check that, may I ask you to also provide the contigs.log file?

@barbaracania
Author

Good morning,
Sure! Here is the .log file from the same run:
contigs.log

@oschwengers
Owner

I took a look at the logs and from a technical perspective, everything is just fine.
However, there is indeed not a single blast (diamond) hit against the marker protein database, which so far has not occurred (at least not that I know of). This is very interesting and helpful to know in terms of metagenome analysis with Platon!

As mentioned above, I'm currently computing and compiling a database update which could help here, although this would require further investigation. As of today, it seems that Platon is not the right tool for your dataset. May I refer you to PlasFlow? Since Platon was initially developed with single isolates in mind, PlasFlow might provide better results, as it solely addresses metagenome data.

Just to let you know, I'll leave this issue open until we've released the new database version and Platon [v1.7].
Again, thanks for trying Platon and reporting this!
Best regards!
