Interpretation of results #24

Closed
franciscozorrilla opened this issue Nov 2, 2021 · 1 comment
Labels
question Further information is requested

Comments

@franciscozorrilla
Contributor

Hi @oschwengers,

Thanks again for recommending the platon tool, this is exactly what I was looking for.
I just wanted to double-check the interpretation of the results with you. Below is the verbose output for one of my isolate genome assemblies:

[10:52:43 am GMT] START SAMPLE metagem_lane821s003044.fa
Platon v1.6
Options and arguments:
	input: /rds/project/rds-XUr6B1Jhndg/fz274/kost_soil/drep/metagem_lane821s003044.fa
	db: /rds/user/fz274/hpc-work/platon_db/db
	output: /rds/project/rds-XUr6B1Jhndg/fz274/kost_soil/test_platon
	prefix: metagem_lane821s003044
	mode: accuracy
	characterize: False
	tmp path: /tmp/tmpzj0_z4kw
	# threads: 32
parse draft genome...
	exclude contig 'NODE_137_length_892_cov_367.357055', too short (892)
	exclude contig 'NODE_138_length_853_cov_2176.626289', too short (853)
	exclude contig 'NODE_139_length_829_cov_371.747340', too short (829)
	exclude contig 'NODE_140_length_818_cov_1585.425101', too short (818)
	exclude contig 'NODE_141_length_811_cov_758.149864', too short (811)
	exclude contig 'NODE_142_length_789_cov_547.790730', too short (789)
	exclude contig 'NODE_143_length_704_cov_116.652313', too short (704)
	exclude contig 'NODE_144_length_629_cov_1271.644928', too short (629)
	exclude contig 'NODE_145_length_629_cov_1248.532609', too short (629)
	exclude contig 'NODE_146_length_612_cov_309.685981', too short (612)
	exclude contig 'NODE_147_length_578_cov_592.802395', too short (578)
	exclude contig 'NODE_148_length_552_cov_409.675789', too short (552)
	exclude contig 'NODE_149_length_545_cov_3045.194444', too short (545)
	exclude contig 'NODE_150_length_528_cov_10815.097561', too short (528)
	parsed 150 raw contigs
	excluded 14 contigs by size filter
	analyze 136 contigs
predict ORFs...
	found 6608 ORFs
search marker protein sequences (MPS)...
	found 661 MPS
compute replicon distribution scores (RDS)...
apply RDS sensitivity threshold (SNT=-7.9) filter...
	excluded 0 contigs by SNT filter
characterize contigs...
ID	Length	Coverage	# ORFs	RDS	Circular	Inc Type(s)	# Replication	# Mobilization	# OriT	# Conjugation	# AMRs	# rRNAs	# Plasmid Hits
NODE_62_length_22943_cov_24.692863	22943	24.7	23	0.0	no	0	2	0	0	0	0	0	0
NODE_128_length_1337_cov_418.150000	1337	418.1	1	0.1	no	0	0	0	0	0	0	0	1
[10:55:06 am GMT] DONE RUNNING SAMPLE metagem_lane821s003044.fa

The table printed at the end seems to suggest that the contig NODE_62_length_22943_cov_24.692863 did not have any plasmid hits. However, this contig is included in the *.plasmid.fasta file. Could you please clarify how I should interpret these results?

To give some background on my research question: I am interested in identifying plasmid-borne contigs and then searching for any metabolic genes present in those plasmids. Would you recommend I stick with the default accuracy mode for this?

Thank you and best wishes,
Francisco

@franciscozorrilla franciscozorrilla added the bug Something isn't working label Nov 2, 2021
@oschwengers oschwengers added question Further information is requested and removed bug Something isn't working labels Nov 2, 2021
@oschwengers
Owner

oschwengers commented Nov 2, 2021

Hi @franciscozorrilla,
there are many approaches to classifying the replicon origin of a contig. One is the RDS method, which is essentially what Platon is all about. However, to improve the classification of certain types of contigs that are hard to classify, we added several characterization steps and heuristic filters. One of these filters is a simple blastn search against RefSeq plasmid sequences. But just because a contig does not align to a known plasmid sequence does not mean that it is not a plasmid contig. For instance, it might be a contig of an unknown plasmid. In your case, this is supported by the detection of 2 replication proteins on NODE_62_length_22943_cov_24.692863.
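To make the "several lines of evidence" point concrete, here is a minimal sketch that reads the table from the verbose output above and lists, per contig, which evidence columns are non-zero. The two rows are copied from this issue. The choice of which columns to treat as plasmid evidence (`EVIDENCE_COLUMNS`) and the helper name `plasmid_evidence` are assumptions made for illustration, not part of Platon.

```python
import csv
import io

# Header and rows copied from the verbose output quoted in this issue.
HEADER = ["ID", "Length", "Coverage", "# ORFs", "RDS", "Circular", "Inc Type(s)",
          "# Replication", "# Mobilization", "# OriT", "# Conjugation",
          "# AMRs", "# rRNAs", "# Plasmid Hits"]
ROWS = [
    ["NODE_62_length_22943_cov_24.692863", "22943", "24.7", "23", "0.0", "no",
     "0", "2", "0", "0", "0", "0", "0", "0"],
    ["NODE_128_length_1337_cov_418.150000", "1337", "418.1", "1", "0.1", "no",
     "0", "0", "0", "0", "0", "0", "0", "1"],
]
# Rebuild the tab-separated text exactly as Platon printed it.
TABLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

# Assumption: these count columns are the ones read here as plasmid evidence.
EVIDENCE_COLUMNS = ["Inc Type(s)", "# Replication", "# Mobilization",
                    "# OriT", "# Conjugation", "# Plasmid Hits"]


def plasmid_evidence(table_text):
    """Map each contig ID to its non-zero evidence columns and their counts."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    evidence = {}
    for row in reader:
        evidence[row["ID"]] = {
            col: int(row[col]) for col in EVIDENCE_COLUMNS if int(row[col]) > 0
        }
    return evidence


evidence = plasmid_evidence(TABLE)
for contig, hits in evidence.items():
    print(contig, hits)
```

For the two contigs above, NODE_62 is carried by its replication genes and NODE_128 by its plasmid hit, so a zero in `# Plasmid Hits` alone does not rule a contig out.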

Regarding your second question: in general, I'd recommend the accuracy mode for any high-throughput analysis. If you can take a closer look at all results in person, you could also run Platon in sensitivity or even characterization mode. The former uses relaxed RDS thresholds; the latter skips all filters and fully characterizes all contigs (which might take a while).
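As a rough sketch of how the modes could be switched when scripting many runs, the helper below only builds the argument list and does not execute anything. The flag names (`--mode`, `--characterize`, `--db`, `--output`, `--threads`) are written from memory of Platon's command-line help and should be checked against `platon --help`. The function name `platon_command` and the example paths are hypothetical.

```python
# Modes discussed in this thread; Platon may offer others.
KNOWN_MODES = {"accuracy", "sensitivity"}


def platon_command(genome, db, output, mode="accuracy", characterize=False, threads=1):
    """Build (but do not run) a Platon command line as a list of arguments.

    Assumption: flag names follow Platon's --help output; verify before use.
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unexpected mode: {mode!r}")
    cmd = ["platon", "--db", db, "--output", output,
           "--mode", mode, "--threads", str(threads)]
    if characterize:
        # Characterization mode: skip the filters and characterize all contigs.
        cmd.append("--characterize")
    cmd.append(genome)  # the genome FASTA is the positional argument
    return cmd


# Hypothetical paths, for illustration only.
default_run = platon_command("genome.fasta", "platon_db/db", "platon_out")
sensitive_run = platon_command("genome.fasta", "platon_db/db", "platon_out",
                               mode="sensitivity")
full_run = platon_command("genome.fasta", "platon_db/db", "platon_out",
                          characterize=True)
print(" ".join(sensitive_run))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed for the installed Platon version.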

@franciscozorrilla franciscozorrilla changed the title Interpretation of results [not a bug] Interpretation of results Nov 2, 2021
@oschwengers oschwengers pinned this issue Nov 5, 2021