Sergey Venev edited this page Nov 11, 2015 · 17 revisions

Welcome to the Thermal_adapt_scripts wiki!

This is a collection of Python scripts that we used to perform the bioinformatics analysis for the "Thermal adaptation in prokaryotes" paper. The scripts differ slightly between the Bacterial and Archaeal domains, because we collected data differently for these groups. There are many fully sequenced and annotated bacterial species, which allows fully automated processing of the NCBI databases. Archaea, however, are significantly less represented in NCBI, which forced us to search for data more manually, particularly when retrieving environmental temperatures.

Bacterial data retrieval & processing is described in detail here

Archaeal data retrieval & processing is described here

Codon shuffling bootstrap is described here

Plotting for the publication is described [here](https://github.com/sergpolly/Thermal_adapt_scripts/wiki/Publication)

TODO: maybe create separate archaeal/bacterial data retrieval procedures and merge the data processing together.

Use pandas.set_option('display.max_columns', 7) when playing with the data interactively in IPython: it sets the maximum number of columns pandas shows when printing a DataFrame, which keeps wide tables readable.
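A minimal interactive example, assuming a reasonably recent pandas. The wide DataFrame here is a made-up placeholder, not project data; `pd.option_context` is shown as one way to override the setting only temporarily.

```python
import pandas as pd

# Limit how many columns pandas shows when printing a DataFrame.
pd.set_option('display.max_columns', 7)

# Hypothetical wide table, for illustration only: 3 rows x 12 columns.
df = pd.DataFrame({f"col{i}": range(3) for i in range(12)})
print(df)  # columns beyond the limit are elided with "..."

# Temporarily lift the limit inside this block only;
# the previous value (7) is restored on exit.
with pd.option_context('display.max_columns', None):
    print(df)
```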

Cherry investigation for Archaea, summary. Beware: there are different types of data, TrOp, noTrOp and ALL (translationally optimized organisms, non-translationally optimized ones, and their combination). Unless stated otherwise, the results below refer to ALL.

The Akashi hypothesis holds for all test cases and is statistically significant: protein "price" declines with CAI, while the bootstrapped trend is flattened.

IVYWREL is not statistically significant for either archaea or bacteria: it declines both for the real data and for the bootstraps. It is worth noting, though, that the IVYWREL decline for the real data is clearly stronger than for the bootstrapped (codon-reshuffled) data.

R20 turned out to be a more controversial feature for archaea. R20 declines for noTrOp, although not significantly, which is acceptable because CAI does not mean much for noTrOp organisms. For ALL and TrOp, R20 is almost flat or slightly increasing, while it still declines for the reshuffled codons; thus the lack of R20 dependence on CAI is itself statistically significant. This probably requires deeper interpretation.

At the current state, we can still conclude that Cherry's results are somewhat correct, although the effect is extremely subtle; however, the archaeal case does not conform to Cherry's conclusions (TEST arch_term.dat for hyperthermophiles instead).

The second conclusion states that the sequence space has enough dimensions to make the Akashi trend compatible with the R20 increase. Highly expressed proteins must be cheap (Akashi), yet thermophilic organisms are made of more expensive proteins than their mesophilic counterparts, and, according to Cherry, highly expressed proteins resemble thermophilic ones in amino acid usage. How is that possible? Answer: there are enough dimensions in the sequence space to allow for this.
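The comparisons above share one pattern: fit the trend of some per-gene quantity (protein price, IVYWREL, R20) against CAI on the real data, then check it against the same trend computed on codon-reshuffled replicates. The sketch below illustrates that comparison under stated assumptions: ordinary least-squares slopes and a two-sided empirical p-value. The function names and the toy data are hypothetical and are not taken from the repository's scripts.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_vs_shuffled(cai, real_trait, shuffled_traits):
    """Compare the real trait-vs-CAI slope with slopes of shuffled replicates.

    Hypothetical helper. Returns (real_slope, mean_shuffled_slope, p_value),
    where p_value is the two-sided empirical probability that a shuffled
    replicate's slope lies at least as far from the shuffled mean as the
    real slope does (with the usual +1 correction).
    """
    real = ols_slope(cai, real_trait)
    boot = [ols_slope(cai, trait) for trait in shuffled_traits]
    mean_boot = sum(boot) / len(boot)
    extreme = sum(abs(b - mean_boot) >= abs(real - mean_boot) for b in boot)
    p_value = (extreme + 1) / (len(boot) + 1)
    return real, mean_boot, p_value


# Toy data (made up): a steep real decline versus nearly flat shuffled trends.
rng = random.Random(0)
cai = [i / 50 for i in range(50)]
real_trait = [10 - 5 * x + rng.gauss(0, 0.2) for x in cai]
shuffled = [[10 - 0.5 * x + rng.gauss(0, 0.2) for x in cai] for _ in range(200)]

real_slope, boot_slope, p = trend_vs_shuffled(cai, real_trait, shuffled)
print(f"real slope {real_slope:.2f}, shuffled mean {boot_slope:.2f}, p = {p:.3f}")
```

Comparing against the shuffled distribution, rather than against zero, is what lets a trend that declines in both cases (as with IVYWREL) or stays flat while the control declines (as with R20) be judged on its difference from the reshuffled baseline.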
