Environmental conditions drive self-organization of reaction pathways in a prebiotic reaction network
Data and analysis for the "Environmental conditions drive self-organization of reaction pathways in a prebiotic reaction network" (Robinson, Daines, van Duppen, de Jong, Huck, Nature Chemistry, 2022, 14, 623–631).
The evolution of life from the prebiotic environment required a gradual process of chemical evolution towards greater molecular complexity. Elaborate prebiotically-relevant synthetic routes to the building blocks of life have been established. However, it is still unclear how functional chemical systems evolved with direction using only the interaction between inherent molecular chemical reactivity and the abiotic environment. Here, we demonstrate how complex systems of chemical reactions exhibit well-defined self-organisation in response to varying environmental conditions. This self-organisation allows the compositional complexity of the reaction products to be controlled as a function of factors such as feedstock and catalyst availability. We observe how Breslow's cycle contributes to the reaction composition by feeding C2 building blocks into the network, alongside reaction pathways dominated by formaldehyde-driven chain growth. The emergence of organised systems of chemical reactions in response to changes in the environment offers a potential mechanism for a chemical evolution process that bridges the gap between prebiotic chemical building blocks and the origin of life.
Contains a compound numbering scheme for the manuscript (compound_numbering.txt) and general compound information (compound_properties.csv).
Contains folders named according to experiment codes containing data reports of compound concentrations determined from GC-MS and HPLC data.
Contains the average concentrations (AverageData.csv) and concentration amplitudes (AmplitudeData.csv) determined for each experiment.
Contains experimental conditions (Experiment_parameters.csv) for each experiment, and selected experiment code series corresponding to systematic variations in conditions (Series_info.csv).
Contains a full list of rule-generated formose reactions (FullFormoseReaction.txt) and the reaction classes assigned to each reaction (reaction_class_assignments.json).
Contains notes on the data analysis.
A container for plotting data (note: .png files are ignored by the git repository). This should be empty.
Contains a set of reaction SMARTS strings used to generate the formose reaction (reaction_SMARTS_templates.tsv). These reaction SMARTS strings define reaction classes.
Lists of reactions for each modulated experiment in as reaction SMILES strings. Files are named according to experiment code.
Python scripts used to analyse and plot the data. It is possible to run each script individually, provided that the correct data files are present in the repository. Reading and writing of files is designed to work within the directory structure of this repository. The primary source of data with respect to the analysis in this repository are the data files in DATA/DATA_REPORTS
and the reaction rules outlined in REACTION_INFO/reaction_smarts_templates.csv
. All other data files are derived from these sources using the scripts in SCRIPTS/DATA_PREP
.
The scripts in SCRIPTS/DATA_PRESENT
are for data visualisation purposes (for example, for inclusion as figures in the main text and supplementary information). They convert the various data files into graphical formats.
The programs contained in this repository require Python 3.9.2 and the rdkit, numpy, scipy, networkx, pandas matplotlib and sklearn libraries (see environment.yaml
).
This software should work on all systems capable of installing the dependencies described above. It has been run successfully on MacOS (10.15) and Windows (Windows 10) machines.
A typical installation time should take less than 1 hour. The software can be installed as follows:
Clone the repository from GitHub. If you're having trouble with cloning, just download the zip file and store it on your computer wherever you would like.
Anaconda/Miniconda (conda
) or pip
can be used to install the software dependencies. We encourage the use of conda, as it is much easier to install The RDKit through this package manager. A list of all of the packages installed in the environment used to run these scripts is given in environment.yaml
. Manual installation instructions are given below (recommended).
First, create a virtual environment:
Create a virtual environment with conda:
conda create --name name-env python=3.9.2
Activate the virtual environment:
Mac:
conda activate name-env
Windows:
activate name_of_my_env
Go to Install dependencies.
Create a virtual environment:
pip install virtualenv
virtualenv name-env
Activate the virtual environment:
Mac:
source name-env/bin/activate
Windows:
name-env\Scripts\activate
Go to Install dependencies.
Use pip
or conda
to install the following dependencies.
e.g. using conda:
conda install -c anaconda scipy
pip install networkx
conda install matplotlib
conda install pandas
conda install -c conda-forge rdkit rdkit
conda install -c anaconda scikit-learn
pip install graphviz
conda install --channel conda-forge pygraphviz
First, download or clone a copy of the NorthNet code (please use the v0.1 release). In the command line/terminal, navigate to the repository directory and run:
conda:
conda develop NorthNet
(you may have to run conda install conda-build
first)
pip:
- Install from the repository root using pip:
pip install .
, - Or in editable mode (so edits are immediately reflected):
pip install -e .
Each of the programs should not take more than a few minutes to run on a modern laptop.