IBPA/OPEX: An optimal experimental design framework for accelerating knowledge discovery using gene expression data

What is OPEX?

OPEX is an optimal experimental design framework written in R that helps biologists select the most informative experiments to conduct next, given the experiments conducted so far. This repo demonstrates the application of OPEX to collecting gene expression data of E. coli under stress from various antibiotics and biocides.

Dependencies

mlegp

Code architecture

The structure of the code is shown below. The entry point of this project is run.sh, which runs main.R. The folder src stores the implementation of the functions and classes used in main.R and contains seven R scripts. generate_setting.R generates the settings for running a simulation. Simulator.R defines a class named Simulate, which is the workhorse of the simulation. The other scripts are helper modules of the Simulate class. For details of each script, see the documentation header of each file.

```
├── main.R
├── run_OPEX_on_your_dataset.R
├── run.sh
└── src
    ├── add_noise.R
    ├── generate_setting.R
    ├── max_dist.R
    ├── prepare_data.R
    ├── screen_index_helper.R
    ├── Simulator.R
    └── update_train_pool.R
```

Input data

The input data is a table in which the first 14 columns define the culture condition of each row and the remaining 1123 columns represent the gene expression profile for that condition. Genes without sufficient sequencing depth were excluded.
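As an illustration of this layout, the sketch below splits each row into its condition part and its expression part using Python's standard csv module. It assumes a comma-separated file with a single header row; the exact file name and layout under data/ are not specified here, and the function name split_input_table is hypothetical.

```python
import csv
import io

# Per the README: the first 14 columns encode the culture condition and the
# remaining columns (1123 genes in the E. coli data) hold expression values.
N_CONDITION_COLS = 14

def split_input_table(handle, has_header=True):
    """Split every row of a CSV table into (condition vector, expression profile)."""
    reader = csv.reader(handle)
    gene_names = next(reader)[N_CONDITION_COLS:] if has_header else None
    conditions, expression = [], []
    for row in reader:
        conditions.append([int(float(v)) for v in row[:N_CONDITION_COLS]])
        expression.append([float(v) for v in row[N_CONDITION_COLS:]])
    return gene_names, conditions, expression

# Toy table with 14 condition columns and only 2 genes (hypothetical data).
toy = io.StringIO(
    ",".join([f"c{i}" for i in range(14)] + ["geneA", "geneB"]) + "\n"
    + ",".join(["0"] * 14 + ["1.5", "-0.2"]) + "\n"
)
genes, conds, expr = split_input_table(toy)
print(genes, len(conds[0]), expr[0])  # ['geneA', 'geneB'] 14 [1.5, -0.2]
```

For the real data, pass an open file handle, e.g. `open(path, newline="")`.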

A culture condition is defined by a 14-element binary vector that encodes the presence (0) or absence (1) of 10 biocides and 4 antibiotics: Chlorexidine, Phenol, H2O2, Isopropanol, Bezalkonium_chloride, Ethanol, Glutaraldehyde, Percetic_acid, Sodium_hypochlorite, Povidone_iodine, Kanamycin, Rifampicin, Norfloxacin, Ampicillin.
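A minimal sketch of converting between a set of compound names and this binary vector. It assumes the compounds appear in the column order listed above; the identifiers are copied verbatim because they appear to be column names. PRESENT and ABSENT follow the convention stated in this section; swap the two constants if your table uses the opposite one. The function names are hypothetical.

```python
# Compound order as listed in the README (assumed to match the column order).
COMPOUNDS = [
    "Chlorexidine", "Phenol", "H2O2", "Isopropanol", "Bezalkonium_chloride",
    "Ethanol", "Glutaraldehyde", "Percetic_acid", "Sodium_hypochlorite",
    "Povidone_iodine", "Kanamycin", "Rifampicin", "Norfloxacin", "Ampicillin",
]

# Encoding convention as stated in the README text; swap if yours differs.
PRESENT, ABSENT = 0, 1

def encode_condition(present):
    """Map a set of compound names to a 14-element binary vector."""
    unknown = set(present) - set(COMPOUNDS)
    if unknown:
        raise ValueError(f"unknown compounds: {sorted(unknown)}")
    return [PRESENT if name in present else ABSENT for name in COMPOUNDS]

def decode_condition(vector):
    """Map a 14-element binary vector back to the list of compounds present."""
    if len(vector) != len(COMPOUNDS):
        raise ValueError(f"expected {len(COMPOUNDS)} values, got {len(vector)}")
    return [name for name, v in zip(COMPOUNDS, vector) if v == PRESENT]

vec = encode_condition({"Phenol", "Ampicillin"})
print(vec)                    # [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(decode_condition(vec))  # ['Phenol', 'Ampicillin']
```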

How to reproduce

Step 1: Generate a file that contains the settings for running OPEX. The settings file is named after the sampling method; the following example uses expert sampling.
```
# from the repository root
cd ./R/src
Rscript generate_setting.R setting
```
After running the above commands, a file named setting.csv is generated in ./output. setting.csv specifies values for the hyper-parameters random_seed, exploration frequency, adaptive, start size, add, dataset id, noise, iter_num, and sampling method. For the meaning of these hyper-parameters, see the comments in generate_setting.R.
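Each row of the settings file is one setting, and the integer passed after the file name in Step 2 picks one of them. The sketch below shows that 1-based lookup, plus counting the rows to size a job array. It assumes setting.csv has a header row with one setting per row; the column names used in the toy file and the helper names are hypothetical (the real columns come from generate_setting.R).

```python
import csv
import io

def read_settings(handle):
    """Parse a settings CSV (header row, then one setting per row) into dicts."""
    return list(csv.DictReader(handle))

def get_setting(settings, index):
    """Return the index-th setting using a 1-based index."""
    if not 1 <= index <= len(settings):
        raise IndexError(f"index must be in 1..{len(settings)}")
    return settings[index - 1]

# Hypothetical two-row settings file.
example = io.StringIO("random_seed,sampling_method\n1,expert\n2,expert\n")
settings = read_settings(example)
print(len(settings))                            # 2, i.e. task IDs 1..2
print(get_setting(settings, 1)["random_seed"])  # 1
```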
Step 2: Run the simulation using one of the settings in the file generated in Step 1. In the following example, the first setting is used.
```
# from the repository root
cd ./R
bash run.sh setting.csv 1
```
To run all settings, we used a high-performance computing cluster managed by SLURM. The batch script that submits all simulations as a job array is as follows:

```
#!/bin/bash
#SBATCH -p low
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem-per-cpu 1000
#SBATCH -t 1:00:00
#SBATCH -o output/slurm.%N.%j.out
#SBATCH -e output/slurm.%N.%j.err
# One array task per setting (row) in setting.csv
#SBATCH --array=1-1800
# SLURM_ARRAY_TASK_ID selects which setting this task runs
Rscript main.R setting.csv $SLURM_ARRAY_TASK_ID
```
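Without a cluster, the job array can be approximated by running the same commands one after another. This is a sketch, assuming Rscript is on PATH and that, as in the batch script, main.R is invoked from the directory that contains it (R/ in this repository); the helper names are hypothetical. Running all 1800 settings serially will take much longer than the array job.

```python
import shutil
import subprocess

def array_commands(setting_file, n_settings):
    """Build one command per array task ID, mirroring the SLURM script."""
    return [["Rscript", "main.R", setting_file, str(task_id)]
            for task_id in range(1, n_settings + 1)]

def run_locally(setting_file, n_settings, workdir=".", dry_run=True):
    """Run every task serially; with dry_run=True only return the commands."""
    cmds = array_commands(setting_file, n_settings)
    if dry_run:
        return cmds
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    for cmd in cmds:
        subprocess.run(cmd, cwd=workdir, check=True)  # stop at the first failure
    return cmds

for cmd in run_locally("setting.csv", 3):
    print(" ".join(cmd))  # Rscript main.R setting.csv 1, then 2, then 3
```

Pass `workdir="R"` and `dry_run=False` from the repository root to actually execute the tasks.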

Upon completion, a folder named setting is created in ./output. The results of this simulation run are stored in the folder expert_sample.

The result is a CSV file, named after the hyper-parameter values of the setting, that records the order in which expert sampling selected each culture condition.

How to run OPEX on your own tabular dataset

To run OPEX on your own biological problem, you need two tabular datasets: a training dataset for fitting a model, and a pool of candidate experiments to run. Both datasets are matrices. In the training dataset, each row is one data point, the last column is the output, and the other columns are inputs. The pool dataset has one column fewer than the training dataset because the output column is missing.
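A sketch that writes both datasets from in-memory rows and enforces the shape rule above (the pool has exactly one column fewer than the training set). Whether run_OPEX_on_your_dataset.R expects a header row is not stated here; this sketch writes one as an assumption, and the column names x1, x2, ..., y and the function name are hypothetical.

```python
import csv
import io

def write_training_and_pool(train_x, train_y, pool_x, train_handle, pool_handle):
    """Write training rows (inputs, then output as the last column) and pool rows (inputs only)."""
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("train_x must be non-empty and match train_y in length")
    n_inputs = len(train_x[0])
    for row in list(train_x) + list(pool_x):
        if len(row) != n_inputs:
            raise ValueError("all input rows must have the same number of columns")
    header = [f"x{i + 1}" for i in range(n_inputs)]  # hypothetical column names
    train_writer = csv.writer(train_handle)
    train_writer.writerow(header + ["y"])  # output is the last column
    for x, y in zip(train_x, train_y):
        train_writer.writerow(list(x) + [y])
    pool_writer = csv.writer(pool_handle)
    pool_writer.writerow(header)           # one column fewer: no output
    pool_writer.writerows(pool_x)

train_buf, pool_buf = io.StringIO(), io.StringIO()
write_training_and_pool([[0, 1], [1, 0]], [2.5, 3.0], [[1, 1]], train_buf, pool_buf)
print(train_buf.getvalue().splitlines())  # ['x1,x2,y', '0,1,2.5', '1,0,3.0']
print(pool_buf.getvalue().splitlines())   # ['x1,x2', '1,1']
```

To write real files, pass handles from `open("train.csv", "w", newline="")` and `open("pool.csv", "w", newline="")`.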

The command to run OPEX is as follows:

```
Rscript run_OPEX_on_your_dataset.R <training_path> <pool_path> <batch_size>
```

training_path and pool_path are the paths to the training and pool CSV files, respectively.

batch_size is an integer.
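OPEX's own selection logic lives in the R scripts (for example max_dist.R and Simulator.R) and is not described in this README. Purely as an illustration of what picking batch_size experiments from a pool can look like, here is a sketch of one generic exploration heuristic, greedy max-min (farthest-point) distance selection. It is not claimed to be OPEX's method, and the function name is hypothetical. It returns 0-based row indices into the pool.

```python
import math

def _nearest_distance(point, others):
    """Euclidean distance from point to its closest row in others (inf if none)."""
    return min((math.dist(point, o) for o in others), default=math.inf)

def select_batch_maxmin(train_x, pool_x, batch_size):
    """Greedily pick pool rows that are farthest from the training set and earlier picks."""
    if not 0 <= batch_size <= len(pool_x):
        raise ValueError("batch_size must be between 0 and the pool size")
    # Distance from each candidate to the nearest point already "covered".
    nearest = [_nearest_distance(p, train_x) for p in pool_x]
    chosen = []
    for _ in range(batch_size):
        remaining = [i for i in range(len(pool_x)) if i not in chosen]
        best = max(remaining, key=lambda i: nearest[i])  # ties: first index wins
        chosen.append(best)
        # The new pick now covers its neighbourhood; update distances.
        for i, p in enumerate(pool_x):
            nearest[i] = min(nearest[i], math.dist(p, pool_x[best]))
    return chosen

train = [[0, 0]]
pool = [[0, 1], [5, 5], [4, 5], [0, 2]]
print(select_batch_maxmin(train, pool, 2))  # [1, 3]
```

The second pick is [0, 2] rather than [4, 5] because, once [5, 5] is chosen, [4, 5] is close to an existing pick; this spreading-out effect is the point of max-min selection.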

Support

If you have any questions about this project, please contact us at tagkopouloslab@ucdavis.edu

Licence

See the LICENSE file for license rights and limitations (Apache2.0).

Acknowledgement

This work was supported by an NSF award (#1743101).
