Skip to content

theislab/chemCPA

Repository files navigation

Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution

Code accompanying the NeurIPS 2022 paper (PDF).

architecture of CCPA

Our talk on chemCPA at the M2D2 reading club is available here. A previous version of this work was a spotlight paper at ICLR MLDD 2022. Code for this previous version can be found under the v1.0 git tag.

Codebase overview

For the final models, we provide weight checkpoints as well as the hyperparameter configuration. The raw datasets can be downloaded from a FAIR server. We also provide our processed datasets for reproducibility: sci-Plex shared gene set & extended gene set, LINCS. Embeddings can be downloaded here.

To setup the environment, install conda and run:

conda env create -f environment.yml
python setup.py install -e .
  • chemCPA/: contains the code for the model, the data, and the training loop.
  • embeddings: There is one folder for each molecular embedding model we benchmarked. Each contains an environment.yml with dependencies. We generated the embeddings using the provided notebooks and saved them to disk, to load them during the main training loop.
  • experiments: Each folder contains a README.md with the experiment description, a .yaml file with the seml configuration, and a notebook to analyze the results.
  • notebooks: Example analysis notebooks.
  • preprocessing: Notebooks for processing the data. For each dataset there is one notebook that loads the raw data.
  • tests: A few very basic tests.

All experiments where run through seml. The entry function is ExperimentWrapper.__init__ in chemCPA/seml_sweep_icb.py. For convenience, we provide a script to run experiments manually for debugging purposes at chemCPA/manual_seml_sweep.py. The script expects a manual_run.yaml file containing the experiment configuration.

All notebooks also exist as Python scripts (converted through jupytext) to make them easier to review.

Some of the notebooks use a drugbank_all.csv file, which can be downloaded from here (registration needed).

Citation

You can cite our work as:

@inproceedings{hetzel2022predicting,
  title={Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution},
  author={Hetzel, Leon and Böhm, Simon and Kilbertus, Niki and Günnemann, Stephan and Lotfollahi, Mohammad and Theis, Fabian J},
  booktitle={NeurIPS 2022},
  year={2022}
}