In this repository, we apply the phenotypic profiling model, which predicts the phenotypic class of single cells from nuclei features, to the JUMP-Target pilot data from the JUMP consortium.
In this dataset, there are 51 plates with one of three perturbation types (Clustered Regularly Interspaced Short Palindromic Repeats [CRISPR], Open Reading Frame [ORF], and Compound) for two cell lines (A549 and U2OS).
Each perturbation type has its own platemap and metadata file, both of which can be found in the reference_plate_data folder. A barcode platemap is included, which associates each plate with the correct platemap file.
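As a minimal sketch of how a barcode platemap can be used, the example below builds a plate-to-platemap lookup with pandas. The column names (`Assay_Plate_Barcode`, `Plate_Map_Name`), the plate barcodes, and the platemap names are all assumptions made for illustration; check the files in reference_plate_data for the actual values.

```python
import io

import pandas as pd

# Hypothetical barcode platemap contents. Column names and values are
# placeholders, not copied from the real file in reference_plate_data.
barcode_csv = io.StringIO(
    "Assay_Plate_Barcode,Plate_Map_Name\n"
    "PLATE_001,crispr_platemap\n"
    "PLATE_002,orf_platemap\n"
    "PLATE_003,compound_platemap\n"
)
barcode_platemap = pd.read_csv(barcode_csv)

# Build a plate barcode -> platemap name lookup table.
plate_to_platemap = dict(
    zip(barcode_platemap["Assay_Plate_Barcode"], barcode_platemap["Plate_Map_Name"])
)


def platemap_for(plate: str) -> str:
    """Return the platemap name for a plate, with a clear error if it is unknown."""
    if plate not in plate_to_platemap:
        raise KeyError(f"Plate {plate!r} is not listed in the barcode platemap")
    return plate_to_platemap[plate]


print(platemap_for("PLATE_002"))
```

Raising an explicit error for unknown plates helps catch a plate that was downloaded but never annotated.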
We segment a total of 20,959,860 single cells across all plates.
To reproduce this project, please ensure adequate storage as the CellProfiler SQLite database files are approximately 1.1 TB.
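One way to check available storage before downloading is sketched below using only the Python standard library. The 1.1 TB figure comes from the estimate above; interpreting it as decimal terabytes and checking the current directory are assumptions, and the function name is hypothetical.

```python
import shutil

# Approximate size of the SQLite files, treated here as decimal terabytes
# (an assumption; adjust if your filesystem reports binary units).
REQUIRED_BYTES = int(1.1 * 1000**4)


def has_enough_space(path: str = ".", required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if has_enough_space("."):
    print("Enough free space for the SQLite files.")
else:
    print("Not enough free space; consider pointing the download at a larger volume.")
```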
Traditional image-based profiling pipelines aggregate single cells into well-level profiles. While this process removes outliers that might dampen signal, it also removes potentially interesting, biologically meaningful heterogeneity.
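The toy example below illustrates this trade-off. It uses median aggregation, a common choice for well-level profiles, on made-up values; the well names and the `nuclei_area` feature are placeholders, not taken from the dataset.

```python
import pandas as pd

# Made-up single-cell measurements for two wells. Well A01 is homogeneous,
# while well A02 contains two distinct subpopulations.
cells = pd.DataFrame(
    {
        "well": ["A01"] * 5 + ["A02"] * 5,
        "nuclei_area": [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 2.0],
    }
)

# Well-level profile: one median value per well.
well_profiles = cells.groupby("well")["nuclei_area"].median()

# Spread within each well, which the aggregated profile discards.
well_spread = cells.groupby("well")["nuclei_area"].std()

print(well_profiles)
print(well_spread)
```

Both wells end up with the same median, so the aggregated profiles are indistinguishable even though the single-cell distributions differ.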
By predicting single-cell phenotypes with our phenotypic profiling model, we hope to uncover important patterns of biology that would be missed with the traditional methodology. Specifically, the benefits of single-cell phenotyping include:
- Granular phenotypic mechanisms of perturbations regarding (A) the impact perturbations have on a specific phenotype (e.g., disrupting mitosis) and (B) impact on phenotype prevalence (e.g., a gene knockout that causes apoptosis or stalls cells in a specific cell cycle phase).
- Filtering and/or combining cells of the same phenotypic class to purify and/or improve the traditional image-based profiling pipeline.
- Adding knowledge to specific combinations of morphology features allows for self-referential interpretation, without the need for database signature lookup or other guilt-by-association methods.
- When combined with different experimental designs (e.g., targeted fluorescence marker), we can test specific hypotheses regarding single-cell phenotype distributions (and other important hypotheses that would otherwise be impossible without single-cell phenotypes).
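As a rough illustration of the phenotype prevalence idea in the list above, the sketch below compares the fraction of cells in each predicted class between a perturbation and a control. The data, the column names, and the class and perturbation labels are invented for this example and do not reflect the repository's actual outputs or analyses.

```python
import pandas as pd

# Invented per-cell predictions: one row per cell.
cells = pd.DataFrame(
    {
        "perturbation": ["control"] * 4 + ["knockout_X"] * 4,
        "predicted_phenotype": [
            "interphase", "interphase", "interphase", "apoptosis",
            "interphase", "apoptosis", "apoptosis", "apoptosis",
        ],
    }
)

# Fraction of cells in each predicted class, per perturbation (rows sum to 1).
prevalence = pd.crosstab(
    cells["perturbation"], cells["predicted_phenotype"], normalize="index"
)

# Change in prevalence relative to control for the hypothetical knockout.
difference = prevalence.loc["knockout_X"] - prevalence.loc["control"]

print(prevalence)
print(difference)
```

In this toy data the knockout shifts half of the cells from interphase to apoptosis. A real analysis would also need replicate-aware statistics before drawing conclusions.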
Module | Purpose | Description |
---|---|---|
0.download_data | Download JUMP-Target SQLite files | We download the CellProfiler SQLite outputs for 51 plates from AWS |
1.process_data | Process SQLite files | We use pycytominer on the SQLite outputs to merge single cells and normalize features |
2.evaluate_data | Apply phenotypic profiling model | We generate phenotypic predictions for single cells using the phenotypic profiling model |
3.analyze_data | Analyze phenotypic predictions | We perform multiple analyses to validate the predicted phenotypic class for each perturbation compared to control |
reference_plate_data | Platemaps per perturbation type | This folder holds the platemap files with metadata based on perturbation type and the barcode platemap file |
For all modules, we use a single environment that includes all necessary packages.
To create the environment from a terminal, run the command below:
# Make sure you are in the same directory as the environment file
conda env create -f environment.yml