Deep probabilistic CCA

Code for End-to-end training of deep probabilistic CCA on paired biomedical observations.

Abstract

Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.
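For readers unfamiliar with PCCA, the standard linear-Gaussian formulation can be written as below. This is a sketch of the textbook model, not necessarily the exact parameterization used in this repository. The symbols ($\mathbf{z}$, $\mathbf{W}_j$, $\boldsymbol{\mu}_j$, $\boldsymbol{\Psi}_j$, and the latent dimension $k$) are notation assumed here, with $\mathbf{x}_1$ standing for an image embedding and $\mathbf{x}_2$ for a gene expression embedding.

```latex
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k), \\
\mathbf{x}_j \mid \mathbf{z} &\sim \mathcal{N}(\mathbf{W}_j \mathbf{z} + \boldsymbol{\mu}_j,\; \boldsymbol{\Psi}_j), \qquad j \in \{1, 2\}.
\end{aligned}
```

Under this model the two views are conditionally independent given the shared latent variable $\mathbf{z}$, so their cross-covariance is $\mathbf{W}_1 \mathbf{W}_2^{\top}$.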

Installation

All the dependencies used for the paper are listed in environment.yml, but that file is operating system-specific, and some pinned library versions (e.g. libcxx=4.0.1) are not available on every system. You can instead build everything you need with:

python       3.7
pytorch      1.0.1
torchvision  0.2.2
numpy        1.16.2
scikit-learn 0.20.2
scipy        1.2.1
matplotlib   3.0.2

You'll need nose2 to run the unit tests. Create and activate a conda environment,

conda create -n dpcca python=3.7
conda activate dpcca

and then install these dependencies, e.g. conda install pytorch=1.0.1 -c pytorch.
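Once the environment is set up, it can be useful to confirm that the installed versions match the list above. The following is a minimal sketch and not part of this repository: the helper names (`PINNED`, `matches`, `installed_versions`, `mismatches`) are hypothetical, and it assumes that pytorch imports as `torch` and scikit-learn as `sklearn`, and that each package exposes a `__version__` attribute.

```python
import importlib

# Versions taken from the dependency list above. Keys are import names,
# which is an assumption: pytorch -> torch, scikit-learn -> sklearn.
PINNED = {
    "torch": "1.0.1",
    "torchvision": "0.2.2",
    "numpy": "1.16.2",
    "sklearn": "0.20.2",
    "scipy": "1.2.1",
    "matplotlib": "3.0.2",
}


def matches(have, want):
    """True if `have` equals `want`, ignoring local or post-release suffixes
    such as "1.0.1+cu100" or "1.0.1.post2"."""
    if have is None:
        return False
    base = have.split("+")[0]
    return base == want or base.startswith(want + ".")


def installed_versions(names):
    """Return {import_name: version string or None} for each name.
    Packages that cannot be imported are reported as None."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", None)
    return found


def mismatches(installed, pinned):
    """Return {name: (installed, pinned)} for every package that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in pinned.items()
        if not matches(installed.get(name), want)
    }
```

Calling `mismatches(installed_versions(PINNED), PINNED)` inside the activated environment returns an empty dictionary when every package matches, and otherwise maps each differing package to its installed and listed versions.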

[Optional] Run the unit tests (these occasionally fail due to numerical tolerances):

```bash
bash run_tests.sh
```

Reproducing multimodal MNIST results

Generate the multimodal MNIST data set.

python -m data.mnist.generate

Create directories for experiments:

mkdir experiments experiments/example

Run the code:

python traindpcca.py
