MR. NODE

MR. NODE (Multiple predictoR Neural ODE) is a deep learning method that models the infection rate of black Sigatoka with ordinary differential equations and can infer the infection risk variable at any point on the timeline.
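
To illustrate how an ODE-based model can be queried at an arbitrary time, here is a minimal sketch of fixed-step forward Euler integration in plain Python. The logistic `toy_rate` function, its growth constant, and the helper name `euler_integrate` are assumptions made for illustration only; in MR. NODE the dynamics function is a learned network and the solver is chosen from those torchdiffeq provides.

```python
import math
from typing import Callable


def euler_integrate(
    rate: Callable[[float, float], float],
    z0: float,
    t0: float,
    t1: float,
    max_step: float = 0.01,
) -> float:
    """Integrate dz/dt = rate(t, z) from t0 to t1 (t1 >= t0) with forward Euler."""
    # Split [t0, t1] into equal steps no longer than max_step.
    n_steps = max(1, math.ceil((t1 - t0) / max_step))
    h = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z += h * rate(t, z)
        t += h
    return z


def toy_rate(t: float, z: float) -> float:
    """Hypothetical stand-in dynamics: logistic growth of a risk value in [0, 1]."""
    return 0.5 * z * (1.0 - z)


# Query the state at times that need not lie on any observation grid.
for t_query in (0.0, 1.25, 3.7):
    print(t_query, euler_integrate(toy_rate, z0=0.1, t0=0.0, t1=t_query))
```

Because the state is obtained by integrating a rate function rather than by stepping through fixed observation slots, any real-valued query time can be used.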

This is the submission of the University of Toronto team to ProjectX 2020, an international machine learning research competition.

.
├── baseline          # All files related to baseline models
├── mr_node           # Data structures for MR. NODE
├── train.py          # Training script for MR. NODE
├── test.py           # Testing script for MR. NODE 
└── data              # Time series data for Costa Rica and India.

Data

We collected microclimatic data from India and Costa Rica, two regions known for their vast banana plantations, and synthesized the corresponding infection risk variable via a probabilistic survival process inspired by [2].

Region	Latitude	Longitude
Costa Rica	10.39	-83.812
Maharashtra, India	18.8143	73.125

Installation

Install Poetry.
Clone this repository and cd into its directory.
Install the project and run the training script in the right environment.

$ poetry install
$ poetry shell
$ python train.py

MR. NODE

Training

You may use the following command to train the model. The results can be found in /results.

$ python train.py --region=cr --solver=euler --lr=3e-4 --encoder_fc_dims 8 16 8 --hidden_dims=4 --odefunc_fc_dims 64 64 --decoder_fc_dims 8 16 8 --window_length=128 --num_epochs=1 --rtol=1e-4 --atol=1e-6

Keyword arguments:

region: Whether to train using data from Costa Rica (cr), India (in), or both (crin). Default: cr
solver: ODE solver (see torchdiffeq for the complete list). Default: euler
lr: Learning rate. Default: 3e-4
encoder_fc_dims: Fully-connected layers in the encoder. Default: 8 16 8
hidden_dims: Dimensions of latent space. Default: 4
odefunc_fc_dims: Fully-connected layers in the dynamics function. Default: 64 64
decoder_fc_dims: Fully-connected layers in the decoder. Default: 8 16 8
window_length: Window length for time steps. Default: 128
num_epochs: Number of training epochs. Default: 1
rtol: Relative tolerance for Neural ODE. Default: 1e-4
atol: Absolute tolerance for Neural ODE. Default: 1e-6
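
The keyword arguments above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the documented flags, not the actual contents of train.py; the function name `build_parser` and the choice of `nargs="+"` for the list-valued flags are assumptions, and the `atol` default of 1e-6 is taken from the example job_id (`atol1e-06`) shown in the testing section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented training flags."""
    parser = argparse.ArgumentParser(description="Train MR. NODE (sketch)")
    parser.add_argument("--region", choices=["cr", "in", "crin"], default="cr")
    parser.add_argument("--solver", default="euler")
    parser.add_argument("--lr", type=float, default=3e-4)
    # List-valued flags take space-separated integers, e.g. "--encoder_fc_dims 8 16 8".
    parser.add_argument("--encoder_fc_dims", type=int, nargs="+", default=[8, 16, 8])
    parser.add_argument("--hidden_dims", type=int, default=4)
    parser.add_argument("--odefunc_fc_dims", type=int, nargs="+", default=[64, 64])
    parser.add_argument("--decoder_fc_dims", type=int, nargs="+", default=[8, 16, 8])
    parser.add_argument("--window_length", type=int, default=128)
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--rtol", type=float, default=1e-4)
    parser.add_argument("--atol", type=float, default=1e-6)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--region=in", "--odefunc_fc_dims", "32", "32", "--num_epochs=5"]
)
print(args.region, args.odefunc_fc_dims, args.num_epochs, args.lr)
```

Scalar flags accept either `--flag=value` or `--flag value`, while list-valued flags need the space-separated form, which matches the mixed style of the command above.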

Testing

Training a model with a set of arguments will generate a .pt file in /results/models uniquely identified by a job_id created based on the training arguments. You may use this job_id to specify which model to test.

$ python test.py --region=cr --job_id='cr_euler_lr3.0e-04_enc[8, 16, 8]_hidden4_ode[64, 64]_dec[8, 16, 8]_window128_epochs1_rtol0.0001_atol1e-06' --plot_indiv=False --num_to_keep=100

Keyword arguments:

region: Whether to test using data from Costa Rica (cr), India (in), or both (crin). Default: cr
job_id: Job id of the model to test. Default: cr_euler_lr3.0e-04_enc[8, 16, 8]_hidden4_ode[64, 64]_dec[8, 16, 8]_window128_epochs1_rtol0.0001_atol1e-06
plot_indiv: Whether to generate individual plots in results/plots. Otherwise, all plots are drawn in a single image file. Default: False
num_to_keep: Number of time steps to use to create the initial latent state. This must be a positive integer no greater than 100. Default: 100
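
To avoid typing the job_id by hand, it can be rebuilt from the training arguments. The sketch below reproduces the format of the example job_id above; the function name `make_job_id` and the exact formatting rules are inferred from that single example and are assumptions, not the repository's own code.

```python
from typing import Sequence


def make_job_id(
    region: str,
    solver: str,
    lr: float,
    encoder_fc_dims: Sequence[int],
    hidden_dims: int,
    odefunc_fc_dims: Sequence[int],
    decoder_fc_dims: Sequence[int],
    window_length: int,
    num_epochs: int,
    rtol: float,
    atol: float,
) -> str:
    """Hypothetical reconstruction of the MR. NODE job_id string."""
    return (
        f"{region}_{solver}_lr{lr:.1e}"  # lr in scientific notation, e.g. 3.0e-04
        f"_enc{list(encoder_fc_dims)}_hidden{hidden_dims}"
        f"_ode{list(odefunc_fc_dims)}_dec{list(decoder_fc_dims)}"
        f"_window{window_length}_epochs{num_epochs}"
        f"_rtol{rtol}_atol{atol}"  # tolerances use Python's default float formatting
    )


job_id = make_job_id(
    region="cr", solver="euler", lr=3e-4,
    encoder_fc_dims=[8, 16, 8], hidden_dims=4,
    odefunc_fc_dims=[64, 64], decoder_fc_dims=[8, 16, 8],
    window_length=128, num_epochs=1, rtol=1e-4, atol=1e-6,
)
print(job_id)
```

The id contains spaces and square brackets, so it should stay quoted when passed on the command line, as in the test command above.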

Baseline RNN and LSTM

Training

You may use the following command to train the baseline RNN or LSTM model. The results can be found in /baseline/baseline_results.

$ cd baseline
$ python train_baseline.py --region=cr --lr=0.001 --batch_size=256 --seq_len=100 --num_epochs=1 --n_hidden=20 --model_name=lstm

Keyword arguments:

region: Whether to train using data from Costa Rica (cr), India (in), or both (crin). Default: cr
lr: Learning rate. Default: 0.001
batch_size: Batch size. Default: 256
seq_len: Number of ground-truth points to use when extrapolating. This must be a positive integer no greater than 100. Default: 100
num_epochs: Number of training epochs. Default: 1
n_hidden: Number of hidden units in the RNN/LSTM. Default: 20
model_name: Can be lstm or rnn. Default: lstm

Testing

Training a model with a set of arguments will generate a .pt file in /baseline/baseline_results/models uniquely identified by a job_id created based on the training arguments. You may use this job_id to specify which model to test.

$ cd baseline
$ python test_baseline.py --region=cr --job_id='cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20'

Keyword arguments:

region: Whether to test using data from Costa Rica (cr), India (in), or both (crin). Default: cr
job_id: Job id of the model to test. Default: cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20
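
Going the other way, the hyperparameters of a saved baseline model can be recovered from its job_id. The sketch below parses the format of the example id above; the field order, the regular expression, and the helper name `parse_baseline_job_id` are inferred from that one example and are assumptions rather than part of the repository.

```python
import re
from typing import Any, Dict

# Pattern inferred from "cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20".
_BASELINE_ID = re.compile(
    r"(?P<region>crin|cr|in)_(?P<model_name>lstm|rnn)"
    r"_lr(?P<lr>[0-9.]+e[+-][0-9]+)"
    r"_batch(?P<batch_size>\d+)_seq(?P<seq_len>\d+)"
    r"_epochs(?P<num_epochs>\d+)_hidden(?P<n_hidden>\d+)"
)


def parse_baseline_job_id(job_id: str) -> Dict[str, Any]:
    """Split a baseline job_id into typed hyperparameters (hypothetical helper)."""
    match = _BASELINE_ID.fullmatch(job_id)
    if match is None:
        raise ValueError(f"Unrecognized baseline job_id: {job_id!r}")
    fields = match.groupdict()
    return {
        "region": fields["region"],
        "model_name": fields["model_name"],
        "lr": float(fields["lr"]),
        "batch_size": int(fields["batch_size"]),
        "seq_len": int(fields["seq_len"]),
        "num_epochs": int(fields["num_epochs"]),
        "n_hidden": int(fields["n_hidden"]),
    }


params = parse_baseline_job_id("cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20")
print(params)
```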

References

[1] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural Ordinary Differential Equations. 2018. https://arxiv.org/abs/1806.07366.
[2] Daniel P. Bebber. Climate change effects on Black Sigatoka disease of banana. May 2019. https://doi.org/10.1098/rstb.2018.0269.
