Predict precipitation at ARM TWP-C1 one hour ahead (6 hours ahead was tried as well) using in situ measurement data
- Input data: T_p, rh_p, u_p, v_p, prec_sfc, (t_cos, t_sin)
- Method: NN (4 hidden layers), RF+NN
- Problem: the models predict most events as no-precipitation events
- Suspected causes: the current input data cannot fully capture the dynamics, and the data is imbalanced (precip vs no precip, hour imbalance)
- All .ipynb files are in colab/.
- From Run 02.* onward, run versions carry a .* suffix.
- From Run 04.* onward, run version *.0 is reserved as a testing ground.
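
The (t_cos, t_sin) inputs listed above look like a cyclic encoding of time. The notes do not give their exact definition, so the sketch below assumes an hour-of-day encoding with a 24-hour period; `encode_hour` is a hypothetical helper name.

```python
import math

def encode_hour(hour, period=24.0):
    """Map a time of day (in hours) onto the unit circle.

    Assumption: t_cos / t_sin encode the hour of day with a 24-hour period;
    the notes do not state the exact definition.
    """
    angle = 2.0 * math.pi * (hour % period) / period
    return math.cos(angle), math.sin(angle)

# 23:00 and 01:00 end up close together, unlike the raw values 23 and 1.
t_cos_23, t_sin_23 = encode_hour(23)
t_cos_01, t_sin_01 = encode_hour(1)
```

With this encoding, times just before and just after midnight are neighbours in input space, which a raw hour value would not give.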
extract_data.py: raw DataSet in .nc / .cdf -> variables-of-interest DataSet in .cdf
netcdf-flattening.ipynb <- netcdf-flattening.py: variables of interest in .cdf -> flattened two-dimensional (pandas) DataFrame in .csv; appends the next-hour precipitation as labels
netcdf-flattening-6-hour-cumulative-precip.ipynb: ditto, but appends the next 6-hour cumulative precipitation as labels
RF-1hrlater.ipynb: appends the RF-predicted class to the dataset in .csv. Because of a lack of disk quota, I cannot install more packages in the virtual environment; I have requested more disk quota. (14 Dec 2018)
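
The two label definitions used by the flattening notebooks can be sketched as follows. This is not the notebooks' code: it assumes one row per hour in time order and that `prec_sfc` is the hourly precipitation. The label column names `prec_1hr_later` and `prec_6hr_cumul` are hypothetical.

```python
import pandas as pd

def add_labels(df, precip_col="prec_sfc"):
    """Append next-hour and next-6-hour cumulative precipitation labels.

    Assumptions: one row per hour in time order, and `prec_sfc` is the
    hourly surface precipitation; the label column names are hypothetical.
    """
    out = df.copy()
    # Label for row t is the precipitation observed at t + 1.
    out["prec_1hr_later"] = out[precip_col].shift(-1)
    # Forward-looking sum over rows t .. t+5: reverse, rolling sum, reverse back.
    forward_sum = out[precip_col][::-1].rolling(6).sum()[::-1]
    # Shift by one row so the window covers t+1 .. t+6.
    out["prec_6hr_cumul"] = forward_sum.shift(-1)
    # Rows near the end have no complete future window and are dropped.
    return out.dropna(subset=["prec_1hr_later", "prec_6hr_cumul"])
```

Rows whose future window runs past the end of the record are dropped rather than padded, so no label is built from missing data.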
- Code: SVM-1hrlater.ipynb
- DATADIR = ARM_1hrlater.csv
- Classification Threshold = 0.1
- train_size = 0.6
- Rainy period ratio = 0.1659/ 0.4869 - blind test accuracy (always predicting no rain) = 0.8341/ 0.5131
- test accuracy = 0.8922/ 0.4 (bad)
- plt.plot = 1D True precipitation plots for both classes separately
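
The blind test accuracy quoted above equals one minus the rainy period ratio, i.e. the accuracy of always predicting the majority class. A minimal sketch of that arithmetic follows. It assumes a sample counts as rainy when its precipitation is strictly above the threshold, and both function names are hypothetical.

```python
def rainy_ratio(precip, threshold):
    """Fraction of samples whose precipitation exceeds the threshold.

    Assumption: a strict `>` comparison defines a rainy sample.
    """
    rainy = [p > threshold for p in precip]
    return sum(rainy) / len(rainy)

def blind_accuracy(ratio):
    """Accuracy of always predicting the majority class."""
    return max(ratio, 1.0 - ratio)

# The rainy ratios quoted above reproduce the quoted blind test accuracies.
print(round(blind_accuracy(0.1659), 4))  # 0.8341
print(round(blind_accuracy(0.4869), 4))  # 0.5131
```

A classifier is only useful if its test accuracy beats this baseline.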
- Code: SVM-6hrcumul.ipynb
- DATADIR = ARM_6hrcumul.csv
- Classification Threshold = 0.3002
- train_size = 0.6
- Rainy period ratio = 0.4995 - blind test accuracy = 0.5
- test accuracy = 0.4672
- plt.plot = None
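
A threshold of 0.3002 gives a rainy ratio close to 0.5, which suggests it was picked near the median of the 6-hour cumulative label. That is an assumption, as the notes do not say how it was chosen. One way to pick a threshold for a target class balance is sketched below; `threshold_for_ratio` is a hypothetical helper.

```python
import numpy as np

def threshold_for_ratio(labels, target_rainy_ratio=0.5):
    """Return the label value above which roughly `target_rainy_ratio` of samples fall."""
    values = np.asarray(labels, dtype=float)
    return float(np.quantile(values, 1.0 - target_rainy_ratio))

# Toy labels 1..100: the median threshold splits them evenly.
labels = np.arange(1, 101)
thr = threshold_for_ratio(labels)
balanced_ratio = float(np.mean(labels > thr))
```

When many labels are tied (for example many exact zeros), the achieved ratio can differ from the target.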
- Code: RF-1hrlater.ipynb
- DATADIR = ARM_1hrlater.csv
- Classification Threshold = 0.1/ 0/ 0.05
- train_size = 0.6
- Rainy period ratio = 0.1659/ 0.4869/ 0.3183 - blind test accuracy = 0.8341/ 0.5131/ 0.6817
- test accuracy = 0.9/ 0.85/ 0.88 !!!
- plt.plot = 1D True precipitation plots for both classes separately
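
The RF step (threshold the label, fit on 60% of the rows, then append the predicted class to the dataset) might look roughly like this with scikit-learn. It is a sketch, not the notebook's code. The column name `rf_class`, the seed and `n_estimators=100` are assumptions, and every column other than the label is treated as a feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def append_rf_class(df, label_col, threshold=0.1, train_size=0.6, seed=0):
    """Fit an RF rain / no-rain classifier and append its prediction as a column.

    Assumptions: `label_col` holds next-hour precipitation, all other columns
    are features, and the new column name `rf_class` is hypothetical.
    """
    X = df.drop(columns=[label_col]).to_numpy()
    y = np.asarray(df[label_col] > threshold, dtype=int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_size, random_state=seed
    )
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    out = df.copy()
    out["rf_class"] = rf.predict(X)  # predictions for every row, train and test
    return out, rf.score(X_test, y_test)
```

The returned score is the held-out test accuracy, which is the number to compare against the blind test accuracy.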
Abs loss is reported de-normalized and is not used as a training loss. The other regression losses are computed on normalized values.
- Code: NN.py
- DATADIR = ARM_1hrlater.csv
- train_size = 0.75
- num_epoch = 100000
- n_hid = [n_in = 151, 128, 64, 32, 16, n_out = 1]
- run_ID = 01
- connections = ['fc'] #, 'bn', 'do'
- act_funcs = ['relu', 'leaky_relu']
- loss_funcs = ['square', 'quartic']#, 'huber']
- learning_rates = [1e-2, 1e-3, 1e-4]#, 1e-5, 1e-6]
- plt.plot = True precipitation vs Predicted precipitation
- LeakyReLU-sqloss-1e-3: mean abs loss = 1.131 < other config; predictions tend to all collapse to zero due to imbalanced data
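
For reference, the 'square' and 'quartic' losses and a de-normalized mean absolute loss can be written as below. The notes do not say which normalization was used, so the sketch assumes z-score normalization of the labels. The function names are hypothetical.

```python
import numpy as np

def square_loss(pred, target):
    """Mean squared error on (normalized) values."""
    return float(np.mean((pred - target) ** 2))

def quartic_loss(pred, target):
    """Mean fourth-power error; penalizes large misses more than square loss."""
    return float(np.mean((pred - target) ** 4))

def denormalized_abs_loss(pred_norm, target_norm, mean, std):
    """Mean absolute error after undoing z-score normalization.

    Assumption: labels were normalized as (y - mean) / std.
    """
    pred = pred_norm * std + mean
    target = target_norm * std + mean
    return float(np.mean(np.abs(pred - target)))
```

Reporting the absolute loss in de-normalized form keeps it in the same units as the precipitation label, so values from different runs can be compared directly.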
- Code: NN_after_RF_1hr.py
- DATADIR = ARM_1hrlater_RFclassified.csv; ARM_1hrlater_RFclassified_threshold_0.05.csv
- train_size = 0.6 - must match the RF config in RF-1hrlater.ipynb
- num_epoch = 100000
- n_hid = [n_in = 151, 128, 64, 32, 16, n_out = 1]
- run_ID = 04.1; 05.1
- connections = ['fc']#, 'bn', 'do']
- act_funcs = ['relu', 'leaky_relu']
- loss_funcs = ['square', 'quartic']
- learning_rates = [1e-3, 1e-4]
- plt.plot = True precipitation vs Predicted precipitation in 2 colours (one per RF class)
- r04.1: threshold = 0, ReLU-sqloss-1e-3 mean abs loss = 0.8082 < other config
- r05.1: threshold = 0.05, ReLU-sqloss-1e-3 mean abs loss = 0.9153 < other config
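
The train_size constraint above presumably exists so that the NN's test rows are not rows the RF was trained on. The notes do not describe how the split is made. One way to keep the two stages aligned, assuming a seeded shuffle (the helper `split_indices` and the seed are hypothetical), is:

```python
import numpy as np

def split_indices(n_samples, train_size=0.6, seed=0):
    """Return (train_idx, test_idx) from a seeded permutation of the row indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_size * n_samples))
    return order[:n_train], order[n_train:]

# Calling with the same arguments in both stages gives identical partitions.
rf_train, rf_test = split_indices(1000)
nn_train, nn_test = split_indices(1000)
```

Sharing both the train_size and the seed keeps the RF and NN stages on the same train/test partition.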
- Code: NN_cumul_class.py
- DATADIR = ARM_6hrcumul.csv
- Classification Threshold = 0.31
- train_size = 0.6
- n_hid = [n_in = 151, n_out = 1]
- num_epoch = 3000
- run_ID = 02.0; 02.1; 02.2
- connections = ['fc']
- act_funcs = ['log_reg']#['leaky_relu','relu']
- loss_funcs = ['xent','hinge']#,'square']
- learning_rates = [1e-3]#, 1e-5, 1e-6]
- plt.plot = True precipitation vs Probability of Raining
- r02.0: n_hid = [n_in = 151, 16, 4, n_out = 1] ReLU-Hinge accuracy = 0.5219 > other config
- r02.1: n_hid = [n_in = 151, 16, 4, n_out = 1] accuracy < 0.5 (poor)
- r02.2: Hinge==linSVM accuracy = 0.5525 > 0.5473 = xEnt==LogReg accuracy
Log reg (r03.0)/ linear SVM (r03.0)/ simple 1-hid-layer NN (r03.1) classify whether it is rainy the next hour
- Code: NN_1hr_class.py
- DATADIR = ARM_1hrlater.csv
- Classification Threshold = 0.1
- train_size = 0.6
- n_hid = [n_in = 151, (5), n_out = 1] - the hidden layer exists only in r03.1
- num_epoch = 3000
- run_ID = 03.0; 03.1
- connections = ['fc']
- act_funcs = ['leaky_relu','relu'] #'lr-svmlin']
- loss_funcs = ['xent','hinge']#,'square'] - xent and hinge correspond to logistic regression and linear SVM, respectively, when there are no hidden layers (r03.0)
- learning_rates = [1e-3]#, 1e-5, 1e-6]
- plt.plot = 1D True precipitation plots for both classes separately
- r03.0: Hinge==linSVM accuracy = 0.8938 > 0.8871 = xEnt==LogReg accuracy
- r03.1: 1hd-ReLU-Hinge accuracy = 0.8756 > other 1hd config
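
The equivalence noted for r03.0 comes from the model shape: with no hidden layer the network is a single linear unit, so the loss alone decides whether it behaves like logistic regression (xent) or a linear SVM (hinge). A minimal NumPy sketch of the two losses on such a unit follows. The function names are hypothetical, and the SVM regularization term is omitted.

```python
import numpy as np

def linear_scores(X, w, b):
    """Output of a network with no hidden layer: one fully connected unit."""
    return X @ w + b

def xent_loss(scores, y01):
    """Logistic-regression loss; labels in {0, 1}."""
    # log(1 + exp(s)) - y * s, computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, scores) - y01 * scores))

def hinge_loss(scores, y01):
    """Linear-SVM loss (without the regularization term); labels in {0, 1}."""
    y_pm = 2.0 * y01 - 1.0  # map {0, 1} to {-1, +1}
    return float(np.mean(np.maximum(0.0, 1.0 - y_pm * scores)))
```

Both losses take the raw linear score, so the same forward pass can be trained under either one.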