Add dvc pipelines #14

Open
wants to merge 42 commits into
base: master

@radistoubalidis radistoubalidis commented Jul 1, 2022

Signed-off-by: radis toubalidis rtoumpalidis@gmail.com

Description

The goals of this PR are:

  • create a DVC pipeline containing the stages described in Standalone_GCBM\readme.txt
  • set up an action that, when someone pushes a new run from a new branch, uses DVC to publish the differences between the post-processing plots or other parameters.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Additional Context (Please include any Screenshots/gifs if relevant)

Current stages:

tiler                       Outputs ..\..\logs\tiler_log.txt
recliner2gcbm_x64           Outputs logs\recliner_log.txt
add_species_vol_to_bio      Outputs logs\add_species_vol_to_bio.log
modify_root_parameters      Outputs logs\modify_root_parameters.log
modify_decay_parameters     Outputs logs\modify_decay_parameters.log
modify_turnover_parameters  Outputs logs\modify_turnover_parameters.log
modify_spinup_parameters    Outputs logs\modify_spinup_parameters.log
update_GCBM_configuration   Outputs ..\logs\update_gcbm_config.log
run_gcbm                    Outputs ..\logs\Moja_Debug.log
create_tiffs                Outputs ..\..\logs\create_tiffs.log, ..\..\processed_output\spatial
compile_results             Outputs ..\..\logs\compile_results.log
post_processing             Reports metrics\1900-1950_Deadwood_Tropical_Dry.json, metrics\1900-1950_Deadwoo…

DVC creates these files:

  • .dvc
  • dvc.lock, which captures hashes (usually md5) of the stage dependencies
  • dvc.yaml, which contains the stages of the pipeline
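
As a rough illustration of what a hash entry in dvc.lock represents for a single file, the sketch below computes an md5 content hash with Python's hashlib. The helper name `file_md5` and the sample file are hypothetical, and this is not DVC's own code; DVC's hashing may differ in details (for example, for directories).

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical log file: changing its content changes the hash, which is
# how a changed dependency can be detected.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "tiler_log.txt"
    log.write_bytes(b"tiling finished\n")
    before = file_md5(log)
    log.write_bytes(b"tiling finished with warnings\n")
    after = file_md5(log)
    print(before != after)
```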

Signed-off-by: radis toubalidis <rtoumpalidis@gmail.com>
…ifeZone

@@ -0,0 +1,6 @@
[core]
remote = processed_output
['remote "gcbm_belize_logs"']

Can we add a comment alerting users to configure this to their own storage?

1900,Total Biomass,Tropical Dry,0.0014951294179772342
1900,Total Biomass,Tropical Moist,6.140465519536419
1900,Total Biomass,Tropical Premontane Wet,6.079037757144233
1900,Deadwood,Tropical Dry,

@radistoubalidis - do you know why these tables have changed?

Author

I don't know yet; I'll look into it.

@@ -0,0 +1 @@
{"Deadwood, Tropical Dry": {"pool_tc_sum_MEAN": "15624317.95", "area_sum_MEAN": "1142790.38", "pool_tc_per_ha_MEAN": "13.67"}, "Deadwood, Tropical Moist": {"pool_tc_sum_MEAN": "15802289.91", "area_sum_MEAN": "608498.41", "pool_tc_per_ha_MEAN": "25.97"}, "Deadwood, Tropical Premontane Wet": {"pool_tc_sum_MEAN": "13011268.76", "area_sum_MEAN": "417245.77", "pool_tc_per_ha_MEAN": "31.18"}}

This is cool. Does it refer to the mean over the lifetime of the simulation? Alternative/additional summary statistics might be start, mid- and endpoints (e.g. 1900, 1950, 2000) or decadal timestamps.

Author

I wrote a simple script that calculates the mean value of each indicator for each LifeZone so we can use it for dvc metrics. I haven't used it yet because there were some inconsistencies.

Author

> This is cool. Does it refer to the mean over the lifetime of the simulation? Alternative/additional summary statistics might be start, mid- and endpoints (e.g. 1900, 1950, 2000) or decadal timestamps.

Update: in b1fa0de I modified analyze.py to create a JSON metrics file for the start, mid and end points.
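
A minimal sketch of this kind of summary, not the actual analyze.py: it assumes rows of (year, indicator, life zone, value), like the CSV excerpt earlier in this review, and averages them within each period. The function name, the period boundaries and the sample values are all illustrative assumptions.

```python
import json
from collections import defaultdict
from statistics import mean


def period_means(rows, periods):
    """Average values per (period, indicator, life zone).

    rows: iterable of (year, indicator, life_zone, value)
    periods: list of (first_year, last_year) tuples, both inclusive
    """
    buckets = defaultdict(list)
    for year, indicator, life_zone, value in rows:
        for first, last in periods:
            if first <= year <= last:
                buckets[(f"{first}-{last}", indicator, life_zone)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample values, only to exercise the function.
rows = [
    (1900, "Deadwood", "Tropical Dry", 10.0),
    (1950, "Deadwood", "Tropical Dry", 14.0),
    (1951, "Deadwood", "Tropical Dry", 20.0),
]
summary = period_means(rows, [(1900, 1950), (1951, 2000)])
for (period, indicator, life_zone), value in summary.items():
    # One JSON document per period, indicator and life zone.
    print(f"{period}_{indicator}_{life_zone}", json.dumps({"value_mean": value}))
```

Keeping the periods as an argument makes it cheap to switch to decadal windows, as suggested in the review comment.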


REM Set Python path - change this to your Python installation directory.
set GCBM_PYTHON=C:\Python37
set GCBM_PYTHON=C:\Develop\Python\Python37

Can you use an environment variable for this?

Author

I refactored all the Python paths in the .bat files in 5296555.
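
For reference, a sketch of the environment-variable approach the reviewer asked about, written in Python rather than batch. It assumes `GCBM_PYTHON` is set outside the script and points at the installation directory, as in the .bat snippet above; `resolve_python` is a hypothetical helper, and the fallback to the current interpreter is an assumption, not something from this PR.

```python
import os
import sys
from pathlib import PureWindowsPath


def resolve_python(env=None):
    """Return the interpreter path from GCBM_PYTHON, else the current one."""
    env = os.environ if env is None else env
    root = env.get("GCBM_PYTHON")
    if root:
        # The project targets Windows, so build a Windows-style path.
        return str(PureWindowsPath(root) / "python.exe")
    return sys.executable


print(resolve_python())
```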

dvc.yaml Outdated
@@ -0,0 +1,116 @@
stages:
tiler:
cmd: C:\Develop\Python\Python37\python.exe ..\..\tools\Tiler\tiler.py

If you add python to PATH this could be cmd: python ..\..\tools\Tiler\tiler.py

Author

For some reason, even though I added Python37 to my PATH, this stage fails when I run it with python instead of the full path. I'll note this in the pipeline readme.
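
One way to investigate this kind of failure is to check which executable the shell actually resolves `python` to. The sketch below uses only the standard library; `describe_python` is a hypothetical helper, and it does not explain the failure seen here, it only reports what PATH resolves to.

```python
import shutil
import subprocess


def describe_python(command="python"):
    """Report which executable PATH resolves `command` to, and its version."""
    resolved = shutil.which(command)
    if resolved is None:
        return f"{command!r} is not on PATH"
    result = subprocess.run(
        [resolved, "--version"], capture_output=True, text=True, check=False
    )
    # Some older interpreters print the version to stderr instead of stdout.
    version = (result.stdout or result.stderr).strip()
    return f"{command!r} resolves to {resolved} ({version})"


print(describe_python("python"))
```

If the reported path is not the Python 3.7 installation, another interpreter earlier on PATH may be shadowing it.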

Author

Update: in 237d2a5 I added a vars list to dvc.yaml that includes the local Python path, so it can be used as a variable in the stages.

@aornugent aornugent left a comment

This looks awesome, well done @radistoubalidis! The only thing missing is a README update to describe the new pipeline and steps to reproduce.

I'm not quite sure why the sensitivity figures have changed. It suggests that you're getting different results somehow. Or, alternatively, the problem could be that the post-processing pipeline is missing a grouping variable - this can sometimes cause the sawtooth pattern you see in the new figures. I'll double check the R code and get back to you.
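
To illustrate the second hypothesis, here is a small Python sketch (the actual post-processing is R code, so this is an analogy only) of how dropping a grouping variable produces a sawtooth: values from two life zones end up interleaved on a single line. All numbers are made up.

```python
def direction_changes(series):
    """Count how often consecutive differences flip sign (a sawtooth signature)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)


# Made-up values for two life zones over three years.
records = [
    (1900, "Tropical Dry", 10.0),
    (1900, "Tropical Moist", 30.0),
    (1901, "Tropical Dry", 11.0),
    (1901, "Tropical Moist", 31.0),
    (1902, "Tropical Dry", 12.0),
    (1902, "Tropical Moist", 32.0),
]

# Without the grouping variable, every record lands on one line.
ungrouped = [value for _, _, value in sorted(records)]

# With the grouping variable, each life zone gets its own series.
grouped = {}
for year, life_zone, value in sorted(records):
    grouped.setdefault(life_zone, []).append(value)

print(ungrouped, direction_changes(ungrouped))
print({zone: direction_changes(series) for zone, series in grouped.items()})
```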

@radistoubalidis
Author

radistoubalidis commented Jul 2, 2022

> This looks awesome, well done @radistoubalidis! The only thing missing is a README update to describe the new pipeline and steps to reproduce.
>
> I'm not quite sure why the sensitivity figures have changed. It suggests that you're getting different results somehow. Or, alternatively, the problem could be that the post-processing pipeline is missing a grouping variable - this can sometimes cause the sawtooth pattern you see in the new figures. I'll double check the R code and get back to you.

I noticed that about the figures too: they are not the same as in HEAD. I'll try to modify the pipeline so that the outputs match HEAD.

UPDATE:
I have not verified that this is why the sensitivity figures have changed, but I noticed that the simulation start/end years in run_all.bat differ from those in the step-by-step workflow (e.g. in update_gcbm_configuration.bat they are [2010,2020], while in run_all.bat they are [1900,2050]).

…t to post_processing metrics in dvc

@radistoubalidis
Author

radistoubalidis commented Jul 5, 2022

@aornugent the changes after your review (updated):

  • In cb38bcd I completed the notebook for the ArchieIndex db
  • In 7944bde I completed the DVC pipeline readme
  • In 237d2a5 I created a var in dvc.yaml for the Python path (the pipeline readme explains how to change it to your local path)
  • In b1fa0de I added metrics to the post_processing stage ([1900-1950], [1951-2000], [2001-2050])
  • In 5296555 I refactored all the Python paths in the .bat files to match master
  • In 3657937 I updated analyze.py to create a metrics JSON file for each period, each indicator and each LifeZone. So we have:
    • 3 periods: 1900-1950, 1951-2000, 2001-2050
    • 4 indicator types: Deadwood, Litter, Soil Carbon, Total Biomass
    • 3 LifeZones: Tropical Dry, Tropical Moist, Tropical Premontane Wet
  • In total, 36 metrics JSON files, with this dvc metrics show output:
Path                                                          area_sum_mean    pool_tc_per_ha_mean    pool_tc_sum_mean
metrics\1900-1950_Deadwood_Tropical_Dry.json                  1142790.3823     8.46574                9674571.56838   
metrics\1900-1950_Deadwood_Tropical_Moist.json                608498.40961     19.67003               11969184.86104  
metrics\1900-1950_Deadwood_Tropical_Premontane_Wet.json       417245.76511     23.20835               9683586.49465   
metrics\1900-1950_Litter_Tropical_Dry.json                    1142790.3823     7.63421                8724307.19108   
metrics\1900-1950_Litter_Tropical_Moist.json                  608498.40961     15.89514               9672169.16877   
metrics\1900-1950_Litter_Tropical_Premontane_Wet.json         417245.76511     19.54048               8153183.07648   
metrics\1900-1950_Soil Carbon_Tropical_Dry.json               1142790.3823     18.18886               20786056.03202  
metrics\1900-1950_Soil Carbon_Tropical_Moist.json             608498.40961     69.2994                42168571.67542  
metrics\1900-1950_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     73.05651               30482518.25375  
metrics\1900-1950_Total Biomass_Tropical_Dry.json             1142790.3823     73.47453               83965986.06718  
metrics\1900-1950_Total Biomass_Tropical_Moist.json           608498.40961     128.64026              78277394.92039  
metrics\1900-1950_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     131.66177              54935317.14342  
metrics\1951-2000_Deadwood_Tropical_Dry.json                  1142790.3823     15.45497               17661793.48354  
metrics\1951-2000_Deadwood_Tropical_Moist.json                608498.40961     26.51319               16133235.92611  
metrics\1951-2000_Deadwood_Tropical_Premontane_Wet.json       417245.76511     32.0702                13381156.01564  
metrics\1951-2000_Litter_Tropical_Dry.json                    1142790.3823     15.90142               18171994.72645  
metrics\1951-2000_Litter_Tropical_Moist.json                  608498.40961     27.23009               16569467.89581  
metrics\1951-2000_Litter_Tropical_Premontane_Wet.json         417245.76511     35.09762               14644334.00538
metrics\1951-2000_Soil Carbon_Tropical_Dry.json               1142790.3823     26.80044               30627284.39216
metrics\1951-2000_Soil Carbon_Tropical_Moist.json             608498.40961     76.61108               46617719.93404
metrics\1951-2000_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     81.71437               34094976.88782
metrics\1951-2000_Total Biomass_Tropical_Dry.json             1142790.3823     114.36661              130697056.55196
metrics\1951-2000_Total Biomass_Tropical_Moist.json           608498.40961     197.02668              119890421.62813
metrics\1951-2000_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     207.90839              86748893.20024
metrics\2001-2050_Deadwood_Tropical_Dry.json                  1142790.3823     15.90271               18173464.67515
metrics\2001-2050_Deadwood_Tropical_Moist.json                608498.40961     29.2953                17826146.29171
metrics\2001-2050_Deadwood_Tropical_Premontane_Wet.json       417245.76511     36.33499               15160622.39384
metrics\2001-2050_Litter_Tropical_Dry.json                    1142790.3823     16.55106               18914397.27665
metrics\2001-2050_Litter_Tropical_Moist.json                  608498.40961     30.41606               18508123.58862
metrics\2001-2050_Litter_Tropical_Premontane_Wet.json         417245.76511     40.59053               16936228.41531
metrics\2001-2050_Soil Carbon_Tropical_Dry.json               1142790.3823     35.42651               40485069.6378
metrics\2001-2050_Soil Carbon_Tropical_Moist.json             608498.40961     86.52913               52652839.85377
metrics\2001-2050_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     94.05828               39245418.40241
metrics\2001-2050_Total Biomass_Tropical_Dry.json             1142790.3823     114.65017              131021114.07375
metrics\2001-2050_Total Biomass_Tropical_Moist.json           608498.40961     208.71101              127000315.77753
metrics\2001-2050_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     223.22683              93140448.19134
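
A quick consistency check on the table above, assuming (the thread does not state this) that pool_tc_per_ha_mean is pool_tc_sum_mean divided by area_sum_mean. The three rows are copied from the output; the helper name is hypothetical.

```python
# (area_sum_mean, pool_tc_per_ha_mean, pool_tc_sum_mean), copied from the table.
rows = {
    "1900-1950_Deadwood_Tropical_Dry": (1142790.3823, 8.46574, 9674571.56838),
    "1900-1950_Deadwood_Tropical_Moist": (608498.40961, 19.67003, 11969184.86104),
    "1900-1950_Total Biomass_Tropical_Dry": (1142790.3823, 73.47453, 83965986.06718),
}


def per_hectare(pool_tc_sum, area_sum):
    """Carbon pool per hectare, under the assumption stated above."""
    return pool_tc_sum / area_sum


for name, (area, reported, total) in rows.items():
    derived = per_hectare(total, area)
    print(f"{name}: reported={reported:.5f} derived={derived:.5f}")
```

For these rows the derived and reported values agree to about four decimal places, which is consistent with the assumption but does not prove it for the other rows.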

@radistoubalidis
Author

Output of dvc metrics show

Path                                                                         area_sum_mean    pool_tc_per_ha_mean    pool_tc_sum_mean
Postprocessing\metrics\1900-1950_Deadwood_Tropical_Dry.json                  1142790.3823     9.02374                10312240.27057
Postprocessing\metrics\1900-1950_Deadwood_Tropical_Moist.json                608498.40961     21.44724               13050609.96778
Postprocessing\metrics\1900-1950_Deadwood_Tropical_Premontane_Wet.json       417245.76511     24.6368                10279599.85621
Postprocessing\metrics\1900-1950_Litter_Tropical_Dry.json                    1142790.3823     7.51629                8589542.73961
Postprocessing\metrics\1900-1950_Litter_Tropical_Moist.json                  608498.40961     15.67057               9535513.88722
Postprocessing\metrics\1900-1950_Litter_Tropical_Premontane_Wet.json         417245.76511     19.05744               7951634.95976
Postprocessing\metrics\1900-1950_Soil Carbon_Tropical_Dry.json               1142790.3823     18.27207               20881145.04849
Postprocessing\metrics\1900-1950_Soil Carbon_Tropical_Moist.json             608498.40961     69.44361               42256326.8329
Postprocessing\metrics\1900-1950_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     73.32265               30593565.30046
Postprocessing\metrics\1900-1950_Total Biomass_Tropical_Dry.json             1142790.3823     73.47453               83965986.06718
Postprocessing\metrics\1900-1950_Total Biomass_Tropical_Moist.json           608498.40961     128.64026              78277394.92039
Postprocessing\metrics\1900-1950_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     131.66177              54935317.14342
Postprocessing\metrics\1951-2000_Deadwood_Tropical_Dry.json                  1142790.3823     16.68079               19062652.0933
Postprocessing\metrics\1951-2000_Deadwood_Tropical_Moist.json                608498.40961     28.58334               17392918.97526
Postprocessing\metrics\1951-2000_Deadwood_Tropical_Premontane_Wet.json       417245.76511     33.75245               14083067.18984
Postprocessing\metrics\1951-2000_Litter_Tropical_Dry.json                    1142790.3823     15.67846               17917198.28353
Postprocessing\metrics\1951-2000_Litter_Tropical_Moist.json                  608498.40961     26.8251                16323028.91383
Postprocessing\metrics\1951-2000_Litter_Tropical_Premontane_Wet.json         417245.76511     34.24219               14287409.81653
Postprocessing\metrics\1951-2000_Soil Carbon_Tropical_Dry.json               1142790.3823     26.90024               30741339.14617
Postprocessing\metrics\1951-2000_Soil Carbon_Tropical_Moist.json             608498.40961     76.80424               46735260.33606
Postprocessing\metrics\1951-2000_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     82.09022               34251795.1364
Postprocessing\metrics\1951-2000_Total Biomass_Tropical_Dry.json             1142790.3823     114.36661              130697056.55196
Postprocessing\metrics\1951-2000_Total Biomass_Tropical_Moist.json           608498.40961     197.02668              119890421.62813
Postprocessing\metrics\1951-2000_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     207.90839              86748893.20024
Postprocessing\metrics\2001-2050_Deadwood_Tropical_Dry.json                  1142790.3823     16.23329               18551242.10796
Postprocessing\metrics\2001-2050_Deadwood_Tropical_Moist.json                608498.40961     31.16315               18962725.98408
Postprocessing\metrics\2001-2050_Deadwood_Tropical_Premontane_Wet.json       417245.76511     37.4451                15623808.56073
Postprocessing\metrics\2001-2050_Litter_Tropical_Dry.json                    1142790.3823     14.93973               17072980.27577
Postprocessing\metrics\2001-2050_Litter_Tropical_Moist.json                  608498.40961     27.97692               17023913.2908
Postprocessing\metrics\2001-2050_Litter_Tropical_Premontane_Wet.json         417245.76511     36.8938                15393783.85856
Postprocessing\metrics\2001-2050_Soil Carbon_Tropical_Dry.json               1142790.3823     33.60477               38403209.25278
Postprocessing\metrics\2001-2050_Soil Carbon_Tropical_Moist.json             608498.40961     84.92637               51677560.85901
Postprocessing\metrics\2001-2050_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     92.26118               38495586.80793
Postprocessing\metrics\2001-2050_Total Biomass_Tropical_Dry.json             1142790.3823     104.68105              119628496.69294
Postprocessing\metrics\2001-2050_Total Biomass_Tropical_Moist.json           608498.40961     194.4306               118310712.98961
Postprocessing\metrics\2001-2050_Total Biomass_Tropical_Premontane_Wet.json  417245.76511     207.51839              86586169.15699
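
Since publishing differences between runs is one goal of this PR, here is a plain-Python sketch of comparing two sets of metrics. DVC has its own `dvc metrics diff` command for this; the helper below is hypothetical and only shows the idea. The pool_tc_per_ha_mean values are copied from the two tables in this thread, keyed by file name without the directory, since the directory changed between the two outputs.

```python
def metric_changes(old, new):
    """Return {name: (old, new, new - old)} for metrics present in both runs."""
    return {
        name: (old[name], new[name], round(new[name] - old[name], 5))
        for name in sorted(old.keys() & new.keys())
    }


# pool_tc_per_ha_mean from the Jul 5 table.
earlier = {
    "1900-1950_Deadwood_Tropical_Dry": 8.46574,
    "1900-1950_Total Biomass_Tropical_Dry": 73.47453,
    "2001-2050_Total Biomass_Tropical_Dry": 114.65017,
}
# pool_tc_per_ha_mean from the later table.
later = {
    "1900-1950_Deadwood_Tropical_Dry": 9.02374,
    "1900-1950_Total Biomass_Tropical_Dry": 73.47453,
    "2001-2050_Total Biomass_Tropical_Dry": 104.68105,
}

for name, (before, after, delta) in metric_changes(earlier, later).items():
    print(f"{name}: {before} -> {after} ({delta:+.5f})")
```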

…e order

@radistoubalidis
Author

radistoubalidis commented Jul 14, 2022

In fd9b236 I updated add_species_vol_to_bio.py and the modify_<type>_parameters.py scripts to write their logging messages to a file in the /log directory.

This is needed because DVC does not, by default, impose an order on the pipeline stages; it orders two stages only when stage i+1 declares an output x of stage i as a dependency. By creating a log file for each stage, which the next stage can depend on, we get the pipeline to execute in order.
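
The ordering rule can be sketched as a topological sort over "stage B depends on an output of stage A" edges. This is an illustration, not DVC's implementation; the stage names come from this PR, but the deps/outs paths are simplified, hypothetical stand-ins, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical deps/outs: each stage writes a log that the next one depends on.
# The dict is deliberately out of order to show that the order comes from the
# dependencies, not from the position in the file.
stages = {
    "modify_root_parameters": {
        "deps": ["logs/add_species_vol_to_bio.log"],
        "outs": ["logs/modify_root_parameters.log"],
    },
    "tiler": {"deps": [], "outs": ["logs/tiler_log.txt"]},
    "add_species_vol_to_bio": {
        "deps": ["logs/recliner_log.txt"],
        "outs": ["logs/add_species_vol_to_bio.log"],
    },
    "recliner2gcbm_x64": {
        "deps": ["logs/tiler_log.txt"],
        "outs": ["logs/recliner_log.txt"],
    },
}


def execution_order(stages):
    """Order stages so that each runs after the stages producing its deps."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(stages))
```

Without the log-file dependencies the graph has no edges, so any order would be valid, which matches the behaviour described above.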

@aornugent

@radistoubalidis or @aldeav - can we please update this branch to develop? @aldeav has fixed a bug in the postprocessing code to remedy the saw-tooth lines in the output figures.
