Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type error in GCBM API causes (fatal) - main(306) - Value too large. error. #136

Open
padmajabhol opened this issue Jun 22, 2022 · 22 comments · May be fixed by #197
Open

Type error in GCBM API causes (fatal) - main(306) - Value too large. error. #136

padmajabhol opened this issue Jun 22, 2022 · 22 comments · May be fixed by #197
Assignees
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@padmajabhol
Copy link
Member

Currently, the simulation runs without any blockers on Linux environments, we are open to try all possible fix that might help us extend it to all os environments.
cc: @aornugent @HarshCasper

@padmajabhol padmajabhol added the help wanted Extra attention is needed label Jul 13, 2022
@aornugent
Copy link
Contributor

Here is the nightly from Jun 22nd: https://github.com/moja-global/FLINT.Cloud/pkgs/container/rest_api_gcbm/26346728

We can work backwards through the builds using bisection to quickly find the previous known working state.

@aornugent aornugent added the good first issue Good for newcomers label Jul 23, 2022
@aornugent aornugent changed the title Fix REST_API_GCBM 'value too large' error. Review rest_api_gcbm builds for (fatal) - main(306) - Value too large. error. Jul 23, 2022
@aornugent
Copy link
Contributor

Here's an earlier description of the problem: #111 (comment)

It might be differences in instruction set. @radistoubalidis can we set a specific architecture in our Github Actions?

@aornugent
Copy link
Contributor

@HarshCasper or @Simpleshell3 or @Janvi-Thakkar or @SanjaySinghRajpoot - can you confirm whether the latest flint.gcbm container runs the GCBM Demo Run on the Azure VM listed here: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#cloud-hosts-used-by-github-hosted-runners

I'd like to identify whether the issue is caused by a bug in our software or the environment it's running in.

@aornugent aornugent changed the title Review rest_api_gcbm builds for (fatal) - main(306) - Value too large. error. Type error in GCBM API causes (fatal) - main(306) - Value too large. error. Aug 6, 2022
@aornugent
Copy link
Contributor

aornugent commented Aug 6, 2022

@padmajabhol @Namyalg

I think I know what's causing this. After checking a few of our old builds, I decided to try revisit the old GCBM Demo Run and found that it worked! Well - it worked after a few tweaks (#165).

So with a working reference, I generated the run4 test simulation using the REST API, following the commands in curl.md. After a little re-organising of folders and filenames, I was able to compare the two configurations using meld input/run4 tests/linux-demo.

What I found was a teeeny tiny difference in the configuration of every input file. In the diff for Classifier1.json we see that the nodata value is an int in the API generated config and a double in the reference simulation.

--- a/input/run4/Classifier1.json
+++ b/tests/linux-demo/Classifier1.json
@@ -1,21 +1,21 @@
{
    "layer_type": "GridLayer",
    "layer_data": "Byte",
-   "nodata": 255,
+   "nodata": 255.0,
    "tileLatSize": 1.0,
    "tileLonSize": 1.0,
    ...
}

Type safety is something that's bitten us before with FLINT - I remember @shubhamkarande13 coming up with all sorts of funky work-arounds to send type-safe JSON from the flint-ui. Now that we've got a stronger backend server we should have more control over how the auto-generated JSON gets formatted.

Can I please leave it to you to propose a hot-fix for the master branch (L328)? Please let me know if you need help verifying the result. Appreciate your help!

P.S. For reference, for any new contributors wondering how to make this change, these are the steps I took to test both simulations:

linux-demo

# git clone https://github.com/moja-global/flint.cloud
# git checkout linux-test
cd flint.cloud/local/rest_api_gcbm
docker-compose up -d
cd tests
unzip linux-demo.zip
docker exec -it flint.gcbm /bin/bash

# inside the container
cd tests/config
/opt/gcbm/moja.cli --config_file=gcbm_config.cfg --config_provider=provider_config.json

run4

# git checkout master
unzip GCBM_New_Demo_Run.zip
mv GCBM_New_Demo_Run local/rest_api_gcbm
flint.cloud/local/rest_api_gcbm

# removing `-d` means we see the console log
docker-compose up

# in a new terminal
cd flint.cloud/local/rest_api_gcbm/GCBM_New_Demo_Run

# from curl.md
curl -F disturbances='@disturbances/disturbances_2011_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2012_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2013_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2014_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2015_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2016_moja.tiff' \
      -F disturbances='@disturbances/disturbances_2018_moja.tiff' \
      -F classifiers='@classifiers/Classifier1_moja.tiff' \
      -F classifiers='@classifiers/Classifier2_moja.tiff' \
      -F db='@db/gcbm_input.db' \
      -F miscellaneous='@miscellaneous/initial_age_moja.tiff' \
      -F miscellaneous='@miscellaneous/mean_annual_temperature_moja.tiff' \
      -F title="run4" \
      http://localhost:8080/gcbm/upload

# after starting the simulation, watch the container console of first terminal            
curl -d "title=run4" -X POST http://localhost:8080/gcbm/dynamic

@Namyalg
Copy link
Member

Namyalg commented Aug 7, 2022

I still get the error, I have typecasted all the occurrences to float
I'm curious to know more about type-safety, shouldn't the float version of it throw the error, because it has a greater size than int ?

@aornugent
Copy link
Contributor

Hmm, ok. I'll keep looking. Can you please share a gist (or a patch) showing how you added strong typing to app.py?

Regarding the error message - it's a bit cryptic, I'm not sure if it means the value is large in memory or numerically. It's a mystery!

@Namyalg
Copy link
Member

Namyalg commented Aug 7, 2022

https://github.com/moja-global/FLINT.Cloud/pull/166/files
typecasting added to app.py

Regarding the error message - it's a bit cryptic, I'm not sure if it means the value is large in memory or numerically. It's a mystery!

Alrighty 👀

@padmajabhol
Copy link
Member Author

I still get the error, I have typecasted all the occurrences to float I'm curious to know more about type-safety, shouldn't the float version of it throw the error, because it has a greater size than int ?

Same, I still get an error.
Have we tried to compare the docker sizes before and after the changes?

@aornugent
Copy link
Contributor

I did a detailed diff between @Namyalg's config and the reference simulation and found a small discrepancy.

In @Namyalg's API generated copy of intial_age.json the nodata value is 3.8e+38, but the input layer is type Int16. If I change the value to: 32767 then I think the simulation runs.

@padmajabhol can you please test if this fixes the error for you?

If so, that means our type safety needs to be more precise than just casting everything as Float32 - it depends on the format of the input file. I can see Byte, Float32 and Int16 as the three main types. We might be able to get the input type from: https://rasterio.readthedocs.io/en/latest/api/rasterio.dtypes.html

@padmajabhol
Copy link
Member Author

I did a detailed diff between @Namyalg's config and the reference simulation and found a small discrepancy.

In @Namyalg's API generated copy of intial_age.json the nodata value is 3.8e+38, but the input layer is type Int16. If I change the value to: 32767 then I think the simulation runs.

@padmajabhol can you please test if this fixes the error for you?

If so, that means our type safety needs to be more precise than just casting everything as Float32 - it depends on the format of the input file. I can see Byte, Float32 and Int16 as the three main types. We might be able to get the input type from: https://rasterio.readthedocs.io/en/latest/api/rasterio.dtypes.html

Makes sense. Sure.

@aornugent
Copy link
Contributor

@padmajabhol @Namyalg - were you able to test this fix?

@Namyalg
Copy link
Member

Namyalg commented Aug 13, 2022

It doesn't work, we tried setting the values to 32767 in mean_temperature and inital_age, but still throws the same error

@padmajabhol
Copy link
Member Author

@padmajabhol @Namyalg - were you able to test this fix?

Hi, @Namyalg and me went on a meet to discuss this.

From what we understood from your previous comment is that, you wanted the float32 values to change to init16 so the value 3.4e+38 can change to 32767. For now, the mean_annual_temperature json looks like this: https://docs.moja.global/en/master/GCBM/GCBMInputs/InvestigatingGCBMInputs/miscellaneous/mean_annual_temperature.html

We changed it to init16 but it failed for both of us. Did it work for you, if yes, can you share the script?

We might have not processed it the way you think so please correct us here.

@aornugent
Copy link
Contributor

aornugent commented Aug 13, 2022

Thanks @padmajabhol and @Namyalg - that's not quite what I meant.

The input tiff files are in a variety of formats. Byte, Float32, and Int16. Each of these types has a different nodata value.

The bug I described above is in the API generated configuration for initial_age.tiff. If I start a new simulation, run the command described in curl.md and inspect the file initial_age.json I see it has a value of 3.4e+38 and that initial_age.tiff is type Int16. This value is too large for the input file type - hence the error.

What I'd like you to do is inspect the initial_age.json file and confirm what the nodata value is. If it has a value 3.4e+38 then change it to 32767 and see if the simulation runs. The problem appears to be confined to this one file. You do not need to change mean_annual_temperature.json.

@padmajabhol
Copy link
Member Author

This was the JSON file that I for for initial_age_moja.tiff
{'layer_type': 'GridLayer', 'layer_data': 'Int16', 'nodata': 32767, 'tileLatSize': 1.0, 'tileLonSize': 1.0, 'blockLatSize': 0.1, 'blockLonSize': 0.1, 'cellLatSize': 0.00025, 'cellLonSize': 0.00025}

I'll check if the nodata value is different for the sample that we use for the simulation run.

@aornugent
Copy link
Contributor

aornugent commented Aug 13, 2022

Darn ok - I must have compared an incorrect configuration. @Namyalg maybe you had just applied the patch on #166?

@padmajabhol can you please zip and attach the configuration generated by the API so that I can also compare? Or DM me on Slack.

@Namyalg
Copy link
Member

Namyalg commented Aug 13, 2022

The latest update to this file https://github.com/Namyalg/FLINT.Cloud/blob/type-error/local/rest_api_gcbm/app.py works, and it does not throw the Values Too large error

@Namyalg
Copy link
Member

Namyalg commented Aug 13, 2022

I have assigned the values of nodata in mean_annual_temperature and intial_age as 32767.0, and type as Int16
@padmajabhol , can you check with this version of app.py

@padmajabhol
Copy link
Member Author

I have assigned the values of nodata in mean_annual_temperature and intial_age as 32767.0, and type as Int16 @padmajabhol , can you check with this version of app.py

Works for me on cloud shell.

@padmajabhol
Copy link
Member Author

Darn ok - I must have compared an incorrect configuration. @Namyalg maybe you had just applied the patch on #166?

@padmajabhol can you please zip and attach the configuration generated by the API so that I can also compare? Or DM me on Slack.

https://github.com/padmajabhol/LFX-mentorship-progress-tracker/blob/main/Untitled.ipynb
These are the nodata value for intial_moja and mean_annual_temperature.

@Freeman-kuch
Copy link

Hi, Can I work on this ?

@Freeman-kuch
Copy link

@Namyalg I have Raised a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment