States division #24

Closed
gnopik opened this issue Feb 16, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@gnopik
Contributor

gnopik commented Feb 16, 2024

The default procedure does not return states with an equal number of observations. The screenshot (tested in the dashboard) and the data are attached.
image
case1_data.csv

@tupui tupui added the bug Something isn't working label Feb 16, 2024
@tupui
Member

tupui commented Mar 12, 2024

I actually know what is happening: NaN...

If I load the dataset, run the decomposition, and fill the NaNs in the bins, I get an equal count for all scenarios.

I need to dig more to understand why we have NaNs. I don't remember the details there.

I have the feeling binned_statistic_dd is not doing exactly what I think it is 🤔 I know, for a SciPy maintainer... 😅

Maybe I need to calculate the bins for each axis beforehand instead. That way I am sure the binning is done on the number of samples and not on the values. Need to check that hypothesis 😮‍💨
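
A minimal sketch of that hypothesis, not the project's actual fix: when `bins` is an integer, `scipy.stats.binned_statistic_dd` builds equal-width edges over the value range, so skewed data gives unequal counts, whereas per-axis quantile edges give roughly equal counts. The data, `n_bins`, and the variable names below are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical skewed input: 1000 observations of a single variable.
x = rng.lognormal(size=(1000, 1))
y = x[:, 0] ** 2  # placeholder output; ignored by statistic="count"

n_bins = 4

# Integer `bins`: equal-width edges on the values, so counts per bin differ.
equal_width = stats.binned_statistic_dd(x, y, statistic="count", bins=n_bins)

# Edges computed beforehand from quantiles: each bin holds a similar
# number of samples, regardless of how the values are distributed.
edges = np.quantile(x[:, 0], np.linspace(0, 1, n_bins + 1))
equal_count = stats.binned_statistic_dd(x, y, statistic="count", bins=[edges])

print("equal-width counts:", equal_width.statistic)
print("quantile counts:   ", equal_count.statistic)
```

For several input variables, the same quantile computation would be repeated per column and the resulting list of edge arrays passed as `bins`.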

@gnopik
Contributor Author

gnopik commented Mar 13, 2024

NaNs in bins - what do you mean, like this?
image
This is the way to communicate that we want particular boundaries between states (== bins), and in this case, just for the second and third input variables out of four.
If the whole thing is not supplied (at least in the MATLAB package), the state boundaries are defined automatically:

  • either by categories, if there are 5 or fewer unique values, or
  • by an equal number of observations (highlighted)
    image
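
The automatic rule described above could be sketched as follows. This is an illustration under assumptions, not the MATLAB package's code or this project's implementation: `assign_states`, `n_states`, and `max_categories` are hypothetical names.

```python
import numpy as np


def assign_states(column, n_states=3, max_categories=5):
    """Hypothetical helper: return an integer state index per observation."""
    column = np.asarray(column, dtype=float).ravel()
    unique, inverse = np.unique(column, return_inverse=True)
    if unique.size <= max_categories:
        # Few distinct values: treat each unique value as its own state.
        return inverse
    # Otherwise place the inner boundaries at quantiles, so that each
    # state holds a similar number of observations.
    inner_edges = np.quantile(column, np.linspace(0, 1, n_states + 1)[1:-1])
    return np.searchsorted(inner_edges, column, side="right")


rng = np.random.default_rng(1)
categorical = assign_states([0, 1, 1, 2, 0])
continuous = assign_states(rng.normal(size=300), n_states=3)
print(categorical)              # one state per unique value
print(np.bincount(continuous))  # observations per state
```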

@tupui
Member

tupui commented Mar 13, 2024

Yep, we can provide bounds for the bins. I just thought that was the normal behavior. I have to check that in SciPy's code and do some poking around.

So worst case I can do as you do and construct my own bounds; it's not hard 👍

@tupui
Member

tupui commented Mar 13, 2024

For the NaNs I don't remember why we have them, need to check as well.

@tupui
Member

tupui commented May 18, 2024

Should be fixed in a81bf18

@tupui tupui closed this as completed May 18, 2024
@gnopik
Contributor Author

gnopik commented May 20, 2024

> For the NaNs I don't remember why we have them, need to check as well.

Easier to discuss over a call.
