Use serial ZOLTAN load balancer by default. #5214
base: master
Conversation
We have experienced rather poor partitioning when using the parallel version of ZOLTAN. We are not sure what the cause is, or whether we can fix it by using different defaults or different underlying partitioners. For the time being we simply change the default. This will increase the time spent load balancing the grid, but the rest of the simulator might actually be faster and compensate for this.
jenkins build this please |
This is a welcome change until we figure out how to make good partitions in parallel. |
the error for SPE1CASE1 looks a bit suspicious
I need to find out what this means. Topology should definitely not change here. @akva2 How can I see the full output of ACTIONX_M1, the interesting part is cut because of all the time steps. |
that's the damaris test, so the topology arrays change since the partitioning changes. I'll provide an update for the files once this is good to go. Alternatively you can switch the test to use parallel partitioning. I don't know where the test output threshold is configured, so I've sent you the output on slack. |
Thanks. The change in ACTION_M1 is a bit concerning; PBUB for restart is different for one cell:
I might need to run this on my system to see what exactly changes. Maybe the actions are triggered at different times now. |
I have no real clue. The reported difference is for the restart of report step 15, it seems. That is a bit strange because: |
@bska Do you have a hint where the difference for the restart output of the last report step might come from? |
I'm afraid I don't. This test has been unstable/sensitive for a long time. I reduced its maximum timestep size in commit a2fa381 (PR #4749), but I guess that just hid the problem instead of actually solving it. Add to that the way we compute |
I'm rerunning the build check here, mostly to recreate the detailed failure descriptions on the CI system. |
jenkins build this please |