Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dimr_config for parallel run is incorrect #562

Open
veenstrajelmer opened this issue Sep 15, 2023 · 10 comments · May be fixed by #623
Open

dimr_config for parallel run is incorrect #562

veenstrajelmer opened this issue Sep 15, 2023 · 10 comments · May be fixed by #623
Assignees
Labels
priority: high type: bug Something isn't working

Comments

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Sep 15, 2023

Describe the bug
When generating a dimr_config.xml with hydrolib for an MPI run, we need to pass the process and mpiCommunicator arguments. The expected dtype for process is int, but this should not be directly passed but converted to a space-separated range. For 4 cores, the entry in the dimr should be <process>0 1 2 3</process>, but instead it reads <process>4</process>.

To Reproduce
The file resulting from the code below contains an invalid <process> entry, which makes the model run crash:

from hydrolib.core.dimr.models import DIMR, FMComponent
fm_comp = FMComponent(name="test", workingDir='.', inputfile='test.mdu',
                      process=4, mpiCommunicator="DFM_COMM_DFMWORLD")
dimr = DIMR(component=fm_comp)
dimr.save(filepath='dimr_config.xml')

Expected behavior
<process>0 1 2 3</process> in the dimr_config.xml. It might be necessary to align processor/core numbers between different components of the DIMR file, but that is currently unsure. This must also be fixed for reading the xml file

Version info (please complete the following information):

  • OS: Windows
  • Version 0.5.2

Additional request
Since the mpiCommunicator is always the same ("DFM_COMM_DFMWORLD"), we could also consider to always write it to the dimr_config if the amount of processes is >1. One argument less, always nice.

@tim-vd-aardweg
Copy link
Contributor

So, just to be sure. If the processes = 5, we should write: <process>0 1 2 3 4</process>?

Should we still support the files that we have written incorrectly? In other words, should we still be able to parse: <process>4</process>?

@veenstrajelmer & @rhutten is this correct?

@MRVermeulenDeltares
Copy link
Contributor

MRVermeulenDeltares commented Apr 17, 2024

Currently I have assumed the following:

This change is only for FM and not RR.

When 0 is input for process the dimr_config.xml will not have added. --> correct

When 1 is input for process the dimr_config.xml will have 0 added. --> @rhutten will verify this with the Rekenhart team.

open questions:
What should happen if an user inputs something different than a int?
e.g.

    component = FMComponent(
        name="test",
        workingDir=".",
        inputfile="test.mdu",
        process="Four",
        mpiCommunicator="DFM_COMM_DFMWORLD",
    )

or

    component = FMComponent(
        name="test",
        workingDir=".",
        inputfile="test.mdu",
        process=4.0,
        mpiCommunicator="DFM_COMM_DFMWORLD",
    )

Do we want to throw an exception or do we want to (silently) leave the process out? --> an exception should be thrown.

open question for a followup issue:
Currently the abstract class component has process implemented.
For the FM component the changes have been applied in the specified branch.
For the RR component process can be given, but is not expected.
This will still result in an output in the .xml file, should we handle this is some way? ---> see #624

    component = RRComponent(
        name="test",
        workingDir=".",
        inputfile="test.mdu",
        process=4,
        mpiCommunicator="DFM_COMM_DFMWORLD",
    )

e5poFbAFEW

@rhutten
Copy link
Collaborator

rhutten commented Apr 22, 2024

If process = 0 or 1 is selected, no process keyword should be written to dimr_config.xml

@MRVermeulenDeltares MRVermeulenDeltares moved this from Review in progress to In progress in HYDROLIB-core Apr 22, 2024
@MRVermeulenDeltares MRVermeulenDeltares moved this from In progress to Review/test follow up in HYDROLIB-core Apr 22, 2024
@MRVermeulenDeltares
Copy link
Contributor

Discussed with @rhutten:
Should we still support the files that we have written incorrectly? In other words, should we still be able to parse: <process>4</process>? --> No, but a warning should be given.

When 0 or 1 is input for process the dimr_config.xml will not have added --> No process will be added, a message will be given to inform te user "Note that for process 0 or 1 no sequential error will be given."

@rhutten
Copy link
Collaborator

rhutten commented Apr 23, 2024

Correction on the last message, for process only 1 can be added. Please give the warning specified above. For 0-process is not possible and an error should be given. Error message: the keyword process can not be 0, please specify values 1 or greater.

@MRVermeulenDeltares
Copy link
Contributor

Discussed with @tim & @jelmer, conclusion:

acceptance criteria:

  • component.process is an int.

  • when import the dimr_config.xml a string list is expected e.g. "0 1 2 3".
    this should be converted/parsed to an int on the model.component.process, for the example 4.
    when parsing from int to the string list, it will take the int and make the list up to (not including) the model.component.process number starting from 0.

  • when export the dimr_config.xml a string list should be written, e.g. component has 4.
    the output file will have "0 1 2 3"
    when parsing from string list to the int, it will take the the amount of ints in the list and set it to model.component.process.

  • keep it simple

  • input from file is validated on import list e.g. "0 1 2 3", (validation on converter/parser) --> input not parsable to list of ints, throw value error.

  • input from user of none or 1 can leave the process out, no message needed, since this is what the user intends (implemented on converter/parser)

  • input from user of > 2 will write the string list to the file, no message needed. (implemented on converter/parser)

  • input from user is validated on model.component e.g. int (fmcomponent, validator on field process)

  • input from user of 0 or negative, throws an error. (fmcomponent, validator on field process)

@veenstrajelmer
Copy link
Collaborator Author

One more addition, according to the manual this is also supported:
<process>0:4</process>
Might be good to also support for this notation in hydrolib-core, maybe as part of a separate issue if not convenient now.

@MRVermeulenDeltares
Copy link
Contributor

MRVermeulenDeltares commented Apr 29, 2024

@veenstrajelmer, currently added the support for 0:4, the assumption is made that this is the same as <process>0 1 2 3 4</process>.
The last value is used to determine the process value.
when writing it will thus be written as <process>0 1 2 3 4</process>.
This means, <process>2:4</process> will be assumed as process = 5, and will be written as <process>0 1 2 3 4</process>.

Further support for this can be done in a seperate issue.
Acceptance criteria for followup-issue is to be refined.
I will need to discuss the current additional support with @rhutten.

@MRVermeulenDeltares MRVermeulenDeltares moved this from In progress to Ready to review in HYDROLIB-core Apr 29, 2024
@veenstrajelmer
Copy link
Collaborator Author

@MRVermeulenDeltares: <process>2:4</process> would (I think) translate to <process>2 3 4</process> and therfore as FMComponent(process=3), when writing it again, it will result in <process>0 1 2</process> I guess.

@MRVermeulenDeltares
Copy link
Contributor

MRVermeulenDeltares commented Apr 29, 2024

Discussed with @rhutten, I will create a follow-up issue.
We can keep the above described as is and determine the acceptance criteria for the full support in the follow-up issue.
since fmcomponent expects an int for process, a start and end are currently not supported, thus reading/writing of a semicolon input can only be partialy supported.

In the follow-up issue we will need to discuss what how the user should be able to apply start and end value for process.
This way we can also determine how to add support to read and write the semicolon value for process.
The acceptance criteria for the follow-up can be further refined with input from "D-Flow_FM-User_manual 17.1.3 Input DIMR"
Maybe we can also request more information on what the numbers actualy mean, e.g. 16:31, would this mean run on core 16 to 31?

@veenstrajelmer, If in your opinion the behaviour you have described is more desirable, I can have a look to implement that in this issue.
Any other update to support semicolon input and output should be handled in the follow-up issue.

Update: suggestion implemented

@tim-vd-aardweg tim-vd-aardweg moved this from Ready to review to Review in progress in HYDROLIB-core May 2, 2024
@tim-vd-aardweg tim-vd-aardweg self-assigned this May 2, 2024
@MRVermeulenDeltares MRVermeulenDeltares moved this from Review in progress to In progress in HYDROLIB-core May 3, 2024
@MRVermeulenDeltares MRVermeulenDeltares moved this from In progress to Ready to review in HYDROLIB-core May 3, 2024
@MRVermeulenDeltares MRVermeulenDeltares moved this from Ready to review to Test in HYDROLIB-core May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high type: bug Something isn't working
Projects
HYDROLIB-core
  
Test
Development

Successfully merging a pull request may close this issue.

5 participants