
Allow bypassing validation to read faulty input #207

Open
arthurvd opened this issue Jan 29, 2022 · 3 comments · May be fixed by #209

@arthurvd
Member

Introduction:
This issue originates from several questions (among others, #199) showing that it is sometimes desirable to bypass or switch off validation when reading model input files. Two example situations:

  • Input is valid for D-HYDRO, but HYDROLIB-core is validating too strictly. Bypassing validation is then a temporary workaround, pending the HYDROLIB-core bugfix.
  • Input is not valid for D-HYDRO, but if this faulty input could still be read using HYDROLIB-core (optionally with validation switched off), then missing/invalid fields could be filled/repaired in a user script, and then serialized as a valid file.
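The second situation can be sketched in plain Python, independently of HYDROLIB-core and pydantic. This is a hypothetical reader, not HYDROLIB-core API: with its `validate` flag switched off it keeps invalid values and records the problems instead of raising, so a user script can repair them before serializing. The names `StorageNode`, `read_storage_nodes`, `write_storage_nodes` and the simple `id,storageType` line format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Permitted values copied from the storageType enum in the error output later in this thread.
PERMITTED_STORAGE_TYPES = {"reservoir", "closed"}


@dataclass
class StorageNode:
    node_id: str
    storage_type: str


def read_storage_nodes(lines: List[str], validate: bool = True) -> Tuple[List[StorageNode], List[str]]:
    """Parse hypothetical 'id,storageType' lines.

    With validate=True an invalid value raises ValueError; with validate=False
    the faulty value is kept and the problem is recorded instead.
    """
    nodes: List[StorageNode] = []
    problems: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        node_id, storage_type = (part.strip() for part in line.split(","))
        if storage_type not in PERMITTED_STORAGE_TYPES:
            message = f"line {lineno}: invalid storageType {storage_type!r} for node {node_id}"
            if validate:
                raise ValueError(message)
            problems.append(message)
        nodes.append(StorageNode(node_id, storage_type))
    return nodes, problems


def write_storage_nodes(nodes: List[StorageNode]) -> List[str]:
    """Serialize nodes, refusing to write anything that is still invalid."""
    for node in nodes:
        if node.storage_type not in PERMITTED_STORAGE_TYPES:
            raise ValueError(f"node {node.node_id} still has invalid storageType {node.storage_type!r}")
    return [f"{node.node_id},{node.storage_type}" for node in nodes]


# Usage: read made-up faulty input leniently, repair it in a script, then serialize.
faulty = ["node1,reservoir", "node2,Reservoir"]
nodes, problems = read_storage_nodes(faulty, validate=False)
for node in nodes:
    if node.storage_type not in PERMITTED_STORAGE_TYPES:
        node.storage_type = node.storage_type.lower()  # example repair rule (assumption)
repaired = write_storage_nodes(nodes)
```

Validation is still applied on writing, so the script cannot accidentally serialize a file that is still invalid.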

Question:
How to bypass validation when reading a faulty D-HYDRO input file using HYDROLIB-core?

@evetion
Member

evetion commented Jan 31, 2022

While it's good to explore what's possible in Pydantic in #209, I fear this issue needs a written-out plan first.

Hydrolib-core is designed to match valid model input. That alone already spans a wide range of possible models, links, defaults and versions (#180).

I understand the request to be able to fix broken input, but how broken? A single NaN? Invalid links between models? A model too small to be valid? How do you migrate from these broken states to a valid one? Does that introduce yet another state? What's the API for that?

Worse, most of our logic is now based on the guarantees that validation gives us. We assume something is a number, list or specific instance. If it's not, things will inevitably break.
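A minimal illustration of that concern, using a hypothetical downstream helper (not HYDROLIB-core code) that, like much downstream logic, assumes validation already guaranteed numeric friction values. When unvalidated input lets a string through, the failure surfaces far away from the offending input file and line.

```python
from typing import List


def mean_friction(values: List[float]) -> float:
    """Hypothetical helper that relies on validation having guaranteed floats."""
    return sum(values) / len(values)


validated = [0.03, 0.025, 0.035]
unvalidated = [0.03, "0.025a", 0.035]  # a typo that validation would have rejected

print(mean_friction(validated))
try:
    mean_friction(unvalidated)
except TypeError as exc:
    # sum() fails with a generic TypeError that says nothing about the input file.
    print(f"Downstream failure: {exc}")
```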

Also, it might be good to find out how existing models (i.e., files on disk), such as the ones described in #199, could end up missing something like friction values.

@arthurvd
Member Author

arthurvd commented Feb 1, 2022

Indeed, before simply implementing this, let's take a step back: what does a user expect to do with model input that contains invalid data?

Is it convenient for users to repair invalid input directly in the object state (rather than in the input files)?

Missing or wrong optional numeric/character string values in a particular field:
Example 1: an invalid friction value in a friction file for a particular branch: hydrolib.core.io.friction.models.FrictBranch.frictionvalues.
This might not be too hard for the user to fix in the invalid object state (although one has to know precisely how the class tree is structured, in this case: fmmodel.geometry.frictfile[fileindex].branch[branchindex].frictionvalues[positionindex]).
Even here it might be easier for the user to repair this in the input file directly.
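To make the object-state route concrete, here is a sketch that uses minimal hypothetical stand-in classes mirroring only the attribute path fmmodel.geometry.frictfile[i].branch[j].frictionvalues[k]; they are not the real HYDROLIB-core classes. It locates suspect values and repairs them in place. Treating NaN or negative values as "invalid" and the replacement value 47.5 are assumptions for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical stand-ins that only mimic the attribute path from the issue text.
@dataclass
class Branch:
    branchid: str
    frictionvalues: List[float]


@dataclass
class FrictionFile:
    branch: List[Branch] = field(default_factory=list)


@dataclass
class Geometry:
    frictfile: List[FrictionFile] = field(default_factory=list)


@dataclass
class Model:
    geometry: Geometry = field(default_factory=Geometry)


def find_invalid_friction_values(fmmodel: Model) -> List[Tuple[int, int, int]]:
    """Return (fileindex, branchindex, positionindex) for NaN or negative values."""
    found = []
    for i, frictfile in enumerate(fmmodel.geometry.frictfile):
        for j, branch in enumerate(frictfile.branch):
            for k, value in enumerate(branch.frictionvalues):
                if math.isnan(value) or value < 0:
                    found.append((i, j, k))
    return found


fmmodel = Model(Geometry([FrictionFile([Branch("Channel1", [45.0, float("nan"), 50.0])])]))
invalid = find_invalid_friction_values(fmmodel)
for i, j, k in invalid:
    # Repair in object state, following the exact path from the issue text.
    fmmodel.geometry.frictfile[i].branch[j].frictionvalues[k] = 47.5
```

The helper that walks the tree is what spares the user from knowing the class structure by heart. Without such a helper, editing the input file is likely easier, as noted above.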

Missing or wrong filenames:
Example 2: a filename is referenced that does not exist, for example a missing network file in fmmodel.geometry.netfile.
Here it is not simply a matter of fixing the filename string: the underlying NetworkModel in the object tree must then also be constructed (re-read from the correct file).
Here too, it may be easier to fix the input MDU file and then re-read it.
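For the "fix the MDU file, then re-read" route, a cheap pre-check can list referenced files that are missing before any model is loaded. This sketch assumes the MDU file can be read with Python's standard `configparser` (a simplification: MDU is INI-like but not guaranteed to be strict INI) and that the referenced path is relative to the MDU's folder. The function name `missing_referenced_files` is hypothetical.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def missing_referenced_files(
    mdu_path: Path,
    keys: Iterable[Tuple[str, str]] = (("geometry", "netfile"),),
) -> List[str]:
    """Rough pre-check: return referenced filenames that do not exist next to the MDU."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # MDU values may carry trailing '# ...' comments
        strict=False,                    # tolerate duplicate keys/sections
        interpolation=None,              # do not interpret '%' in values
    )
    parser.read(mdu_path)
    missing = []
    for section, key in keys:
        value = parser.get(section, key, fallback="").strip()
        if value and not (mdu_path.parent / value).is_file():
            missing.append(value)
    return missing


# Usage with a throwaway MDU that references a network file that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    mdu = Path(tmp) / "FlowFM.mdu"
    mdu.write_text("[geometry]\nNetFile = FlowFM_net.nc  # network file\n")
    print(missing_referenced_files(mdu))
```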

How can invalid input files be fixed while still benefiting as much as possible from Python-based validation?

The validation error can be caught and the ValidationError object inspected in detail, for example:

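```python
from pathlib import Path

from pydantic import ValidationError
from hydrolib.core.io.mdu.models import FMModel

# test_data_dir points to the test data folder of the HYDROLIB-core repository
# (assumed here to be tests/data); adjust it to your own input location.
test_data_dir = Path("tests/data")
filepath = test_data_dir / "input/e02/c11_korte-woerden-1d/dimr_model/dflowfm/FlowFM.mdu"

try:
    fm_model = FMModel(filepath)
except ValidationError as error:
    errorlist = error.errors()
    print(f"Number of validation errors: {len(errorlist)}.")

    # Each error's "loc" is its path through the model tree. For a list of files
    # (such as frictFile) loc[3] is the list index and the filename is at loc[4];
    # for a single file (such as storageNodeFile) loc[3] is the filename itself.
    filelist = {
        e["loc"][4] if isinstance(e["loc"][3], int) else e["loc"][3]
        for e in errorlist
    }
    print(f"Files with errors: {', '.join(filelist)}.")
    print(f"Error details:\n{error}")
```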

which could produce:

```
Number of validation errors: 2.
Files with errors: roughness-Main.ini, nodeFile.ini.
Error details:
2 validation errors for FMModel
FlowFM.mdu -> geometry -> frictFile -> 0 -> roughness-Main.ini -> global -> 0 -> Main -> frictionType
  value is not a valid enumeration member; permitted: 'Chezy', 'Manning', 'wallLawNikuradse', 'WhiteColebrook', 'StricklerNikuradse', 'Strickler', 'deBosBijkerk' (type=type_error.enum; enum_values=[<FrictionType.chezy: 'Chezy'>, <FrictionType.manning: 'Manning'>, <FrictionType.walllawnikuradse: 'wallLawNikuradse'>, <FrictionType.whitecolebrook: 'WhiteColebrook'>, <FrictionType.stricklernikuradse: 'StricklerNikuradse'>, <FrictionType.strickler: 'Strickler'>, <FrictionType.debosbijkerk: 'deBosBijkerk'>])
FlowFM.mdu -> geometry -> storageNodeFile -> nodeFile.ini -> storagenode -> 0 -> 10634 -> storageType
  value is not a valid enumeration member; permitted: 'reservoir', 'closed' (type=type_error.enum; enum_values=[<StorageType.reservoir: 'reservoir'>, <StorageType.closed: 'closed'>])
```
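Going one step further, the errors can be grouped per input file so that each file can be repaired in turn. The sketch below works on plain dicts shaped like the entries returned by `error.errors()` (keys `loc`, `msg`, `type`). The `loc` tuples are reconstructed from the printed output above and the messages are shortened, so the sample data is illustrative rather than captured output. The helpers `filename_from_loc` and `group_errors_by_file` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def filename_from_loc(loc: Sequence[Any]) -> str:
    """Assumes loc = (mdu, section, key, [list index,] filename, ...) as printed above."""
    return loc[4] if isinstance(loc[3], int) else loc[3]


def group_errors_by_file(errors: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each referenced filename to 'field path: message' strings."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for e in errors:
        loc = e["loc"]
        grouped[filename_from_loc(loc)].append(f"{' -> '.join(map(str, loc))}: {e['msg']}")
    return dict(grouped)


# Error dicts in the shape of error.errors(); locs reconstructed from the output above.
sample_errors = [
    {"loc": ("FlowFM.mdu", "geometry", "frictFile", 0, "roughness-Main.ini", "global", 0, "Main", "frictionType"),
     "msg": "value is not a valid enumeration member", "type": "type_error.enum"},
    {"loc": ("FlowFM.mdu", "geometry", "storageNodeFile", "nodeFile.ini", "storagenode", 0, 10634, "storageType"),
     "msg": "value is not a valid enumeration member", "type": "type_error.enum"},
]

for filename, messages in group_errors_by_file(sample_errors).items():
    print(filename)
    for message in messages:
        print("   ", message)
```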

@myrthearcadis, @ABuijert: can you please check whether the example code above is sufficient for your needs?

@ABuijert

ABuijert commented Feb 1, 2022 via email
