In order to support validation of data ranges the RequiredDataValidator needs to be extended and renamed.
There are three use cases that need to be covered:
"Simple" requirement: Some data is just required to be present, with no further constraints
Required and constrained: Data needs to be present and within a certain range
Constrained but not required: Data does not need to be present but if it is, there are constraints on it
The first use case is already covered and will not require any changes.
In order to support the second and third use cases, the yaml file that specifies required data and the RequiredDataValidator class will be extended.
For the second use case we introduce constraints as a new keyword. An entry that uses it would be interpreted as follows:
We require a measurand called Emission|CO2 reported in the unit Mt CO2/yr
We require this measurand to be present for the years 2020 to 2050 in 5 year time steps
We require this measurand to be present for the regions World and Europe
Only the value for the region World in the year 2020 is constrained: it must lie between 42 and 46 Gt CO2.
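
The interpretation above can be sketched in plain Python. This is only an illustration, not the actual RequiredDataValidator: the dictionary layout, every key name other than constraints, the `check_required` helper, and the bounds converted to the reporting unit (42000 to 46000 Mt CO2/yr) are assumptions, and the real yaml schema may look different.

```python
# Hypothetical in-memory form of one required_data entry (assumed layout).
REQUIRED = {
    "variable": "Emission|CO2",
    "unit": "Mt CO2/yr",
    "years": list(range(2020, 2051, 5)),  # 2020, 2025, ..., 2050
    "regions": ["World", "Europe"],
    # Assumed shape of the "constraints" keyword: bounds apply only to the
    # listed region and year; 42-46 Gt CO2 is expressed in Mt CO2/yr here.
    "constraints": [
        {"region": "World", "year": 2020, "min": 42000, "max": 46000},
    ],
}


def check_required(spec, data):
    """Return a list of error strings; an empty list means the data passes.

    `data` maps (variable, unit, region, year) to a numeric value.
    """
    errors = []
    # Presence check: every region/year combination must be reported.
    for region in spec["regions"]:
        for year in spec["years"]:
            key = (spec["variable"], spec["unit"], region, year)
            if key not in data:
                errors.append(f"missing: {region} {year}")
    # Range check: only applied to values that are actually present.
    for c in spec.get("constraints", []):
        key = (spec["variable"], spec["unit"], c["region"], c["year"])
        value = data.get(key)
        if value is not None and not c["min"] <= value <= c["max"]:
            errors.append(f"out of range: {c['region']} {c['year']} = {value}")
    return errors


# Example: complete data, but the World 2020 value violates the constraint.
data = {
    ("Emission|CO2", "Mt CO2/yr", region, year): 40000.0
    for region in REQUIRED["regions"]
    for year in REQUIRED["years"]
}
errors = check_required(REQUIRED, data)
```

In this example the Europe values are left unconstrained, so only the World 2020 value is reported as an error.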
For the third use case we introduce a new section in the file besides required_data, namely optional_data.
Its use is identical to required_data, with the only exception that the validation does not fail if the data is not present.
Using the above example again but this time with optional_data translates to the following logic:
If Emission|CO2 is completely missing from the data, the validation passes as we're looking at optional_data.
If it is there, all of the above logic applies. So it needs to be reported in a specific unit, for a number of years and regions, and the value for 2020 for the World region needs to be within a range.
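
The difference between the two sections could be expressed as a single branch, as in the rough sketch below. The `validate` function, the spec layout, and the numeric bounds (again assumed to be 42000 to 46000 in the reporting unit) are hypothetical, and the unit check is left out for brevity.

```python
def validate(section, spec, data):
    """Validate one entry from `required_data` or `optional_data`.

    `data` maps (variable, region, year) to a value. Returns True if the
    validation passes. All names here are illustrative assumptions.
    """
    variable = spec["variable"]
    present = any(key[0] == variable for key in data)
    if not present:
        # The only difference between the two sections: a completely
        # missing variable passes for optional_data, fails for required_data.
        return section == "optional_data"
    # From here on, both sections apply exactly the same logic.
    for region in spec["regions"]:
        for year in spec["years"]:
            if (variable, region, year) not in data:
                return False
    for (region, year), (low, high) in spec.get("constraints", {}).items():
        value = data.get((variable, region, year))
        if value is None or not low <= value <= high:
            return False
    return True


spec = {
    "variable": "Emission|CO2",
    "regions": ["World", "Europe"],
    "years": [2020, 2025],
    "constraints": {("World", 2020): (42000, 46000)},
}
full = {("Emission|CO2", r, y): 44000.0 for r in spec["regions"] for y in spec["years"]}
partial = {("Emission|CO2", "World", 2020): 44000.0}
```

With these inputs, an empty dataset passes only as optional_data, while a partially reported variable fails in both sections.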
I think that makes a lot of sense. Having optional data without constraints wouldn't have any effect, but it's probably better to keep the same structure between the required and optional rather than being perfectly "efficient".
Two other ideas/suggestions:
Rename the class to DataValidator (because it also has not-required components)
Rename the directory where such data-validation files are stored (and are tested as part of the nomenclature-validation) to data_validation
[In parallel, the directory with MetaValidator yaml files could be renamed to meta_validation?]
FYI @danielhuppmann