TG2-VALIDATION_TYPESTATUS_STANDARD #285

Open
ArthurChapman opened this issue Feb 9, 2024 · 10 comments
Labels
Conformance; CORE (TG2 CORE tests); OTHER; Parameterized (Test requires a parameter); Test (Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT); TG2; Validation; VOCABULARY

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Feb 9, 2024

| TestField | Value |
|---|---|
| GUID | 4833a522-12eb-4fe0-b4cf-7f7a337a6048 |
| Label | VALIDATION_TYPESTATUS_STANDARD |
| Description | Does the value of dwc:typeStatus occur in bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | Occurrence |
| Information Elements ActedUpon | dwc:typeStatus |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is EMPTY; COMPLIANT if the value of dwc:typeStatus is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Conformance |
| Term-Actions | TYPESTATUS_STANDARD |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "Darwin Core typeStatus" {[https://dwc.tdwg.org/list/#dwc_typeStatus]} {dwc:typeStatus vocabulary API [https://gbif.github.io/parsers/apidocs/org/gbif/api/vocabulary/TypeStatus.html]} |
| Specification Last Updated | 2024-02-09 |
| Examples | [dwc:typeStatus="holotype": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:typeStatus has an equivalent in the bdq:sourceAuthority"] [dwc:typeStatus="cleptotype": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:typeStatus does not have an equivalent in the bdq:sourceAuthority"] |
| Source | ALA, GBIF |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) because of one or more of the reasons: not being widely applicable; not informative; not straightforward to implement or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists. This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |
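
The Expected Response above reads as a short decision procedure. A minimal Python sketch of it follows, under stated assumptions: the source authority is passed in as an in-memory collection of terms (or `None` when unavailable), a whitespace-only value counts as EMPTY, and matching is exact and case-sensitive. The names `Response`, `validate_typestatus_standard` and `SAMPLE_VOCABULARY` are hypothetical, and the sample vocabulary is a small illustrative subset, not the actual content of the bdq:sourceAuthority.

```python
from typing import Iterable, NamedTuple, Optional


class Response(NamedTuple):
    status: str
    result: Optional[str]
    comment: str


# Illustrative placeholder only; the real vocabulary comes from bdq:sourceAuthority.
SAMPLE_VOCABULARY = {"holotype", "paratype", "syntype"}


def validate_typestatus_standard(
    type_status: Optional[str],
    source_authority: Optional[Iterable[str]],
) -> Response:
    """Sketch of the Expected Response, checked in the order the specification lists it."""
    if source_authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    if type_status is None or type_status.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:typeStatus is EMPTY")
    # Exact match, so a value with leading or trailing whitespace is NOT_COMPLIANT.
    if type_status in set(source_authority):
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:typeStatus has an equivalent in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:typeStatus does not have an equivalent in the bdq:sourceAuthority")


holotype_response = validate_typestatus_standard("holotype", SAMPLE_VOCABULARY)
cleptotype_response = validate_typestatus_standard("cleptotype", SAMPLE_VOCABULARY)
```

Because the comparison is exact, the requirement in the Notes about leading or trailing whitespace is met without extra handling; whether matching should also be case-sensitive is not stated in the specification and is an assumption here.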
@ArthurChapman added the labels TG2, Validation, OTHER, Test, VOCABULARY, NEEDS WORK, Supplementary, Conformance, Parameterized on Feb 9, 2024
@ArthurChapman
Collaborator Author

@chicoreus @tucotuco I'm not sure whether the link I have for the API points to an actual API, or whether one exists at all (https://gbif.github.io/parsers/apidocs/org/gbif/api/vocabulary/TypeStatus.htm), hence the NEEDS WORK label.

@chicoreus
Collaborator

This is a case where "INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is EMPTY" is very likely an unhelpful part of the response. Data have quality for use if the response result is COMPLIANT; any other value implies that the data are not fit for purpose. Wherever data are sparse, as with type status on occurrence data, the absence of a value does not indicate a data quality problem. This test, and others like it where values are very likely to be correctly sparse, should return COMPLIANT when the information element acted upon is empty.

@chicoreus
Collaborator

In general, we need to review the use of INTERNAL_PREREQUISITES_NOT_MET for empty values across all of the test definitions to date. When data should have a value most of the time, we should use INTERNAL_PREREQUISITES_NOT_MET for empty values of the information element acted upon. That holds even when the expectation is aspirational and much data in the wild lacks a value, as with some of the georeference metadata (though even there we need to make sure that there is a georeference before asserting that the absence of georeference metadata is a problem). But when data are expected to be correctly sparse, as here, COMPLIANT is a much more appropriate response result for empty values.
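
One way to picture the distinction drawn above, offered as a hedged sketch of the proposal rather than anything agreed in the test definitions: the response for an EMPTY value could depend on whether the term is expected to be sparsely populated. The `EXPECTED_SPARSE` set and the `response_for_empty` function are hypothetical names, and which terms belong in the set is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical classification; deciding which terms are "correctly sparse" is an assumption.
EXPECTED_SPARSE = {"dwc:typeStatus"}


def response_for_empty(term: str) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for an EMPTY value of `term`."""
    if term in EXPECTED_SPARSE:
        # Proposed rule: absence of a value in a sparse term is not a quality problem.
        return ("RUN_HAS_RESULT", "COMPLIANT")
    # Terms expected to be populated keep the existing behaviour.
    return ("INTERNAL_PREREQUISITES_NOT_MET", None)


sparse_response = response_for_empty("dwc:typeStatus")
```

Any term not listed in `EXPECTED_SPARSE` falls through to INTERNAL_PREREQUISITES_NOT_MET, which matches the current wording of the Expected Response.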

@Tasilee
Collaborator

Tasilee commented Feb 18, 2024

@chicoreus - To me, an EMPTY value triggering INTERNAL_PREREQUISITES_NOT_MET states that the test was not run (further): we have no RUN_HAS_RESULT. This makes no judgement about 'quality' or the lack of it. As I understood it, INTERNAL_PREREQUISITES_NOT_MET amounts to the statement "We are unable to comment on the Information Element Acted Upon". This seems appropriate.

The question remains: is this test Supplementary or CORE (aspirational)?

@Tasilee
Collaborator

Tasilee commented Feb 18, 2024

The issue seems to be the triplicate:

...NOTEMPTY
...STANDARD
...STANDARDIZED

To me, the anomaly is the NOTEMPTY tests that have RUN_HAS_RESULT=NOT_COMPLIANT. The point @chicoreus raised about an EMPTY value potentially not detracting from 'quality' can apply. Not so for the STANDARD and STANDARDIZED tests.

The point is that those NOTEMPTY CORE tests would (I think) be aspirational, in that a NOT_COMPLIANT response is something we feel needs to be flagged, and values encouraged.

With tests like #289, they would indeed need to be Supplementary given the lack of records with values.

@ArthurChapman
Collaborator Author

The number of specimens with a value in dwc:typeStatus would be very low, and no observations would have one. Thus the test for NOTEMPTY (#246) would definitely be Supplementary as "likely to return a high percentage of either .... bdq:NOT_COMPLIANT results". If we followed a workflow that ran the ...NOTEMPTY test and then passed the records with a COMPLIANT result through the test for STANDARD, etc., it would be a different matter.

BUT we don't follow a workflow: each test is standalone, so running this test has great value. 99.9% of records would have a result of INTERNAL_PREREQUISITES_NOT_MET. That is OK and not a problem, because it is not unexpected: most specimens and all Observations rightly have no Type Status. Knowing, however, whether the other 0.1% that have something in dwc:typeStatus follow the Standard or not is important and says a lot about the data quality.

For that reason, I am tempted to suggest that this test could be CORE, but it should include a note that most records are expected to return an INTERNAL_PREREQUISITES_NOT_MET result.

@Tasilee added the CORE label and removed the NEEDS WORK and Supplementary labels on Feb 20, 2024
@chicoreus
Collaborator

chicoreus commented Feb 21, 2024

@Tasilee In the Framework, under quality control, NOT_COMPLIANT values point out aspects of the data that need improvement for the data to fit the needs of the UseCase, so INTERNAL_PREREQUISITES_NOT_MET can be ignored (or made the responsibility of another test). The thing we need to think through is QualityAssurance, where the data are filtered down so that all records are COMPLIANT on all Validations; there, validations that return INTERNAL_PREREQUISITES_NOT_MET mean that the data lack quality for the use. When data are expected to be densely populated, and a VALIDATION_X_STANDARD is coupled with a VALIDATION_X_NOTEMPTY, this isn't important, as empty values will come up as NOT_COMPLIANT on the paired VALIDATION_X_NOTEMPTY test. But when data are expected to be sparsely populated, the UseCase isn't pairing a VALIDATION_X_STANDARD with a VALIDATION_X_NOTEMPTY, and the VALIDATION_X_STANDARD stands alone, then any data which correctly have no value will be excluded under the filtering for COMPLIANT-only records under QualityAssurance. So, for sparse data without a paired NOTEMPTY test, we need to allow the VALIDATION_X_STANDARD to treat empty values as compliant, or assess from other terms whether a value should be present (as in some of the metadata tests).
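
The difference between the two uses described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Framework code: each record is a dictionary mapping a validation label to its Response.result, with `None` standing in for INTERNAL_PREREQUISITES_NOT_MET, and the function names and the three sample records are hypothetical.

```python
from typing import Dict, List, Optional

# A record maps a validation label to its Response.result; None stands in for
# INTERNAL_PREREQUISITES_NOT_MET (the test did not reach RUN_HAS_RESULT).
Record = Dict[str, Optional[str]]

TEST = "VALIDATION_TYPESTATUS_STANDARD"


def quality_control_flags(records: List[Record], validations: List[str]) -> List[Record]:
    """Quality control: report records with at least one NOT_COMPLIANT result."""
    return [r for r in records if any(r.get(v) == "NOT_COMPLIANT" for v in validations)]


def quality_assurance_filter(records: List[Record], validations: List[str]) -> List[Record]:
    """Quality assurance: keep only records that are COMPLIANT on every validation."""
    return [r for r in records if all(r.get(v) == "COMPLIANT" for v in validations)]


records: List[Record] = [
    {"id": "a", TEST: "COMPLIANT"},      # e.g. dwc:typeStatus="holotype"
    {"id": "b", TEST: "NOT_COMPLIANT"},  # e.g. dwc:typeStatus="cleptotype"
    {"id": "c", TEST: None},             # dwc:typeStatus is EMPTY
]

flagged = quality_control_flags(records, [TEST])
kept = quality_assurance_filter(records, [TEST])
```

Record `c` is not flagged under quality control, yet it is dropped by the quality assurance filter, which is the exclusion of correctly empty values that the comment warns about when the STANDARD test stands alone.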

Currently tagged as CORE, but the note still has the "is not regarded as CORE" text.

@ArthurChapman
Collaborator Author

See comment at #284 (comment) where a Vocabulary is being developed by GBIF. Perhaps, pending that, this should be Immature/Incomplete?

@chicoreus
Collaborator

chicoreus commented Feb 22, 2024 via email

@Tasilee
Collaborator

Tasilee commented Feb 29, 2024

GBIF has dwc:typeStatus as not EMPTY for ~0.7% of records and ALA for 0.5%. If this is truly aspirational, fine, CORE; otherwise Supplementary, on the basis of the proportion of likely INTERNAL_PREREQUISITES_NOT_MET results.

Decision please?
