Define validation results JSON format to be used for MVP #49

Open
jcadam14 opened this issue Jan 29, 2024 · 4 comments

Comments

@jcadam14
Contributor

We need to determine the JSON format for the validation results. This may just be the format coming from the data-validator, but we need to go over that format with the group and decide whether it's still relevant or needs to be adjusted. This will most likely end up being a story in the data-validator repo, or we could massage the data here after retrieving it from the validator.

@jcadam14
Contributor Author

jcadam14 commented Mar 19, 2024

multifield_validation_error.json
validation_failures.json

@hkeeler
Member

hkeeler commented Mar 20, 2024

A few things we might want to consider...

  1. Do we want to include links to the FIG in the JSON message? There are a couple of places where that could make sense.
    1. validation - Currently the anchors to the validation ids don't match anything in the validator (Anchor tags for data validation checks sbl-content#12). We've said we now plan to fix that in the CMS, so once that's done, the frontend could build those URLs, but it seems more convenient to just give the URLs to the frontend.
    2. fields - This feels like bonus points. If we link to the validations in the FIG, those then direct you to the fields...though they're not links there either. 🤔 If we did decide to do this, though, note that the anchors there are dash-cased, not snake_cased like the actual fields, so we'd have to do a little conversion.
  2. We should add a wrapping element around the top-level array. That'll let us add additional metadata about the validation results, such as...
    1. Stats like number of errors, warnings, etc.
    2. Pagination info
    3. A link to the csv download
  3. record_no is zero-indexed. Do we want to use one-based indexing instead? Seems less confusing to end users. Of course, that could still get them off-by-one since a CSV has a header row. I'm hesitant to put line_no in, though, since the validator needs to support other formats besides CSV in the future (JSON), and line number has no meaning in that case.
    • Also, HMDA uses a "ULI" over line number.
  4. description does not have the same rich formatting as the FIG. For instance, we drop the bullet lists, and it's more like multiple sentences. We could do more there, but I think that'd largely depend on how we want to show that info on the frontend...if at all?
  5. Do we need human-readable field names vs. the snake_cased column names?
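
To make the "little conversion" from point 1.2 concrete, here is a minimal sketch of turning a snake_cased column name into a dash-cased anchor and building a link from it. The base URL and the example field name are placeholders, not the real FIG location or a confirmed column, and the function names are made up for illustration.

```python
# Hypothetical placeholder; the real FIG URL is not given in this thread.
FIG_BASE_URL = "https://example.com/fig"


def field_anchor(field_name: str) -> str:
    """Convert a snake_cased column name into a dash-cased anchor."""
    return field_name.strip().lower().replace("_", "-")


def fig_field_url(field_name: str, base_url: str = FIG_BASE_URL) -> str:
    """Build a link to a field's anchor (assumes anchors equal the dash-cased name)."""
    return f"{base_url}#{field_anchor(field_name)}"


if __name__ == "__main__":
    # "some_field_name" is an illustrative name only.
    print(fig_field_url("some_field_name"))
```

If the backend did this, the frontend would only have to render the URL it is given rather than know the anchor convention.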

@jcadam14
Contributor Author

  1. We should add a wrapping element around the top-level array. That'll let us add additional metadata about the validation results, such as...

    1. Stats like number of errors, warnings, etc.
    2. Pagination info
    3. A link to the csv download

For the csv download, we were thinking of an endpoint like /submissions/latest/result_download or /submissions/{id}/results_download. So including a link in the metadata would be odd, in my brain, since that should be static. Unless we want them to be able to download specific chunks, like a paginated csv, but I'd question the usefulness of that. We could add the pagination info to the results if we're going with paginated /submissions/latest/results and/or /submissions/{id}/results endpoints, which I think is desired for MVP, right?
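
As a rough illustration of the wrapping element with stats and pagination info, something like the following could work. All key names here ("stats", "pagination", "results", "severity") are assumptions for the sake of the sketch, not the agreed format and not necessarily what the data-validator emits today.

```python
import json
from math import ceil


def wrap_results(findings: list, page: int = 1, page_size: int = 2) -> dict:
    """Wrap the flat findings list in an object that also carries metadata.

    Assumes each finding is a dict with a hypothetical "severity" key
    and that page_size is a positive integer.
    """
    total = len(findings)
    start = (page - 1) * page_size
    return {
        "stats": {
            "errors": sum(1 for f in findings if f.get("severity") == "error"),
            "warnings": sum(1 for f in findings if f.get("severity") == "warning"),
        },
        "pagination": {
            "page": page,
            "page_size": page_size,
            "total_items": total,
            "total_pages": ceil(total / page_size) if total else 0,
        },
        # Only the requested page of findings is returned.
        "results": findings[start:start + page_size],
    }


if __name__ == "__main__":
    sample = [
        {"severity": "error"},
        {"severity": "warning"},
        {"severity": "error"},
    ]
    print(json.dumps(wrap_results(sample, page=2), indent=2))
```

The stats are computed over the full set rather than the current page, so the counts stay stable as the user pages through results.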

@jcadam14
Contributor Author

jcadam14 commented Mar 20, 2024

  1. record_no is zero-indexed. Do we want to use one-based indexing instead? Seems less confusing to end users. Of course, that could still get them off-by-one since a CSV has a header row. I'm hesitant to put line_no in, though, since the validator needs to support other formats besides CSV in the future (JSON), and line number has no meaning in that case.

    • Also, HMDA uses a "ULI" over line number.

I like using the UID. We validate that it's unique for each entry, and it's probably something that makes sense to the FI submitting. It's also easier to search for in their original submitted data than scrolling to find a row number.
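
A small sketch of what that could look like when labeling a finding for the user: prefer the UID, and fall back to a one-based record number only when no UID is available. The "uid" key is an assumption about the result shape; record_no is taken to be zero-indexed as discussed above.

```python
def record_locator(finding: dict) -> str:
    """Return a user-facing label for the record a finding refers to."""
    uid = finding.get("uid")  # hypothetical key name
    if uid:
        return f"uid {uid}"
    # record_no is zero-indexed, so add one for display.
    return f"record {finding['record_no'] + 1}"


if __name__ == "__main__":
    print(record_locator({"record_no": 0, "uid": "ABC123"}))
    print(record_locator({"record_no": 0}))
```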
