Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation schema design #7

Closed
lchen-2101 opened this issue Dec 4, 2023 · 5 comments
Closed

Validation schema design #7

lchen-2101 opened this issue Dec 4, 2023 · 5 comments
Assignees

Comments

@lchen-2101
Copy link
Collaborator

Persist results from data-validator to DB

Schema design things to consider:

  • context (associated with submission id)
  • groupings of validations (e.g. error 1 associated with failures for records 2, 5, 7, etc.)
@jcadam14 jcadam14 self-assigned this Dec 8, 2023
@jcadam14 jcadam14 linked a pull request Dec 12, 2023 that will close this issue
25 tasks
@jcadam14 jcadam14 changed the title Validation schema design (WIP) Validation schema design Dec 12, 2023
@hkeeler
Copy link
Member

hkeeler commented Dec 14, 2023

I started to look at PR #10, but I thought I would take a couple steps back and try to get my thoughts down on the overall schema for this database, not just the submission portion. Hopefully it can be a conversation starter, and we can do a few iterations on this, and it'll help us map out the remaining work and the order we want to take it on.

I also took a stab at what the filing and submissions state machines might look like.

Hope this is useful. Let's sync up on all this soon. Probably early next week.

Also, this is my first time playing with the inline Mermaid diagrams. It's not my favorite style-wise, but it's hard to beat having it just work right in a GitHub comment. ✨

---
title: Filing API Database Schema
---
erDiagram
    filing_type {
        varchar id PK
        varchar name UK
    }

    filing_period {
        varchar id PK
        varchar name UK
        date start_date
        date end_date
        date due_date
        varchar filing_type_id FK
    }
    filing_period }|--|| filing_type : ""

    filing_state {
        varchar id PK
        varchar name UK
    }

    filing {
        integer id PK "auto-increment"
        varchar lei "pseudo FK to `institution` table"
        varchar filing_period_id PK,FK
        varchar state FK
        varchar institution_snaphot_id "This needs some thought"
    }
    filing }|--|| filing_period : ""
    filing }|--|| filing_state : ""

    submission_state {
        varchar id PK
        varchar name UK        
    }

    submission {
        integer id PK 
        integer filing_id FK
        integer attempt "Q: Do we need this AND `time`?"
        timestamp time
        varchar user_id "Login.gov user id"
        varchar state FK
        varchar validation_ruleset_version "Useful if we update validation logic mid filing season"
        jsonb findings "JSONB alternative to additional `*_findings` tables"
    }
    submission }|--|| filing : ""
    submission }|--|| submission_state : ""

    submission_finding {
        integer submission_id FK
        varchar validation_code "Q: Do we need additional validation metadata?"
    }
    submission_finding }|--|| submission : ""

    submission_finding_record {
        integer record_number
        varchar field_name
        varchar field_value
    }
    submission_finding_record }|--|| submission_finding : ""


---
title: Filing State Machine
---
stateDiagram
    [*] --> filing_started
    filing_started --> institution_data_set
    institution_data_set --> submissions_in_progress
    submissions_in_progress --> filing_complete
    filing_complete --> submissions_in_progress
    filing_complete --> must_refile
    must_refile --> submissions_in_progress
    filing_complete --> [*]

---
title: Submission State Machine
---
stateDiagram
    classDef failure fill:#f00,color:white
    classDef error fill:#ff0


    [*] --> submission_started
    submission_started --> uploading
    uploading --> upload_failed:::failure
    uploading --> uploaded
    uploaded --> validating
    validating --> system_failure:::failure
    validating --> invalid_file:::error
    validating --> validated_with_errors:::error
    validating --> validated_with_warnings
    validating --> validated_clean
    validated_with_warnings --> signed
    validated_clean --> signed
    signed --> [*]

@jcadam14
Copy link
Contributor

Wow, mermaid is cool. I used to just throw stuff together in Visio but wasn't sure how somewhat high level designs were done here so figured I'd let the code be the design. Like the mermaid thing!

Read through comments and diagram. I'm putting my thoughts down here so I don't forget them later.

  • I normally don't like to store data like "count", "attempt", etc since those are easily derived from data.
  • Same with "state". If there's a single finding with severity "error" (which is a simple filter), the state of hard constraints, validated_with_errors, etc is known versus needing to be stored. But it looks like there's the potential for filings to be paused or what not during this process? (there's more to it than upload file, get back results?)
  • I stored the name of the field for an error at the same level as the error code because it's one to one. Records are a list of "rows" within a single submission that threw a particular error and they will all have the same field. So it made sense in my brain to move that up a level (unless I'm thinking about that wrong).
  • I didn't store validation code metadata other than what we might need to search over (error/warning code, severity). Things like description could be pulled from the phase_validation schema itself. It did make me wonder if we wanted to store all that in a static table to reference "just in case". Wasn't sure what all was needed.
  • The JSONB concept is nice, simplifies the schema a bit (I've worked with data that had deep parent/child relations and they can turn into problems to maintain and performance), but it makes it more procedural if we need to query on, say, "All submissions that resulted in error code E1234" or "All submissions that had just warnings". But if that sort of query isn't really needed, storing the results as JSON is a good way to go.

I'm scheduling a chat for Monday if that works.

@lchen-2101
Copy link
Collaborator Author

Maybe #6 should've been worked on first. That would open up the big picture of how the flows would look like; original thought of this ticket was only about storing the validation results. I think the filing state and things aside from the validation should be worked out there. I've linked the API flow from HMDA in that ticket: https://ffiec.cfpb.gov/documentation/api/filing/platform#endpoints, shares similarities with the state machine mentioned here aside from the confirm institution data part not being present on the HMDA end.

@jcadam14
Copy link
Contributor

---
title: Filing API Database Classes
---
classDiagram
    class `FilingDAO(AuditMixin, Base):`{
      __tablename__ = "filing"
      id: Mapped[int] = mapped_column#40;primary_key=True, autoincrement=True#41;
      lei: Mapped[str]
      state: Mapped[FilingState] = mapped_column#40;Enum#40;FilingState#41;#41;
      filing_period = Mapped[str] = mapped_column(ForeignKey#40;"filing_period.id"#41;#41;
      institution_snapshot_id = Mapped[str] # not sure what this is
   }

class `FilingPeriodDAO(AuditMixin, Base):`{
    __tablename__ = "filing_period"
    id: Mapped[int] = mapped_column#40;primary_key=True, autoincrement=True#41;
    name: Mapped[str]
    start: Mapped[datetime]
    end: Mapped[datetime]
    due: Mapped[datetime]
    filing_type: Mapped[FilingType] = mapped_column#40;Enum#40;FilingType#41;#41;
}

class `SubmissionDAO(AuditMixin, Base):`{
    __tablename__ = "submission"
    id: Mapped[str] = mapped_column#40;index=True, primary_key=True#41;
    submitter: Mapped[str] # user_id instead?
    results: Mapped[List["FindingDAO"]] = relationship#40;back_populates="submission"#41;
    state: Mapped[SubmissionState] = mapped_column#40;Enum#40;SubmissionState#41;#41;
    validation_ruleset_version: Mapped[str]
    json_dump: Mapped[dict[str, Any]] = mapped_column#40;JSON, nullable=True#41;
    filing: Mapped[str] = mapped_column#40;ForeignKey#40;"filing.id"#41;#41;
}

class `FindingDAO(AuditMixin, Base):`{
    __tablename__ = "submission_finding"
    id: Mapped[int] = mapped_column#40;primary_key=True, autoincrement=True#41;
    submission_id: Mapped[str] = mapped_column(ForeignKey#40;"submission.id"#41;#41;
    submission: Mapped["SubmissionDAO"] = relationship#40;back_populates="results"#41; # if we care about bidirectional
    validation_code: Mapped[str]
    severity: Mapped[Severity] = mapped_column#40;Enum#40;*get_args#40;Severity#41;#41;#41;
    records: Mapped[List["RecordDAO"]] = relationship#40;back_populates="result"#41;
}
note for `FindingDAO(AuditMixin, Base):` "Severity = Literal['error', 'warning']"

class `RecordDAO(AuditMixin, Base):`{
    __tablename__ = "submission_finding_record"
    id: Mapped[int] = mapped_column#40;primary_key=True, autoincrement=True#41;
    result_id: Mapped[str] = mapped_column#40;ForeignKey#40;"submission_finding.id"#41;#41;
    result: Mapped["FindingDAO"] = relationship#40;back_populates="records"#41;  # if we care about bidirectional
    record: Mapped[int]
    field_name: Mapped[str]
    field_value: Mapped[str]
}
`FilingPeriodDAO(AuditMixin, Base):` *-- `FilingDAO(AuditMixin, Base):`
`FilingDAO(AuditMixin, Base):` *-- `SubmissionDAO(AuditMixin, Base):`
`SubmissionDAO(AuditMixin, Base):` *-- `FindingDAO(AuditMixin, Base):`
`FindingDAO(AuditMixin, Base):` *-- `RecordDAO(AuditMixin, Base):`

---
title: State Enums
---
classDiagram
note for `SubmissionState(Enum):` "SUBMISSION_UPLOAD_FAILED seems to only be 
applicable if the storage to the S3 fails.  
Otherwise, if there is a failure to send the
 file to the initial upload endpoint, the endpoint 
wouldn't know about it.  Does not being able to 
store to S3 indicate an overall processing failure?"
class `SubmissionState(Enum):`{
    SUBMISSION_UPLOADED = 1
    SUBMISSION_UPLOAD_FAILED = 2
    SUBMISSION_WITH_ERRORS = 3  
    SUBMISSION_WITH_WARNINGS = 4
    SUBMISSION_SUCCESSFUL = 5
    SUBMISSION_SIGNED = 6
}
    
class `FilingState(Enum):`{
    FILING_STARTED = 1
    FILING_IN_PROGRESS = 2
    FILING_COMPLETE = 3
}
class `FilingType(Enum):`{
    TYPE_A = "Type_A"
    TYPE_B = "Type_B"
}

---
title: Filing State Machine
---
stateDiagram
    [*] --> filing_started
    filing_started --> filing_in_progress
    filing_in_progress --> filing_in_progress
    filing_in_progress --> filing_complete
    filing_complete --> [*]

---
title: Submission State Machine
---
stateDiagram
    classDef failure fill:#f00,color:white
    classDef warning fill:#ff0


    [*] --> submission_uploaded
    submission_uploaded --> submission_upload_failed:::failure
    submission_uploaded --> submission_with_errors:::failure
    submission_uploaded --> submission_with_warnings:::warning
    submission_uploaded --> submission_successful
    submission_with_warnings --> submission_signed
    submission_successful --> submission_signed
    submission_signed --> [*]

jcadam14 added a commit that referenced this issue Jan 8, 2024
…rom #7)

Merged in branch 7-validation-schema-design-wip to support latest DAOs
Updated alembic scripts for new tables
Updated tests to test alembic migration, and also updated the repo test since
I changed the FilingType enum to be MANUAL instead of TYPE_A and TYPE_B
jcadam14 added a commit that referenced this issue Jan 9, 2024
Added pytests to test the update_submission fuction.  Please
note the comment on the _generator_function in test_submission_repo.py
Needed to merge in parts from story #7
jcadam14 added a commit that referenced this issue Jan 17, 2024
Closes #13 

Created a new branch that focuses purely on the alembic script needs
instead of a full merge of #7 and #18 into the #13 branch. Makes it so
only the code related to the migrations is needed for review. Originally
I merged them all together to test starting the service in poetry to
make sure the tables are created correctly in Postgres but that made the
PR have a lot of files that were already covered in other PRs. Still the
case with the entities/models directory but couldn't get around that.

## Additions

- Added db_revision folder using alembic init and updated files for SBL
specifics (using user-fi as an example)
- Added migration files using alembic revision for each submission table
(submission, filing, filing_period)
- Updated migration files for table specifics
- Added pytest migrations tests for new tables, columns, and fks
- Had to merge in the entities/models dir since the migrations use the
enumerations

## Removals

-

## Changes

-

## Testing

1. Run pytest in poetry shell (or poetry run pytest) and verify the
tests pass

## Screenshots


## Notes

-

## Todos

-

## Checklist

- [ ] PR has an informative and human-readable title
- [ ] Changes are limited to a single goal (no scope creep)
- [ ] Code can be automatically merged (no conflicts)
- [ ] Code follows the standards laid out in the [development
playbook](https://github.com/cfpb/development)
- [ ] Passes all existing automated tests
- [ ] Any _change_ in functionality is tested
- [ ] New functions are documented (with a description, list of inputs,
and expected output)
- [ ] Placeholder code is flagged / future todos are captured in
comments
- [ ] Visually tested in supported browsers and devices (see checklist
below 👇)
- [ ] Project documentation has been updated (including the "Unreleased"
section of the CHANGELOG)
- [ ] Reviewers requested with the [Reviewers
tool](https://help.github.com/articles/requesting-a-pull-request-review/)
:arrow_right:

## Testing checklist

### Browsers

- [ ] Chrome
- [ ] Firefox
- [ ] Safari
- [ ] Internet Explorer 8, 9, 10, and 11
- [ ] Edge
- [ ] iOS Safari
- [ ] Chrome for Android

### Accessibility

- [ ] Keyboard friendly
- [ ] Screen reader friendly

### Other

- [ ] Is useable without CSS
- [ ] Is useable without JS
- [ ] Flexible from small to large screens
- [ ] No linting errors or warnings
- [ ] JavaScript tests are passing
@jcadam14
Copy link
Contributor

Closing this as the models are merged into main under other stories. As the filing process matures we'll write new stories to capture those updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants