Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task]: Add New Table Loading Statements #1686

Closed
4 tasks
acouch opened this issue Apr 12, 2024 · 0 comments · Fixed by #1852
Closed
4 tasks

[Task]: Add New Table Loading Statements #1686

acouch opened this issue Apr 12, 2024 · 0 comments · Fixed by #1852
Assignees
Labels
project: grants.gov Grants.gov Modernization tickets

Comments

@acouch
Copy link
Collaborator

acouch commented Apr 12, 2024

Summary

Prerequisite: #1697

The current data load only includes the opportunity table: https://github.com/HHS/simpler-grants-gov/tree/main/api/src/data_migration . This needs to be expanded to include all of the tables for the expanded tables.

This attempt to use an incremental approach using SQL statements. This includes adding new rows, removing delete rows, and updated rows.

This will be a new script with a new class, similar to #1794.

Acceptance criteria

  • Columns that need to be checked for changes are identified
  • New SQL statements add new rows when a source is added
  • New SQL statements remove rows when a source row is removed
  • New SQL statements updates existing rows when data is changed for selected rows
@acouch acouch added the project: grants.gov Grants.gov Modernization tickets label Apr 12, 2024
@acouch acouch changed the title [Task]: Expand Load Step to Include Expanded Data Model [Task]: Add New Table Loading Statements to copy_oracle_data.py and Foreign Tables to setup_foreign_tables.py Apr 12, 2024
@acouch acouch changed the title [Task]: Add New Table Loading Statements to copy_oracle_data.py and Foreign Tables to setup_foreign_tables.py [Task]: Add New Table Loading Statements to copy_oracle_data.py Apr 12, 2024
@acouch acouch changed the title [Task]: Add New Table Loading Statements to copy_oracle_data.py [Task]: Add New Table Loading Statements Apr 23, 2024
@acouch acouch removed the refinement label Apr 23, 2024
@jamesbursa jamesbursa self-assigned this Apr 26, 2024
jamesbursa added a commit that referenced this issue May 1, 2024
jamesbursa added a commit that referenced this issue May 1, 2024
…s to staging tables (#1894)

## Summary
Part of #1686 (separate for easier review)

### Time to review: __2 mins__

## Changes proposed
- Add `created_at`, `updated_at`, and `deleted_at` columns to staging
tables
- Better logging of migrate queries

## Context for reviewers
These columns will provide helpful metadata for understanding or
troubleshooting the load process.

The `created_at` and `updated_at` columns are set automatically by
SQLAlchemy.

For `deleted_at`, I will update #1852 after this is merged.

## Additional information
Example from local testing:
![Screenshot 2024-05-01 at 11 31
47](https://github.com/HHS/simpler-grants-gov/assets/3811269/e9513de8-ceff-42c9-bccc-d2fcfc5fe7bf)
jamesbursa added a commit that referenced this issue May 8, 2024
…#1852)

## Summary
Fixes #1686

## Changes proposed
- Add class `LoadOracleDataTask` which loops through each legacy table
and loads the data into the corresponding staging table.
- Add a module of functions `src.data_migration.load.sql` to generate
sqlalchemy SQL expressions for the loads.
- Add a script `seed_local_legacy_tables.py` to generate fake data in
the local legacy tables.
- Add unit tests.

## Context for reviewers
This is a new batch task which loads data from the legacy (Oracle)
tables to staging tables.

It is incremental and handles the following situations:

| Situation | Detection logic | Action |
| ------------- | ------------- | ------------- |
| New row in source table | Primary key not present in destination table
| Insert into destination, set `transformed_at=NULL` |
| Updated row in source | Primary key exists in both source &
destination AND newer `last_upd_date` | Update in destination, reset
`transformed_at=NULL` |
| Deleted row in source | Primary key does not exist in source, AND not
`is_deleted` in destination | Set `is_deleted=TRUE`, reset
`transformed_at=NULL`
| Unchanged | Primary key exists in both source & destination AND
`last_upd_date` is not newer | No write to destination |

## Additional information
To run the code locally with fake data:

```
make setup-foreign-tables  # once
make seed-local-legacy-tables  # repeat to add & update data
poetry run python3 -m src.data_migration.load.load_oracle_data_task
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project: grants.gov Grants.gov Modernization tickets
Projects
Development

Successfully merging a pull request may close this issue.

2 participants