[Issue #1686] load data from legacy (Oracle) tables to staging tables #1852
Conversation
# COUNT has to be a separate query as INSERTs don't return a rowcount.
insert_count = self.db_session.query(select_sql.subquery()).count()
Maybe as a follow-up, if we want to improve performance, we could consider getting the count in the same query using RETURNING:
https://www.postgresql.org/docs/current/dml-returning.html
Did a little bit of testing locally; it seems doable:
-- Sequence and defined type
CREATE SEQUENCE IF NOT EXISTS api.my_table_id_seq;

-- Table definition
CREATE TABLE "api"."my_table" (
    "id" int4 NOT NULL DEFAULT nextval('api.my_table_id_seq'::regclass),
    "name" text,
    PRIMARY KEY ("id")
);

-- Actual query
WITH rows AS (
    INSERT INTO api.my_table (id, name)
    VALUES (5, 'bob'), (6, 'joe'), (7, 'steve')
    RETURNING id
)
SELECT count(*) FROM rows;

This gives 3 in the response.
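To sanity-check the idea outside PostgreSQL, here is a minimal stdlib sketch. Note this is an assumption-laden stand-in: SQLite (3.35+) supports RETURNING but not the data-modifying CTE shown above, so it counts the rows the INSERT itself reports rather than wrapping the insert in a CTE.

```python
import sqlite3

# Sketch: count inserted rows from the INSERT's own RETURNING output,
# avoiding the separate COUNT query (requires SQLite 3.35+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")

cur = conn.execute(
    "INSERT INTO my_table (id, name) "
    "VALUES (5, 'bob'), (6, 'joe'), (7, 'steve') RETURNING id"
)
inserted_ids = [row[0] for row in cur.fetchall()]
insert_count = len(inserted_ids)  # no second query needed
conn.commit()
```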
def get_max_opportunity_id(db_session):
    max_opportunity_id = db_session.query(
        sqlalchemy.func.max(foreign.opportunity.Topportunity.opportunity_id)
    ).scalar()
One thing I think we should look into for the factories is a way to do this automatically. They mention being able to force the sequence counter from a startup method: https://factoryboy.readthedocs.io/en/stable/recipes.html#forcing-the-sequence-counter
Basically what you're doing here (and what I did in the db-seed-local script), but in the factory itself.
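As a rough illustration of that pattern, here is a stdlib-only sketch. `OpportunityFactory` is a hypothetical stand-in for a factory_boy factory (factory_boy's `Factory.reset_sequence(value, force=True)` plays the `reset_sequence` role here); the startup hook would seed it from `get_max_opportunity_id(db_session)`.

```python
import itertools

class OpportunityFactory:
    """Hypothetical stand-in for a factory_boy factory."""

    _seq = itertools.count(0)

    @classmethod
    def reset_sequence(cls, start: int) -> None:
        # Restart the id sequence so generated rows don't collide
        # with rows already present in the legacy table.
        cls._seq = itertools.count(start)

    @classmethod
    def create(cls) -> dict:
        return {"opportunity_id": next(cls._seq)}

max_id = 41  # in practice: get_max_opportunity_id(db_session)
OpportunityFactory.reset_sequence(max_id + 1)
ids = [OpportunityFactory.create()["opportunity_id"] for _ in range(3)]
# ids start just above the existing max id
```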
…s to staging tables (#1894)

Summary
Part of #1686 (separate for easier review)

Time to review: 2 mins

Changes proposed
- Add `created_at`, `updated_at`, and `deleted_at` columns to staging tables
- Better logging of migrate queries

Context for reviewers
These columns will provide helpful metadata for understanding or troubleshooting the load process. The `created_at` and `updated_at` columns are set automatically by SQLAlchemy. For `deleted_at`, I will update #1852 after this is merged.

Additional information
Example from local testing:
![Screenshot 2024-05-01 at 11 31 47](https://github.com/HHS/simpler-grants-gov/assets/3811269/e9513de8-ceff-42c9-bccc-d2fcfc5fe7bf)
A few minor suggestions, but looks good to me
LGTM!
try:
    with self.db_session.begin():
        self.load_data_for_table(table_name)
        self.db_session.commit()
except Exception:
    logger.exception("table load error", extra={"table": table_name})
The commit isn't necessary, as it happens in the exit part of the begin method.
Suggested change, removing the redundant commit:

try:
    with self.db_session.begin():
        self.load_data_for_table(table_name)
        self.db_session.commit()
except Exception:
    logger.exception("table load error", extra={"table": table_name})

becomes:

try:
    with self.db_session.begin():
        self.load_data_for_table(table_name)
except Exception:
    logger.exception("table load error", extra={"table": table_name})
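A small stdlib demonstration of why the explicit commit is redundant: Python's `sqlite3` connection context manager follows the same pattern as SQLAlchemy's `Session.begin()` (assumed analogous here), committing on clean exit and rolling back on exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# Commits automatically when the block exits cleanly -- no explicit commit().
with conn:
    conn.execute("INSERT INTO t VALUES (1)")

# Rolls back automatically when the block raises.
try:
    with conn:
        conn.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("boom")
except RuntimeError:
    pass

rows = [x for (x,) in conn.execute("SELECT x FROM t ORDER BY x")]
# only the first insert survives; the second was rolled back
```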
rubber stamp to echo Michael's approval 🙏🏼
Summary
Fixes #1686
Time to review: 10 mins
Changes proposed
- Add `LoadOracleDataTask`, which loops through each legacy table and loads the data into the corresponding staging table.
- Add `src.data_migration.load.sql` to generate SQLAlchemy SQL expressions for the loads.
- Add `seed_local_legacy_tables.py` to generate fake data in the local legacy tables.

Context for reviewers
This is a new batch task which loads data from the legacy (Oracle) tables to staging tables.
It is incremental and handles the following situations:
- New rows in the source are inserted into the destination with `transformed_at=NULL`.
- Rows with a newer `last_upd_date` in the source are updated in the destination, resetting `transformed_at=NULL`.
- Rows deleted in the source are marked `is_deleted=TRUE` in the destination, resetting `transformed_at=NULL`.
- Rows whose `last_upd_date` is not newer are left unchanged.

Additional information
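The incremental rules described above can be sketched in plain Python. This is a hypothetical in-memory stand-in operating on dicts keyed by row id; the real task applies the same logic to the Oracle and staging tables via SQLAlchemy.

```python
def incremental_load(source: dict, dest: dict) -> dict:
    """Apply the incremental rules: insert, update, soft-delete, or skip."""
    for oid, row in source.items():
        current = dest.get(oid)
        if current is None or row["last_upd_date"] > current["last_upd_date"]:
            # New or updated row: copy it over, resetting transformed_at=NULL
            dest[oid] = {**row, "transformed_at": None, "is_deleted": False}
        # else: last_upd_date is not newer -> leave the destination row alone
    for oid, row in dest.items():
        if oid not in source and not row["is_deleted"]:
            # Deleted in the source: mark it and reset transformed_at=NULL
            row["is_deleted"] = True
            row["transformed_at"] = None
    return dest

source = {1: {"last_upd_date": 2}, 3: {"last_upd_date": 1}}
dest = {
    1: {"last_upd_date": 1, "transformed_at": "t0", "is_deleted": False},
    2: {"last_upd_date": 1, "transformed_at": "t0", "is_deleted": False},
}
dest = incremental_load(source, dest)
# row 1 updated, row 2 soft-deleted, row 3 inserted
```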
To run the code locally with fake data: