[US_CA] Add missing locations to location metadata (Recidiviz/recidiviz-data#29142)

## Description of the change
Some units appear only in AgentParole, and we want these in our
location_metadata table. Also, when a unit has multiple candidate rows
on its most recent date, I try to select the ParoleUnit information
with the fewest `nulls`.
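
The selection rule described above can be sketched in plain Python. This is only an illustration of the ordering used by the `QUALIFY ROW_NUMBER()` clause in the view, not the code that runs in BigQuery; `UnitRow`, `pick_latest_per_unit`, and the sample values are hypothetical names and made-up data introduced for the example.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Optional


class UnitRow(NamedTuple):
    """Hypothetical stand-in for one row of the unioned raw tables."""

    parole_unit: Optional[str]
    parole_district: Optional[str]
    parole_region: Optional[str]
    update_datetime: datetime


def nulls_last_key(row: UnitRow):
    # Mirrors `ParoleDistrict NULLS LAST, ParoleRegion NULLS LAST`:
    # False sorts before True, so non-null values come first.
    return (
        row.parole_district is None,
        row.parole_district or "",
        row.parole_region is None,
        row.parole_region or "",
    )


def pick_latest_per_unit(rows: Iterable[UnitRow]) -> Dict[Optional[str], UnitRow]:
    """Keep one row per unit: newest date first, non-null fields as tie-breaker."""
    # Sort by the tie-breaker first, then stably by date descending, so rows
    # sharing a date keep their nulls-last order.
    ordered = sorted(rows, key=nulls_last_key)
    ordered.sort(key=lambda r: r.update_datetime, reverse=True)
    best: Dict[Optional[str], UnitRow] = {}
    for row in ordered:
        best.setdefault(row.parole_unit, row)  # first row seen per unit wins
    return best


# Made-up sample data: unit "A" has two rows on its latest date, one with a
# null district; unit "B" has a single row.
rows = [
    UnitRow("A", None, "R1", datetime(2023, 1, 1)),
    UnitRow("A", "D1", "R1", datetime(2023, 1, 1)),
    UnitRow("A", "D0", "R0", datetime(2022, 6, 1)),
    UnitRow("B", "D2", None, datetime(2023, 3, 1)),
]
best = pick_latest_per_unit(rows)
```

With this sample, unit "A" resolves to the row dated 2023-01-01 whose district is populated, and unit "B" is kept even though it has only one row.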

Confirmed this fixes the validation in [this
query](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1srecidiviz-staging!2sus-central1!3s9bb0b997-bc31-4b96-b9eb-a678b06f7a11!2e1),
which shows all rows missing in the validation are present in the
updated view.

## Type of change

> All pull requests must have at least one of the following labels
applied (otherwise the PR will fail):

| Label | Description |
|-------|-------------|
| Type: Bug | non-breaking change that fixes an issue |
| Type: Feature | non-breaking change that adds functionality |
| Type: Breaking Change | fix or feature that would cause existing functionality to not work as expected |
| Type: Non-breaking refactor | change addresses some tech debt item or prepares for a later change, but does not change functionality |
| Type: Configuration Change | adjusts configuration to achieve some end related to functionality, development, performance, or security |
| Type: Dependency Upgrade | upgrades a project dependency - these changes are not included in release notes |

## Related issues

Closes Recidiviz/recidiviz-data#29021

## Checklists

### Development

**This box MUST be checked by the submitter prior to merging**:
- [ ] **Double- and triple-checked that there is no Personally
Identifiable Information (PII) being mistakenly added in this pull
request**

These boxes should be checked by the submitter prior to merging:
- [ ] Tests have been written to cover the code changed/added as part of
this pull request

### Code review

These boxes should be checked by reviewers prior to merging:

- [ ] This pull request has a descriptive title and information useful
to a reviewer
- [ ] Potential security implications or infrastructural changes have
been considered, if relevant

GitOrigin-RevId: 174a6533dfeeee4c28539bfee2c35e949bea1ed3
not-a-doctor-stromberg authored and Helper Bot committed May 11, 2024
1 parent f07582b commit 2f86156
Showing 1 changed file with 39 additions and 13 deletions.
@@ -36,6 +36,41 @@


US_CA_LOCATION_METADATA_QUERY_TEMPLATE = f"""
WITH all_units AS (
-- `all_units` unions all units from AgentParole and PersonParole. For each unit, we
-- use whatever the most recently used ParoleDistrict and ParoleRegion are. If for the
-- same unit within the most recent transfer there are entries with different
-- ParoleDistrict and ParoleRegion information, we prioritize non-null information.
-- This only occurs once in AgentParole in 2023, and never occurs in PersonParole.
SELECT * FROM (
SELECT DISTINCT
ParoleUnit,
ParoleDistrict,
ParoleRegion,
update_datetime
FROM `{{project_id}}.{{us_ca_raw_data}}.PersonParole`
UNION DISTINCT
SELECT DISTINCT
ParoleUnit,
ParoleDistrict,
ParoleRegion,
update_datetime
FROM `{{project_id}}.{{us_ca_raw_data}}.AgentParole`
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY ParoleUnit ORDER BY update_datetime DESC, ParoleDistrict NULLS LAST, ParoleRegion NULLS LAST) = 1
),
all_units_nulls_as_string AS (
-- `all_units_nulls_as_string` replaces nulls with the string "NULL". This avoids validation
-- errors down the line.
SELECT
IFNULL(ParoleUnit, 'NULL') AS ParoleUnit,
IFNULL(ParoleDistrict, 'NULL') AS ParoleDistrict,
IFNULL(ParoleRegion, 'NULL') AS ParoleRegion
FROM all_units
)
-- Finally, we build a JSON object for the location metadata out of the ParoleUnit, ParoleDistrict, and ParoleRegion.
SELECT
'US_CA' AS state_code,
UPPER(ParoleUnit) as location_external_id,
@@ -50,21 +85,12 @@
CASE WHEN SAFE_CAST(ParoleDistrict AS INT64) IS NULL THEN UPPER(ParoleDistrict) ELSE NULL END AS {LocationMetadataKey.SUPERVISION_DISTRICT_NAME.value},
CASE WHEN SAFE_CAST(ParoleRegion AS INT64) IS NULL THEN UPPER(ParoleRegion) ELSE NULL END AS {LocationMetadataKey.SUPERVISION_REGION_NAME.value},
UPPER(ParoleUnit) AS {LocationMetadataKey.SUPERVISION_OFFICE_ID.value},
UPPER(ParoleDistrict) AS {LocationMetadataKey.SUPERVISION_DISTRICT_ID.value},
UPPER(ParoleRegion) AS {LocationMetadataKey.SUPERVISION_REGION_ID.value}
UPPER(ParoleDistrict) AS {LocationMetadataKey.SUPERVISION_DISTRICT_NAME.value},
UPPER(ParoleRegion) AS {LocationMetadataKey.SUPERVISION_REGION_NAME.value}
)
) AS location_metadata,
'SUPERVISION_LOCATION' as location_type
FROM (
SELECT DISTINCT
ifnull(ParoleUnit, 'NULL') AS ParoleUnit,
ifnull(ParoleDistrict, 'NULL') AS ParoleDistrict,
ifnull(ParoleRegion, 'NULL') AS ParoleRegion
FROM `{{project_id}}.{{us_ca_raw_data}}.PersonParole`
-- This qualify statement ensures we select the most recent ParoleDistrict and
-- ParoleRegion for a given ParoleUnit.
QUALIFY ROW_NUMBER() OVER (PARTITION BY ParoleUnit ORDER BY update_datetime desc) = 1
)
'SUPERVISION_LOCATION' AS location_type
FROM all_units_nulls_as_string;
"""

US_CA_LOCATION_METADATA_VIEW_BUILDER = SimpleBigQueryViewBuilder(