Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicates in view shipping.metadata_for_augur_build_v3 #47

Open
kairstenfay opened this issue Feb 20, 2020 · 2 comments
Open

Duplicates in view shipping.metadata_for_augur_build_v3 #47

kairstenfay opened this issue Feb 20, 2020 · 2 comments

Comments

@kairstenfay
Copy link
Contributor

The count of rows in metadata_for_augur_build_v3 is greater than the count of select count(*) from warehouse.sample where identifier is not null. As of now, they're and 34557 and 34468, respectively.

The numbers are the same if you do

select count(distinct(strain)) from shipping.metadata_for_augur_build_v3

There should probably not be duplicates for this view. The following join likely introduces duplicates:

left join shipping.incidence_model_observation_v2 on sample.identifier = incidence_model_observation_v2.sample

(See original Slack conversation for context)

@joverlee521
Copy link
Contributor

Did some more digging into this.
The incidence model observation views seem to have duplicates for encounters that are linked to multiple locations (i.e. both residence and lodging locations) due to this bit:

select encounter_id, hierarchy->'tract' as residence_census_tract
from warehouse.encounter_location
left join warehouse.location using (location_id)
where relation = 'residence'
or relation = 'lodging'

@tsibley
Copy link
Member

tsibley commented Feb 25, 2020

Ah, nice digging. I think the appropriate thing is to be preferring residences but falling back to lodging, so conceptually a coalesce on it (but could be a reducing aggregation in practice).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants