oshpd-ca hospitals missing #11

Open
dcldmartin opened this issue Feb 25, 2019 · 1 comment

@dcldmartin

Interesting project! I'm looking at the OSHPD CA data. The records appear to list 354 distinct hospital_id values, but there are only 130 in the combined TSVs of the latest data (data-latest-1.tsv and data-latest-2.tsv).
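For anyone who wants to reproduce the count, here is a minimal sketch of how the distinct values could be tallied across the combined TSVs. It assumes each file starts with a header row that includes a `hospital_id` column; the function name `distinct_hospital_ids` is hypothetical and not part of the project.

```python
import csv


def distinct_hospital_ids(paths, column="hospital_id"):
    """Collect the distinct non-empty values of `column` across TSV files.

    Assumption: every file has a header row containing `column`.
    """
    ids = set()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                value = row.get(column)
                if value:  # skip rows where the id is missing or empty
                    ids.add(value)
    return ids
```

Calling `len(distinct_hospital_ids(["data-latest-1.tsv", "data-latest-2.tsv"]))` would then give the number to compare against the count in the original records.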

I'll look into the parser and submit PRs if I find anything, but do you have any thoughts?

@vsoch
Owner

vsoch commented Feb 25, 2019

This particular institution took me weeks to parse, and I skipped over a large chunk of the files because the data was incomplete. Take a look at the script to see the logic (for example, I skipped over the files that have _All) and please open a PR if you are able to parse the missing files. Importantly, something to keep in mind - the GitHub file size limit is 100 MB, and right now (with missing files) we already hit that barrier with data-latest-2.tsv. Since the original two files are written based on specific indices and sizes with a subset of skipped files, updating the data isn't as trivial as adding more parsers to the list, because the sizes would then be off. Thus, if you write another parser, we would want to add logic to match (another set of patterns) to write to data-latest-3.tsv. The alternative is to do the whole thing over and measure the file size as you go, but I doubt you have the weeks or patience to do that :P
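To illustrate the "measure the file size as you go" idea, here is a rough sketch of a writer that starts at data-latest-3.tsv and rolls over to the next index before a file would pass a byte cap. This is not the logic in the existing script. The function name `write_rolling_tsv`, its parameters, and the 95 MiB default (chosen to leave headroom under the limit) are all assumptions, and values are assumed to contain no tabs or newlines.

```python
import os

# Assumption: stay a little under the GitHub per-file limit.
DEFAULT_MAX_BYTES = 95 * 1024 * 1024


def write_rolling_tsv(rows, header, out_dir=".", prefix="data-latest",
                      start_index=3, max_bytes=DEFAULT_MAX_BYTES):
    """Write rows to <prefix>-<n>.tsv files, opening a new file before the
    current one would exceed max_bytes. Returns the paths written."""
    header_line = ("\t".join(header) + "\n").encode("utf-8")
    paths = []
    index = start_index
    handle = None
    written = 0        # bytes written to the current file
    rows_in_file = 0   # data rows written to the current file
    try:
        for row in rows:
            line = ("\t".join(str(v) for v in row) + "\n").encode("utf-8")
            too_big = rows_in_file > 0 and written + len(line) > max_bytes
            if handle is None or too_big:
                if handle is not None:
                    handle.close()
                    index += 1
                path = os.path.join(out_dir, f"{prefix}-{index}.tsv")
                handle = open(path, "wb")
                paths.append(path)
                handle.write(header_line)  # repeat the header in each file
                written = len(header_line)
                rows_in_file = 0
            handle.write(line)
            written += len(line)
            rows_in_file += 1
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Starting at index 3 leaves data-latest-1.tsv and data-latest-2.tsv untouched, so the existing indices and sizes are not disturbed. A single row larger than the cap would still be written whole, so the cap is only a guarantee when individual rows are small.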
