Aggregator script not completing on UCLA-only dataset for 4/1 #6

Open
cawarren opened this issue May 8, 2020 · 1 comment

cawarren commented May 8, 2020

(Note: Ping Andrew for the aggregator version and datasets to test against.)

Only the following two datasets include data prior to 4/15:

  • UCLA
  • UCLA + CSG Adjustments

Without the CSG data, the aggregator script doesn't finish processing the full date range: it sees no usable data for 4/1 and so assumes it has reached the end of the exported data. However, there is data for 4/1, and its facility names are in our mapping spreadsheet, so it's not clear why the script isn't recognizing them as valid data rows to merge.

Repro:

  • Get the ZIP'ed copy from Andrew
  • Delete the CSG dataset
  • Run the script. It exits without error and lists the correct number of rows for each input dataset, but the output dataset contains only facilities from 3/31.

cawarren commented May 9, 2020

I think I've sorted this out. I believe it was caused by non-printing Unicode characters, which were breaking several dozen mappings.
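For anyone who wants to check a name for this kind of problem, here is a minimal sketch. It is not part of the aggregator: `find_suspect_chars` and the sample name are hypothetical, and the rule it applies (flag anything that is not a letter, number, punctuation mark, or plain space) is an assumption about what counts as suspicious.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (index, code point, category) for each character that is not
    a letter, number, punctuation mark, or plain ASCII space."""
    suspects = []
    for i, c in enumerate(text):
        cat = unicodedata.category(c)
        # L* = letters, N* = numbers, P* = punctuation; ' ' is an ordinary space.
        if cat[0] not in 'LNP' and c != ' ':
            suspects.append((i, 'U+%04X' % ord(c), cat))
    return suspects


# Hypothetical name containing a zero-width space (U+200B)
# and a no-break space (U+00A0), both invisible or ambiguous when printed.
print(find_suspect_chars('Example\u200b County\u00a0Jail'))
```

Running something like this over the names in the mapping spreadsheet and in the input datasets should show which rows carry hidden characters.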

In my local copy I've updated the `_get_lookup_key` method in `data.py` to the following:

import unicodedata

...

  @staticmethod
  def _get_lookup_key(state, name):
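    # Unicode general categories to keep: letters (Lu, Ll), decimal digits (Nd),
    # and connector, dash, and other punctuation (Pc, Pd, Po). Everything else,
    # including whitespace and non-printing format characters, is dropped.
    printable = {'Lu', 'Ll', 'Nd', 'Pc', 'Pd', 'Po'}

    state = ''.join(c for c in state.strip().lower() if unicodedata.category(c) in printable)
    name = ''.join(c for c in name.strip().lower() if unicodedata.category(c) in printable)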

    return '%s:%s' % (state, name)
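
As a quick sanity check, the following sketch copies the key logic into a standalone function and compares a clean name against one with a hidden character. The function name `get_lookup_key` (without the class) and the facility name are hypothetical, used only for illustration.

```python
import unicodedata

# Same category whitelist as in the method above.
PRINTABLE = {'Lu', 'Ll', 'Nd', 'Pc', 'Pd', 'Po'}


def get_lookup_key(state, name):
    """Standalone copy of the key logic, for illustration only."""
    state = ''.join(c for c in state.strip().lower()
                    if unicodedata.category(c) in PRINTABLE)
    name = ''.join(c for c in name.strip().lower()
                   if unicodedata.category(c) in PRINTABLE)
    return '%s:%s' % (state, name)


# U+200B (zero-width space) is invisible when printed, so the two
# names look identical on screen but differ as raw strings.
clean = get_lookup_key('CA', 'Example County Jail')
dirty = get_lookup_key('CA', 'Example\u200b County Jail')

assert clean == dirty
print(clean)
```

Note that the whitelist also drops ordinary spaces, so keys only match if both the mapping spreadsheet names and the dataset names go through the same function.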
