Look up data sources, loaders by ocd_id in addition to state #213

Open
ghing opened this issue Oct 9, 2014 · 2 comments

Comments

@ghing
Contributor

ghing commented Oct 9, 2014

A friend of a friend wanted help scraping election data for Cook County, Illinois. We ultimately wrote it in Ruby because she was a novice programmer and that's what she knew, but the general pattern of defining paths to the data, then parsing and storing it, was the same. This got me thinking, "what would it take to use openelex-core as a framework for writing scrapers for arbitrary jurisdictions?"

One issue with our framework is that it's oriented around U.S. states. So, when we want to fetch results, we run a command like:

    inv fetch --state ia

Internally, the fetch task looks up the datasource for the state like this:

    state_mod = load_module(state, ['datasource', 'fetch'])
    datasrc = state_mod.datasource.Datasource()
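The issue doesn't show how load_module is implemented. As a rough sketch, assuming it imports a per-state package (hypothetically laid out as openelex.us.<state>) and then each named submodule, it could look like the following. The name load_state_module and the base_package parameter are illustrative, not openelex's actual API. Importing a submodule also binds it as an attribute of its parent package, which is why state_mod.datasource is reachable afterwards.

```python
import importlib
from types import ModuleType


def load_state_module(state, submodules, base_package="openelex.us") -> ModuleType:
    """Hypothetical stand-in for the fetch task's loader: import
    <base_package>.<state> plus each named submodule, then return the package.

    The openelex.us.<state> layout is an assumption for illustration.
    """
    package_name = f"{base_package}.{state.lower()}"
    package = importlib.import_module(package_name)
    for name in submodules:
        # Importing "pkg.sub" also sets "sub" as an attribute on "pkg",
        # so callers can write package.datasource afterwards.
        importlib.import_module(f"{package_name}.{name}")
    return package
```

Under that assumed layout, load_state_module('ia', ['datasource', 'fetch']) would import openelex.us.ia, openelex.us.ia.datasource and openelex.us.ia.fetch, and return the openelex.us.ia package.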

Imagine if we added something like a jurisdiction option for the invoke tasks. Then we could hypothetically do something like this:

    inv fetch --jurisdiction "ocd-division/country:us/state:il/county:cook"

The logic would be fairly similar inside the task:

    if state:
        jurisdiction = 'ocd-division/country:us/state:{}'.format(state.lower())

    jurisdiction_mod = load_module(jurisdiction, ['datasource', 'fetch'])
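The state-to-ID step could be wrapped in a small helper so a task accepts either option and always works with a single OCD division ID internally. This is only a sketch: the helper name jurisdiction_from_options and the rule that the two options are mutually exclusive are assumptions, not something decided in this issue.

```python
from typing import Optional

# Prefix shared by every U.S. OCD division ID.
US_PREFIX = "ocd-division/country:us"


def jurisdiction_from_options(state: Optional[str] = None,
                              jurisdiction: Optional[str] = None) -> str:
    """Hypothetical helper: turn --state or --jurisdiction into one OCD division ID."""
    if state and jurisdiction:
        raise ValueError("pass either state or jurisdiction, not both")
    if state:
        # A postal abbreviation like "IA" becomes ".../state:ia".
        return f"{US_PREFIX}/state:{state.lower()}"
    if jurisdiction:
        # Assume the caller already supplied a full OCD division ID.
        return jurisdiction
    raise ValueError("one of state or jurisdiction is required")
```

For example, jurisdiction_from_options(state="IA") returns "ocd-division/country:us/state:ia", so the existing --state flag keeps working while --jurisdiction passes straight through.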

To make this work, we'd have to develop some kind of registration pattern to map between ocd_ids and Python modules, but this is definitely doable. I could imagine a pretty simple approach where we use our existing logic for discovering states and just add a setting for "contrib modules" that would look something like this:

    OPENELEX_JURISDICTION_MODULES = {
        'ocd-division/country:us/state:il/county:cook': 'scrape_cook_county'
    }

In this example, scrape_cook_county would be a totally separate package that implements the API we have defined for states (datasource.py, load.py, etc.).
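One way the registration lookup could work is to consult the contrib setting first and fall back to the existing per-state packages only for state-level IDs. The sketch below assumes state packages live at openelex.us.<state> and that OPENELEX_JURISDICTION_MODULES is a plain dict; parse_ocd_id and module_name_for are hypothetical names, and the parser only handles the simple type:id segments used in this issue.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical contrib setting from this proposal: OCD division ID -> package name.
OPENELEX_JURISDICTION_MODULES: dict[str, str] = {
    "ocd-division/country:us/state:il/county:cook": "scrape_cook_county",
}


def parse_ocd_id(ocd_id: str) -> list[tuple[str, str]]:
    """Split 'ocd-division/country:us/state:il' into [('country', 'us'), ('state', 'il')]."""
    prefix, _, rest = ocd_id.partition("/")
    if prefix != "ocd-division" or not rest:
        raise ValueError(f"not an OCD division ID: {ocd_id!r}")
    pairs = []
    for segment in rest.split("/"):
        type_, sep, value = segment.partition(":")
        if not sep or not type_ or not value:
            raise ValueError(f"malformed segment {segment!r} in {ocd_id!r}")
        pairs.append((type_, value))
    return pairs


def module_name_for(ocd_id: str,
                    registry: Mapping[str, str] = OPENELEX_JURISDICTION_MODULES,
                    state_package: str = "openelex.us") -> str:
    """Resolve an OCD division ID to an importable package name.

    Explicit registrations win; otherwise a state-level ID falls back to the
    assumed per-state layout <state_package>.<state>.
    """
    if ocd_id in registry:
        return registry[ocd_id]
    pairs = parse_ocd_id(ocd_id)
    if len(pairs) == 2 and pairs[0] == ("country", "us") and pairs[1][0] == "state":
        return f"{state_package}.{pairs[1][1]}"
    raise LookupError(f"no module registered for {ocd_id!r}")
```

The returned name could then go through the same import step the tasks already use, e.g. importlib.import_module(module_name_for(ocd_id)). Checking the registry first also lets a contrib package override a state's built-in module, which may or may not be the behavior we want.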

ghing added the idea label Oct 9, 2014
ghing added this to the Foundational Ideas milestone Oct 9, 2014
@dwillis
Contributor

dwillis commented Oct 10, 2014

+1 to this.

@zstumgoren
Contributor

+1
