
COI preservation updater #9

Open · wants to merge 5 commits into main
Conversation

@pjrule (Contributor) commented Dec 4, 2021

This PR adds a more polished version of the COI preservation calculations from coi-states to the evaluation suite.

TODO

  • More tests
  • New variant (fractional scores)
  • Extra documentation for data sources?

Usage

The COI preservation updater assumes that COIs (or COI aggregations, a.k.a. geoclusters) and dual graph units can be (approximately) represented with a common block unit. Typically, this common unit is the 2020 U.S. Census block. Updaters are specialized to a particular set of COIs and a particular dual graph.
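Concretely, the common-block-unit assumption means that every COI and every dual graph unit can be expressed as a set of block IDs, so overlaps reduce to set intersections weighted by block population. A toy sketch of this idea (block IDs and populations are made up, not real Census data):

```python
# Toy illustration: COIs and dual graph units as sets of common block IDs.
# All IDs and populations here are made up, not real Census data.
coi_blocks = {"coi_a": {"b1", "b2", "b3"}}
unit_blocks = {"vtd_1": {"b1", "b2"}, "vtd_2": {"b3", "b4"}}
block_pops = {"b1": 100, "b2": 50, "b3": 50, "b4": 200}

# Population of coi_a contained in each unit, via set intersection.
overlap_pops = {
    unit: sum(block_pops[b] for b in blocks & coi_blocks["coi_a"])
    for unit, blocks in unit_blocks.items()
}
print(overlap_pops)  # {'vtd_1': 150, 'vtd_2': 50}
```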

Example: preservation of Wisconsin geoclusters

Suppose wisconsin_clusters.csv is a table of geoclusters generated by the pipeline in coi-states. We can load 2020 Census block approximations of the geoclusters as follows:

```python
import pandas as pd
from ast import literal_eval

clusters_df = pd.read_csv('wisconsin_clusters.csv').set_index('id')
clusters_df['blocks_2020'] = clusters_df['blocks_2020'].apply(literal_eval)
coi_blocks = {coi: set(blocks) for coi, blocks in clusters_df['blocks_2020'].items()}
```

(To compute the preservation of individual COI submissions instead of geoclusters, simply swap in a submission-level dataset with a blocks_2020 column.)
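For instance, a submission-level table with the same blocks_2020 column loads identically. A self-contained toy illustration (inline CSV via StringIO; the submission IDs and block IDs are made up):

```python
import io
from ast import literal_eval

import pandas as pd

# Toy inline stand-in for a submission-level CSV; in practice this would
# be a file with an 'id' column and a 'blocks_2020' column.
csv_text = (
    "id,blocks_2020\n"
    "s1,\"['550250001011000', '550250001011001']\"\n"
)
subs_df = pd.read_csv(io.StringIO(csv_text)).set_index('id')
subs_df['blocks_2020'] = subs_df['blocks_2020'].apply(literal_eval)
coi_blocks = {sub: set(blocks) for sub, blocks in subs_df['blocks_2020'].items()}
```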

An exact correspondence between 2020 Census blocks and 2020 Census VTDs can similarly be loaded from the official Census block assignment files (BAFs). We expect that the same node identifier is used in the dual graph and in the VTD-block calculations, so it may be necessary to map between Census GeoIDs and node indices.

```python
from collections import defaultdict

from gerrychain import Graph

graph = Graph.from_json('wi_vtds_0_indexed.json')
# Map VTD GeoIDs to dual graph node indices.
geoid_to_node_index = {geoid: node for node, geoid in graph.nodes('GEOID20')}

vtd_block_df = pd.read_csv('BlockAssign_ST55_WI_VTD.txt', sep='|', dtype=str).set_index('BLOCKID')
# VTD GeoID = state FIPS (55) + county FIPS (3 digits) + district code (6 digits).
vtd_block_df['vtd_id'] = '55' + vtd_block_df['COUNTYFP'].str.zfill(3) + vtd_block_df['DISTRICT'].str.zfill(6)
vtd_blocks = defaultdict(set)
for block, vtd_geoid in vtd_block_df['vtd_id'].items():
    vtd_blocks[geoid_to_node_index[vtd_geoid]].add(block)
```

Block total populations (P1_001N) can be retrieved via the Census API.

```python
from census import Census

pl_client = Census(None).pl  # as of now, we can get away without a (free) Census API key
block_pop_df = pd.DataFrame(pl_client.get(['P1_001N'], {'for': 'block:*', 'in': 'state:55 county:*'}))
# Block GeoID = state (2) + county (3) + tract (6) + block (4).
block_pop_df['GEOID20'] = (
    block_pop_df['state'].astype(str) +
    block_pop_df['county'].astype(str).str.zfill(3) +
    block_pop_df['tract'].astype(str).str.zfill(6) +
    block_pop_df['block'].astype(str).str.zfill(4)
)
block_pop_df = block_pop_df.set_index('GEOID20')
# The API returns counts as strings; cast to int before use.
block_pops = block_pop_df['P1_001N'].astype(int).to_dict()
```

Then, we can generate a COI preservation updater over a range of preservation thresholds:

```python
from evaltools.evaluation import block_level_coi_preservation

score_fn_partial_dists = block_level_coi_preservation(
    unit_blocks=vtd_blocks,  # the VTD -> block correspondence built above
    coi_blocks=coi_blocks,
    block_pops=block_pops,
    thresholds=(0.75, 0.8, 0.85, 0.9, 0.95),
    partial_districts=True,
)
```
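To make the thresholds concrete, here is one plausible reading of thresholded preservation (an assumption about the semantics, not taken from the evaltools source): a COI counts as preserved at threshold t when a single district contains at least a fraction t of the COI's population. A toy sketch of that scoring rule:

```python
# Toy sketch of a thresholded preservation count (assumed semantics,
# NOT the evaltools implementation): a COI is "preserved" at threshold t
# if some single district holds at least fraction t of its population.
def preserved_count(coi_pops_by_district, thresholds):
    counts = {}
    for t in thresholds:
        counts[t] = sum(
            1 for dist_pops in coi_pops_by_district.values()
            if max(dist_pops.values()) / sum(dist_pops.values()) >= t
        )
    return counts

# COI -> {district: COI population assigned to that district} (made-up data).
coi_pops = {
    "coi_a": {1: 90, 2: 10},   # 90% of coi_a sits in district 1
    "coi_b": {1: 60, 2: 40},   # only 60% of coi_b sits in one district
}
print(preserved_count(coi_pops, (0.75, 0.9)))  # {0.75: 1, 0.9: 1}
```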

* Implements partial districts variant
* Adds another test (for the case where COI size > district size)
* Refines documentation
@InnovativeInventor changed the title from “[WIP] COI preservation updater” to “COI preservation updater” on Dec 10, 2021
@InnovativeInventor (Member) left a comment

Looks good to me. Not sure if the numpy stuff is entirely needed, though (pandas may be sufficient for this).
