
Add Common Optimization Methods #361

Merged

merged 19 commits into mggg:dev/0.3.2 on Apr 12, 2024
Conversation

jenni-niels
Member

This PR adds common optimization methods to the gerrychain codebase.
The SingleMetricOptimizer class represents the class of optimization problems over a single plan metric and currently implements short bursts, a few variants, and tilted runs, with more to come.

The Gingleator class is a subclass of SingleMetricOptimizer and can be used to search for plans with increased numbers of Gingles' districts.

…ased number of Gingles' districts; add further documentation SingleMetricOptimizer class methods; delint optimization.py
@pizzimathy
Member

I have no comments (for now) other than Gingleator is a great class name

Contributor

@gabeschoenbach gabeschoenbach left a comment


I think this is a great jumping-off point for us to bring short bursts into harmony with guided acceptance functions!

I like the flexibility of thresholding our searches — I'm going to keep saying "majority" (i.e. threshold = 0.50) for simplicity, but this should all work if the threshold is set to something else. But I think we should broaden the Gingleator initialization to be able to search for majority-minority or majority-party districts. This flexibility is sketched out here and here in my One-Click-Chains repo, though admittedly not in an "object-oriented" format. While broadening to partisan stuff makes the code a little less straightforward, I think we could gain that back by altogether dropping minority_perc_col and instead expecting to be passed either a Tally updater name for the demographic group (along with a Tally updater for the total population) or an ElectionResults updater name that we can use .percents(party) to query for the party percents.
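A rough sketch of what that broadened lookup could look like. The parameter names (group_pop_col, total_pop_col, election_col) and the dict-returning FakeElection.percents() are illustrative assumptions, not the gerrychain API — the real ElectionResults.percents(party) returns shares in district order rather than a dict:

```python
def district_percents(part, group_pop_col=None, total_pop_col=None,
                      election_col=None, party=None):
    """Return {district: share} for either a demographic group or a party."""
    if election_col is not None:
        # Partisan case: query the election updater for party shares.
        return part[election_col].percents(party)
    # Demographic case: divide the group Tally by the total-population Tally.
    return {k: part[group_pop_col][k] / part[total_pop_col][k]
            for k in part[total_pop_col]}

class FakeElection:
    """Stand-in for an election updater, keyed by district for simplicity."""
    def __init__(self, shares):
        self._shares = shares
    def percents(self, party):
        return {k: v[party] for k, v in self._shares.items()}

# Toy two-district "partition" holding updater outputs:
part = {
    "BVAP": {0: 60, 1: 25},
    "TOTPOP": {0: 100, 1: 100},
    "SEN18": FakeElection({0: {"Dem": 0.55}, 1: {"Dem": 0.40}}),
}
```

With this shape, the same downstream score functions could consume either `district_percents(part, "BVAP", "TOTPOP")` or `district_percents(part, election_col="SEN18", party="Dem")`.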

Conceptually, one thing all of the score functions in the Gingleator class have in common is that they look built to be used as simple comparators as you traverse the chain, i.e. accept the child if there is an improvement, otherwise accept with some fixed probability p. I think we should build out the functionality to make p dynamically change depending on how much worse the proposal is than its parent. In Gingleator terms, this means a helper function like my get_majdistricts_info() function that returns the number of majority-{group} (demographic group or party) districts, and the percentages of a) the smallest district above the threshold and b) the largest district below the threshold. Then, in SingleMetricOptimizer we can build acceptance functions that use those helpers to cleverly accept with a variable p. As I see it, this would be an extension/improvement to your tilted_short_bursts().
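The variable-p idea can be sketched as a small acceptance-function factory. This is only a sketch under my own naming (variable_p_acceptance, a beta sharpness knob), not gerrychain's API; it assumes higher scores are better:

```python
import math
import random

def variable_p_acceptance(score, beta=20.0, rng=random.random):
    """Always accept improvements; accept regressions with a probability
    that decays exponentially in how much worse the child scores."""
    def accept(parent, child):
        delta = score(child) - score(parent)  # > 0 means the child improved
        if delta >= 0:
            return True
        # delta < 0 here, so exp(beta * delta) < 1: big regressions are
        # accepted rarely, near-ties are accepted often.
        return rng() < math.exp(beta * delta)
    return accept
```

For example, with score = number of majority-group districts and beta = 20, a child one district worse is accepted with probability exp(-20), while a tie or improvement is always accepted — a continuous version of the fixed-p tilted run.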

This is really exciting — I think that if we build this right we can be really flexible in how we search through the metagraph, and I would love to run experiments to see if layering all of these tricks together gives us more of a leg up (variable-length short bursts with a custom acceptance function that rejects worse plans proportional to how bad they are?? could be huge)...

Small bug, I think:

        if minority_perc_col is None:
            perc_up = {min_perc_column_name:
                            lambda part: {k: part[minority_pop_col][k] / part[total_pop_col][k]
                                          for k in part.parts.keys()}}
            initial_state.updaters.update(perc_up)

        score = partial(score_function, minority_perc_col=minority_perc_col, threshold=threshold)

        super().__init__(proposal, constraints, initial_state, score, minmax="max",
                         tracking_funct=tracking_funct)

    """
    Score Functions
    """

    @classmethod
    def num_opportunity_dists(cls, part, minority_perc_col, threshold):
        """
        Given a partition, returns the number of opportunity districts.
        :param `part`: Partition to score.
        :param `minority_perc_col`: Which updater is a mapping of district ids to the fraction of
            minority population within that district.
        :param `threshold`: Beyond which fraction to consider something a "Gingles" 
            (or opportunity) district.
        :rtype int
        """
        dist_percs = part[minority_perc_col].values()
        return sum(list(map(lambda v: v >= threshold, dist_percs)))

...if minority_perc_col is None, then the updater that maps district IDs to the fraction of minority population is registered under min_perc_column_name. But the score functions all seem to call partition[minority_perc_col] — i.e. partition[None] in this case — which would raise a KeyError.
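One possible fix, sketched as a standalone helper (the function name and the default column name "gingles_perc" are hypothetical, chosen just to illustrate the pattern): after installing the derived updater, point minority_perc_col at the name actually used, so the score functions and the updater stay in sync.

```python
def resolve_perc_col(updaters, minority_perc_col, minority_pop_col,
                     total_pop_col, min_perc_column_name="gingles_perc"):
    """Install the derived percent updater if needed and return the
    column name the score functions should use."""
    if minority_perc_col is None:
        updaters[min_perc_column_name] = (
            lambda part: {k: part[minority_pop_col][k] / part[total_pop_col][k]
                          for k in part[total_pop_col]})
        minority_perc_col = min_perc_column_name  # keep the two names in sync
    return minority_perc_col
```

In the Gingleator __init__ this would amount to adding `minority_perc_col = min_perc_column_name` inside the `if minority_perc_col is None:` branch before building the partial score.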

@gabeschoenbach
Contributor

This looks really great! Just to check my understanding — if we called a simulated annealing run with a beta_function as something like:

gingles.hot_cold_cycle_beta_function_factory(0,1000)

(in other words only ever cold), would this be equivalent to a tilted run that always accepts better partitions, and accepts worse partitions with a dynamic probability p that depends on the beta_magnitude?

I made some small changes in docstrings, mostly just fixing some typos. I also want to flag a couple spots I think the documentation is unclear — might just be me, so would love to get other folks' input as well...

SingleMetricOptimizer (optimizer.py, lines 12-25)
I would change In instance of this class encapsulates the dualgraph and updaters via the initial partition to This class includes the initial partition (which gives access to the underlying dual graph and updaters).... I'm a little confused by Note that these are reset every time an optimization run is invoked and do not persist, but I'm not sure whether/how to reword that.

hot_cold_cycle_beta_function_factory (optimizer.py, lines 137-144)
I wonder if we can think of a more concise name for this function? And maybe add one more sentence of explanation as to its use, although I suppose this is pretty clear if you read the args on lines 140-141...

The Optimization notebook looks great! I made a small change to increase the size of the traceplots, so it's easier to see the differences in the different chains. I think it could be useful to add a little bit more documentation to the cells of the notebook, so someone could understand how the various functions work without going to the documentation. This is something I could add if you don't have the bandwidth!

@codecov-commenter

codecov-commenter commented Mar 1, 2022

Codecov Report

Attention: Patch coverage is 0%, with 170 lines in your changes missing coverage. Please review.

Project coverage is 80.14%. Comparing base (f2b1acd) to head (665d0ae).

❗ Current head 665d0ae differs from pull request most recent head f4725de. Consider uploading reports for the commit f4725de to get more accurate results

@@             Coverage Diff             @@
##             main     #361       +/-   ##
===========================================
- Coverage   91.91%   80.14%   -11.77%     
===========================================
  Files          38       40        +2     
  Lines        1942     1894       -48     
===========================================
- Hits         1785     1518      -267     
- Misses        157      376      +219     
Files Coverage Δ
gerrychain/optimization/__init__.py 0.00% <0.00%> (ø)
gerrychain/optimization/gingleator.py 0.00% <0.00%> (ø)
gerrychain/optimization/optimization.py 0.00% <0.00%> (ø)

... and 35 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jenni-niels
Member Author

Thanks for the review! (as well as typo catching - spelling is not my strong suit)
I'll work on clarifying the documentation in the places you mentioned.

This looks really great! Just to check my understanding — if we called a simulated annealing run with a beta_function as something like:

gingles.hot_cold_cycle_beta_function_factory(0,1000)

(in other words only ever cold), would this be equivalent to a tilted run that always accepts better partitions, and accepts worse partitions with a dynamic probability p that depends on the beta_magnitude?

Yes, that would be equivalent to a tilted run with a dynamic probability of accepting worse-scoring plans. Although, it might be simpler to call a simulated annealing run with the beta function:

beta_function = lambda _: 1

which has slightly less computational overhead than going through the gingles.hot_cold_cycle_beta_function_factory method.
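To make the equivalence concrete, here is a hedged sketch of the annealing-style acceptance rule being described — my own simplified formulation, not gerrychain's implementation. A worse plan is accepted with probability exp(-beta(step) * beta_magnitude * gap), so `beta_function = lambda _: 1` keeps the chain permanently "cold" and behaves like a tilted run with a gap-dependent p:

```python
import math
import random

def annealing_accept(score, beta_function, beta_magnitude=1.0, rng=random.random):
    """Always accept improvements; accept a regression with probability
    exp(-beta * beta_magnitude * gap), where gap is how much worse the
    child scores (assumes higher score is better)."""
    step = 0
    def accept(parent, child):
        nonlocal step
        step += 1
        gap = score(parent) - score(child)
        if gap <= 0:                      # child is at least as good
            return True
        beta = beta_function(step)        # beta = 0: "hot", accept anything
        return rng() < math.exp(-beta * beta_magnitude * gap)
    return accept

# Permanently cold chain, equivalent in spirit to a dynamic tilted run:
always_cold = annealing_accept(lambda x: x, lambda _: 1, beta_magnitude=10)
```

With beta fixed at 1, the acceptance probability for worse plans depends only on beta_magnitude and the score gap, which matches the "tilted run with dynamic p" reading above.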

The Optimization notebook looks great! I made a small change to increase the size of the traceplots, so it's easier to see the differences in the different chains. I think it could be useful to add a little bit more documentation to the cells of the notebook, so someone could understand how the various functions work without going to the documentation. This is something I could add if you don't have the bandwidth!

Yes, I can add some more context/docs to the notebook! I'd like to expand on the pros/cons of the different optimization methods, although that might take much longer runs to show in a plot, so I'm not sure an example notebook is the place for that code. I also think it might be useful to show the usage of the SingleMetricOptimizer class beyond the Gingleator use case. Perhaps seeking maps that are close to aggregate proportionality or some other target?
Thoughts?

@gabeschoenbach
Contributor

Definitely agree it would be good to show SingleMetricOptimizer for other things — aggregate proportionality would make sense. I also was thinking it would be cool to compare/contrast all these different methods, but I think that would be best in a different file, maybe not necessarily an intro notebook. If we do that comparison, one thing I'd love to try is just optimizing for cut edges, since for large graphs it's a pretty granular metric and it would be easy to see change over time.
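For reference, the cut-edges metric is just the number of edges whose endpoints land in different districts (in gerrychain, len(partition["cut_edges"])). A minimal standalone sketch on a plain edge list, with a toy 2x2 grid as the example:

```python
def cut_edges(edges, assignment):
    """Count edges whose endpoints fall in different districts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Toy 2x2 grid graph, split into a left district "A" and a right district "B":
#   0 - 1
#   |   |
#   2 - 3
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
assignment = {0: "A", 1: "B", 2: "A", 3: "B"}
# Only the two horizontal edges cross the district boundary.
```

Because single-flip or ReCom moves typically change the cut-edge count by small increments, a trace of this score over a chain shows change at a much finer grain than a count of threshold-crossing districts.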

jenni-niels and others added 3 commits March 1, 2022 14:36
…xpose `best_part`, `best_score`, and `score` as readonly properties. Add stubs for new cycling beta functions.
@gabeschoenbach
Copy link
Contributor

Looks good! I just updated some stuff in the Optimization notebook so the annealing calls work with the new jumpcycle function.

@pjrule pjrule added the summer-project Summer projects for 2023 and beyond label Apr 25, 2023
@peterrrock2 peterrrock2 changed the base branch from main to dev/0.3.2 April 12, 2024 19:09
@peterrrock2 peterrrock2 dismissed gabeschoenbach’s stale review April 12, 2024 19:59

This was fixed, but github will not show the comment in the code for me to resolve the conversation, so I have to do this the long way

@peterrrock2 peterrrock2 merged commit 2fc3d1e into mggg:dev/0.3.2 Apr 12, 2024
1 check passed
Labels
summer-project Summer projects for 2023 and beyond work-in-progress

7 participants