Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soundness bug with connected components #29

Closed
jeremysalwen opened this issue Oct 30, 2020 · 22 comments
Closed

Soundness bug with connected components #29

jeremysalwen opened this issue Oct 30, 2020 · 22 comments

Comments

@jeremysalwen
Copy link

jeremysalwen commented Oct 30, 2020

I am seeing a soundness bug with connected compontents, where it marks the connected components expression and its negation as both true.

Example:

from monosat import *

import numpy as np

def print_districts(districts):
    print("districts:")
    for row in range(districts.shape[0]):
        for col in range(districts.shape[1]):
            for d in range(districts.shape[2]):
                if districts[row, col, d].value():
                    print(f"{d:02}", end=" ")
        print()


def solve_gerrymander(grid, num_districts):

    district_size = grid.shape[0] * grid.shape[1] // num_districts
    if district_size * num_districts != grid.shape[0] * grid.shape[1]:
        print("Error: Need evenly divisible districts")
        return
    exact_victory = np.sum(grid) == (district_size // 2 + 1) * (num_districts // 2 + 1)
    if not exact_victory:
        print(
            "Warning: this solver may be more efficient if the gerrymandering must be solved exactly."
        )

    districts = np.fromfunction(
        np.vectorize(lambda r, c, d: Var()),
        (grid.shape[0], grid.shape[1], num_districts),
        dtype=object)

    graphs = [Graph() for _ in range(num_districts)]
    nodes = np.fromfunction(
        np.vectorize(lambda r, c, d: graphs[d].addNode()),
        (grid.shape[0], grid.shape[1], num_districts),
        dtype=object)

    # Each square is in exactly one district
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            AssertEqualPB([districts[row, col, d] for d in range(num_districts)], 1)

    # Each district is of the right size.
    for d in range(num_districts):
        squares_contained = [
            districts[row, col, d]
            for row in range(grid.shape[0])
            for col in range(grid.shape[1])
        ]
        AssertEqualPB(squares_contained, district_size)

    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            for dr, dc in [[1, 0], [0, 1]]:
                other_r = row + dr
                other_c = col + dc
                if other_c < 0 or other_r < 0 or other_r >= grid.shape[
                    0] or other_c >= grid.shape[1]:
                    continue
                for d in range(num_districts):
                    edge = graphs[d].addUndirectedEdge(nodes[row, col, d], nodes[other_r, other_c, d])
                    AssertEq(edge, And(districts[row, col, d], districts[other_r, other_c, d]))

    ccs = []
    ccns = []
    # All graphs must have exactly N-district size+1 connected components
    for d in range(num_districts):
        ccs.append(graphs[d].connectedComponentsGreaterThan((num_districts-1) * district_size))
        ccns.append(Not(graphs[d].connectedComponentsGreaterThan((num_districts-1) * district_size)))
        Assert(ccs[-1])
        Assert(ccns[-1])

    # A majority of districts must be won.
    districts_won = []
    for d in range(num_districts):
        votes = []
        for row in range(grid.shape[0]):
            for col in range(grid.shape[1]):
                if grid[row, col]:
                    votes.append(districts[row, col, d])
        if exact_victory:
            # everything must be gerrymandered perfectly to solve.
            AssertOr(EqualPB(votes, 0),
                     EqualPB(votes, district_size // 2 + 1))
        else:
            districts_won.append(GreaterEqPB(votes, district_size // 2 + 1))
    if exact_victory:
        # everything must be gerrymandered perfectly to solve.
        AssertEqualPB(districts_won, num_districts // 2 + 1)
    else:
        AssertGreaterEqPB(districts_won, num_districts // 2 + 1)

    result = Solve()
    if result:
        print("SAT")
        for d in range(num_districts):
            print(d, ccs[d].value())
            print(d, ccns[d].value())
        print_districts(districts)
    else:
        print("UNSAT")


def broken_example():
    grid = np.array([[False, False, False, False, True],
                     [False, False, True, False, True],
                     [True, False, True, False, True],
                     [False, False, False, False, True],
                     [True, False, True, True, True]])
    solve_gerrymander(grid, 5)

Output:

SAT
0 True
0 True
1 True
1 True
2 True
2 True
3 True
3 True
4 True
4 True
districts:
01 00 04 01 04 
01 01 03 02 04 
03 00 03 01 02 
04 00 00 03 04 
02 02 02 00 03 

Note how I assert a constraint and its negation, but it still says SAT, and it evaluates both the constraint and its negation as true (all the "True"s printed out)

@sambayless
Copy link
Owner

sambayless commented Oct 30, 2020 via email

@jeremysalwen
Copy link
Author

I am trying to see if SMT solvers can solve a gerrymandering puzzle. (Basically this example is the entire problem I am trying to solve, just for a larger grid).

I originally encoded my puzzle using Z3, using a variant or the integer node-distance encoding you mentioned for reachability in your thesis (the constraint is slightly more relaxed, so that the distance must be one more than any neighbor rather than the minimum neighbor). I found that Z3 could solve it for small grids (using QF_FD), but once the grids got big, it struggled to even produce any partitioning of the grid into connected districts, even ignoring the gerrymandering constraints. I then tried to research how other people were encoding connectness constraints into SMT solvers and found monosat.

The case I care about (some dynamically selected set of vertices is connected) doesn't map exactly to the predicates provided by monosat, but I thought that using connected components was the most direct implementation (using the fact that non-selected nodes will not be connected at all, thus always forming individual connected components of size 1.) I could also imagine encoding a whole bunch of reachability constraints from every vertex of a district to every other, but that would have a quadratic blowup. Another idea is I could create a graph of size K where K is the size of the district, dynamically assigning each vertex to a given square in the grid based on which squares are contained in the district, and compute reachability between these nodes by adding a bunch of conditions like "If node i is in position (x,y) and node j is in position (x+1,y), they are connected", but this seems like it would have a quadratic blowup as well. I am open to other suggestions for how to encode this problem.

I also would be interested in other suggestions on how to encode connectedness for a general purpose SMT solver, besides the method I am using of encoding "steps from origin" as an integer variable.

@sambayless
Copy link
Owner

What is the maximum number of disconnected components you expect the graph to be partitioned into? Is it 2?

I would consider reachability constraints (which are much more optimized in MonoSAT than the connected component constraints). Reachability predicates sharing the same source node can re-use most of their computations inside the solver, so try to arrange to have as few distinct source nodes in the reachability constraints as possible.

One way I have done this in the past is to add one (or a small number) of extra 'source' nodes, and then use constraints to control which edges from that source node are enabled. This transformation can often allow many distinct source nodes to be replaced with one source node.

Even then, you also want to try to have as few distinct reachability predicates as possible (as a secondary optimization, after minimizing the number of separate source nodes).

However, I think that instead of using reachability constraints, you might want to use maxflow constraints. Maxflow constraints are well optimized in MonoSAT and often better suited for constraints involving many nodes (as opposed to having lots of separate reach predicates).

I haven't tested it, but you may be able to use maxflow to enforce connectivity in the graph, by adding an extra destination node, with edges from each node that should be connected with capacity 1 (edges between nodes should have a large enough capacity that they will never be a bottleneck). Then enforce that the maxflow from source to dest == number of nodes you want in that component.

I'm not completely sure that this will work in your scenario; if you want to ensure multiple connected components you would potentially need one maxflow predicate per connected component.

@jeremysalwen
Copy link
Author

jeremysalwen commented Nov 2, 2020

Oh wow I just realized that actually my constraint could be encoded as just a single connected component count constraint on a single graph. Really I am just trying to partition a graph into N connected subgraphs of equal size. Here is an implementation encoding it that way:

from monosat import *

import numpy as np


def print_districts(districts):
    print("districts:")
    for row in range(districts.shape[0]):
        for col in range(districts.shape[1]):
            for d in range(districts.shape[2]):
                if districts[row, col, d].value():
                    print(f"{d:02}", end=" ")
        print()


def solve_gerrymander(grid, num_districts):

    district_size = grid.shape[0] * grid.shape[1] // num_districts
    if district_size * num_districts != grid.shape[0] * grid.shape[1]:
        print("Error: Need evenly divisible districts")
        return
    exact_victory = np.sum(grid) == (district_size // 2 + 1) * (num_districts // 2 + 1)
    if not exact_victory:
        print(
            "Warning: this solver may be more efficient if the gerrymandering must be solved exactly."
        )

    districts = np.fromfunction(
        np.vectorize(lambda r, c, d: Var()),
        (grid.shape[0], grid.shape[1], num_districts),
        dtype=object)

    graph = Graph()
    nodes = np.fromfunction(
        np.vectorize(lambda r, c: graph.addNode()),
        (grid.shape[0], grid.shape[1]),
        dtype=object)

    # Each square is in exactly one district
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            AssertEqualPB([districts[row, col, d] for d in range(num_districts)], 1)

    # Each district is of the right size.
    for d in range(num_districts):
        squares_contained = [
            districts[row, col, d]
            for row in range(grid.shape[0])
            for col in range(grid.shape[1])
        ]
        AssertEqualPB(squares_contained, district_size)

    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            for dr, dc in [[1, 0], [0, 1]]:
                other_r = row + dr
                other_c = col + dc
                if other_c < 0 or other_r < 0 or other_r >= grid.shape[
                    0] or other_c >= grid.shape[1]:
                    continue
                edge = graph.addUndirectedEdge(nodes[row, col], nodes[other_r, other_c])
                AssertEq(edge, Or(*[And(districts[row, col, d], districts[other_r, other_c, d]) for d in range(num_districts)]))

    # All num_district districts must be connected.
    Assert(Not(graph.connectedComponentsGreaterThan(num_districts)))

    # A majority of districts must be won.
    districts_won = []
    for d in range(num_districts):
        votes = []
        for row in range(grid.shape[0]):
            for col in range(grid.shape[1]):
                if grid[row, col]:
                    votes.append(districts[row, col, d])
        if exact_victory:
            # everything must be gerrymandered perfectly to solve.
            AssertOr(EqualPB(votes, 0),
                     EqualPB(votes, district_size // 2 + 1))
        else:
            districts_won.append(GreaterEqPB(votes, district_size // 2 + 1))
    if exact_victory:
        # everything must be gerrymandered perfectly to solve.
        AssertEqualPB(districts_won, num_districts // 2 + 1)
    else:
        AssertGreaterEqPB(districts_won, num_districts // 2 + 1)

    result = Solve()
    if result:
        print("SAT")
        print_districts(districts)
    else:
        print("UNSAT")


def broken_example():
    grid = np.array([[False, False, False, False, True],
                     [False, False, True, False, True],
                     [True, False, True, False, True],
                     [False, False, False, False, True],
                     [True, False, True, True, True]])
    solve_gerrymander(grid, 5)

unfortunately this also results in an unsound solution:

SAT
districts:
01 00 01 02 04 
04 00 03 01 04 
03 04 03 01 02 
01 00 00 00 04 
02 03 02 02 03 

I'm not sure exactly how I would encode my constraint using maxflow or reachability constraints. The problem is that there is no specific node that is guaranteed to be in any specific district.

So for reachability, I just see there being a large number of reachability constraints, however I encode it.

For max flow, I would need not only one constraint for each connected component, but also one constraint for each possible location for my source node, again, resulting in a large number of constraints.

The fact that this encodes so nicely as a connected components query makes me think that there probably isn't a better way to encode it. Is there something I can do to debug this failure in monosat, or would it be deep in the internals beyond my comprehension?

@sambayless
Copy link
Owner

sambayless commented Nov 3, 2020 via email

@jeremysalwen
Copy link
Author

Thank you! The code from my previous comment has reduced it down to a single connected components call:

Assert(Not(graph.connectedComponentsGreaterThan(num_districts)))

but unfortunately that was still unsound.

@sambayless
Copy link
Owner

I have a potential bug fix, but its not well tested enough yet to be merged into master.

For now, I'm keeping these changes in a separate connected_components branch.
You can find a test demonstrating the connected component constraints in python here:
https://github.com/sambayless/monosat/blob/connected_components/examples/python/test_connected_components.py

If you have a chance, could you check out the new 'connected_components' branch, recompile, and (critically) reinstall the python bindings with:

sudo python3.8 setup.py install -f

(the -f will ensure that the previous python API will be replaced with the new one).

Both of your test examples are UNSAT in the new branch.

@jeremysalwen
Copy link
Author

jeremysalwen commented Nov 7, 2020

Thanks, this seems to work. I think there is an off by one issue though.

Assert(graph.connectedComponentsLessOrEqualTo(num_districts))

results in UNSAT, but

Assert(graph.connectedComponentsLessOrEqualTo(num_districts+1))

results in a solution that has only num_districts connected components.

@jeremysalwen
Copy link
Author

So I am finding it is struggling (>24 h not solved) to partition a 7x7 grid into 7 equally sized connected components, even if I remove any of the vote counting constraints. (Note that this is relatively easy to solve by splitting into rows or columns). Is this surprising to you?

from monosat import *


import matplotlib.image as mpimg
import numpy as np


def print_districts(districts):
    print("districts:")
    for row in range(districts.shape[0]):
        for col in range(districts.shape[1]):
            for d in range(districts.shape[2]):
                if districts[row, col, d].value():
                    print(f"{d:02}", end=" ")
        print()


def solve_gerrymander(grid, num_districts):

    district_size = grid.shape[0] * grid.shape[1] // num_districts
    if district_size * num_districts != grid.shape[0] * grid.shape[1]:
        print("Error: Need evenly divisible districts")
        return
    exact_victory = np.sum(grid) == (district_size // 2 + 1) * (num_districts // 2 + 1)
    if not exact_victory:
        print(
            "Warning: this solver may be more efficient if the gerrymandering must be solved exactly."
        )

    districts = np.fromfunction(
        np.vectorize(lambda r, c, d: Var()),
        (grid.shape[0], grid.shape[1], num_districts),
        dtype=object)

    graph = Graph()
    nodes = np.fromfunction(
        np.vectorize(lambda r, c: graph.addNode()),
        (grid.shape[0], grid.shape[1]),
        dtype=object)

    # Each square is in exactly one district
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            AssertEqualPB([districts[row, col, d] for d in range(num_districts)], 1)

    # Each district is of the right size.
    for d in range(num_districts):
        squares_contained = [
            districts[row, col, d]
            for row in range(grid.shape[0])
            for col in range(grid.shape[1])
        ]
        AssertEqualPB(squares_contained, district_size)

    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            for dr, dc in [[1, 0], [0, 1]]:
                other_r = row + dr
                other_c = col + dc
                if other_c < 0 or other_r < 0 or other_r >= grid.shape[
                    0] or other_c >= grid.shape[1]:
                    continue
                edge = graph.addUndirectedEdge(nodes[row, col], nodes[other_r, other_c])
                AssertEq(edge, Or(*[And(districts[row, col, d], districts[other_r, other_c, d]) for d in range(num_districts)]))

    # All num_district districts must be connected.
    Assert(graph.connectedComponentsLessOrEqualTo(num_districts + 1))

    result = Solve()
    if result:
        print("SAT")
        print_districts(districts)
    else:
        print("UNSAT")


def broken_example():
    grid = np.array([[True, True, False, False, False, False, False],
                     [True, False, False, False, True, False, True],
                     [False, False, False, False, False, False, True],
                     [False, False, False, False, True, False, True],
                     [False, False, True, False, True, False, True],
                     [True, False, False, False, False, False, True],
                     [False, False, True, False, True, True, True]])
    solve_gerrymander(grid, 7)

@sambayless
Copy link
Owner

sambayless commented Nov 9, 2020 via email

@jeremysalwen
Copy link
Author

jeremysalwen commented Nov 11, 2020

Thanks for the suggestions. The +1 is indeed to account for the off-by-one error. I created an encoding that doesn't have all this extra symmetry by using a large number of max flow constraints. However, it seems to me there is another soundness error I am hitting again (maybe related to an off by one error, but with max flow constraints?)

The idea I am using is to create a single source node with a directed connection of weight 1 to every grid node. Then I put infinite weight on any of the undirected edges between the grid nodes, and add a constraint to every node that the max flow to that node from the source node is exactly the district size. This seems to somewhat work, but I am seeing unsound solutions. (max flow of 4 when the constraint requires it to be >=5)

Full example below:

from monosat import *

import numpy as np

def print_edges(edges):
    print("edges:")
    for row in range(edges.shape[0]):
        for col in range(edges.shape[1]):
            print("o", end="")
            if edges[row, col, 1] and edges[row, col, 1].value():
                print("-", end="")
            else:
                print(" ", end="")
        print()
        for col in range(edges.shape[1]):
            if edges[row, col, 0] and edges[row, col, 0].value():
                print("|", end="")
            else:
                print(" ", end="")
            print(" ", end="")
        print()


def solve_gerrymander(grid, num_districts):
    district_size = grid.shape[0] * grid.shape[1] // num_districts
    if district_size * num_districts != grid.shape[0] * grid.shape[1]:
        print("Error: Need evenly divisible districts")
        return

    graph = Graph()
    nodes = np.fromfunction(
        np.vectorize(lambda r, c: graph.addNode()),
        (grid.shape[0], grid.shape[1]),
        dtype=object)

    root_node = graph.addNode()
    np.vectorize(lambda node: AssertTrue(graph.addEdge(root_node, node, 1)))(nodes)

    edges = np.fromfunction(
        np.vectorize(lambda r, c, x: None),
        (grid.shape[0], grid.shape[1], 2),
        dtype=object)
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            for i, (dr, dc) in enumerate([[1, 0], [0, 1]]):
                other_r = row + dr
                other_c = col + dc
                if other_c < 0 or other_r < 0 or other_r >= grid.shape[
                    0] or other_c >= grid.shape[1]:
                    continue
                edges[row, col, i] = graph.addUndirectedEdge(nodes[row, col], nodes[other_r, other_c], 10*district_size*district_size)
            # Each component must have num_districts elements
            Assert(graph.maxFlowGreaterOrEqualTo(root_node, nodes[row, col], district_size))
            Assert(Not(graph.maxFlowGreaterOrEqualTo(root_node, nodes[row, col], district_size+2)))

    result = Solve()
    if result:
        print("SAT")
        print_edges(edges)
    else:
        print("UNSAT")


def broken_example():
    grid = np.array([[True, True, False, False, False, False, False],
                     [True, False, False, False, True, False, True],
                     [False, False, False, False, False, False, True],
                     [False, False, False, False, True, False, True],
                     [False, False, True, False, True, False, True],
                     [True, False, False, False, False, False, True],
                     [False, False, True, False, True, True, True]])
    solve_gerrymander(grid[-5:, -5:], 5)

broken_example()

Output:

SAT
edges:
o-o-o-o o 
        | 
o o-o-o-o 
|         
o-o o o-o 
|   | |   
o o-o o-o 
| |   | | 
o o-o o o   

Note that the upper left component of size four despite a constraint of max flow >=5. (Also note that I am adding the constraint that max flow <7 instead of <6, because it is harder to solve with the constraint that maxflow exactly equals 5, and the solution is still unsound with this constraint).

@sambayless
Copy link
Owner

I've attached the solution in the underlying graph that the solver finds (see below for instructions on printing this graph yourself). Based on your description, my understanding is that node 'n5' corresponds to position row0, col4. The root node is node n26.

(In this graph, blue edges are assigned 'true', red are assigned 'false')
gerrymander_maxflow

You can produce this yourself using the "-print-graph" in monosat, but you will need to git pull and recompile first (I just pushed a change to re-enable this diagnostic). To show the full graph that monosat finds, add the following line somewhere near the top of your python file, before any monosat constraints are created:

Monosat().newSolver(arguments="-print-graph")

The graph will print as text to stdout; you can render it using the graphviz tool 'dot' (available in most linux distributions and brew on mac).

To also show the internal flow that the solver is assigning to each edge for each individual maxflow constraint, set -print-theory-debug:

Monosat().newSolver(arguments="-print-graph -print-theory-debug"):

Graph 1 maxflow 26 to 5 is 5
Graph 1 maxflow 26 to 5 assigns edge 26 -> 5 flow 1
Graph 1 maxflow 26 to 5 assigns edge 10 -> 5 flow 4
Graph 1 maxflow 26 to 5 assigns edge 26 -> 7 flow 1
Graph 1 maxflow 26 to 5 assigns edge 26 -> 8 flow 1
Graph 1 maxflow 26 to 5 assigns edge 7 -> 8 flow 1
Graph 1 maxflow 26 to 5 assigns edge 26 -> 9 flow 1
Graph 1 maxflow 26 to 5 assigns edge 8 -> 9 flow 2
Graph 1 maxflow 26 to 5 assigns edge 26 -> 10 flow 1
Graph 1 maxflow 26 to 5 assigns edge 9 -> 10 flow 3

That assignment looks to me like it does respect the maxflow constraint: 5 flow reaches node '5' (row 0, col 4); 1 flow reaches it directly from the root node, while 4 additional flow reaches the node from the rest of the graph.

@sambayless
Copy link
Owner

About the off by one error in the connected component constraints: If you have a chance, could you create a small test demonstrating it?

@jeremysalwen
Copy link
Author

Hi Sam, thank you for the helpful debug options! It turns out the problem was with numpy vectorize adding extra edges to my graph: numpy/numpy#8758 (you can see it in the above graph,with two connections from node 26 to node 1) I believe this was also causing the "off by one" issue I was seeing.

I am able to solve the problem for some graph sizes now, but I am hitting an internal crash sometimes for specific problem sizes. I am using a large number of maxflow constraints inside a pseudoboolean constraint Here is the smallest size I was able to reproduce the crash with:

from monosat import *

import numpy as np


def solve_gerrymander(grid, num_districts):
    district_size = grid.shape[0] * grid.shape[1] // num_districts
    if district_size * num_districts != grid.shape[0] * grid.shape[1]:
        print("Error: Need evenly divisible districts")
        return

    graph = Graph()
    nodes = np.fromfunction(
        np.vectorize(lambda r, c: graph.addNode(), otypes=[object]),
        (grid.shape[0], grid.shape[1]),
        dtype=object)

    membership_root_node = graph.addNode()
    np.vectorize(lambda node: AssertTrue(graph.addEdge(membership_root_node, node, 1)), otypes=[object])(nodes)

    vote_root_node = graph.addNode()
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            if grid[row, col]:
                AssertTrue(graph.addEdge(vote_root_node, nodes[row, col], 1))

    edges = np.fromfunction(
        np.vectorize(lambda r, c, x: None),
        (grid.shape[0], grid.shape[1], 2),
        dtype=object)

    districts_won = []
    for row in range(grid.shape[0]):
        for col in range(grid.shape[1]):
            for i, (dr, dc) in enumerate([[1, 0], [0, 1]]):
                other_r = row + dr
                other_c = col + dc
                if other_c < 0 or other_r < 0 or other_r >= grid.shape[0] or other_c >= grid.shape[1]:
                    continue
                edges[row, col, i] = graph.addUndirectedEdge(nodes[row, col], nodes[other_r, other_c],
                                                             10 * district_size * district_size)
            # Each component must have num_districts elements
            Assert(graph.maxFlowGreaterOrEqualTo(membership_root_node, nodes[row, col], district_size))
            Assert(Not(graph.maxFlowGreaterOrEqualTo(membership_root_node, nodes[row, col], district_size + 1)))
            districts_won.append(graph.maxFlowGreaterOrEqualTo(vote_root_node, nodes[row, col], district_size // 2 + 1))
    
    # Must have won at least half of the districts?
    AssertGreaterEqPB(districts_won, (num_districts // 2 + 1) * district_size)

    result = Solve()
    if result:
        print("SAT")
    else:
        print("UNSAT")


def broken_example():
    grid = np.array([[True, True, False, False, False, False, False],
                     [True, False, False, False, True, False, True],
                     [False, False, False, False, False, False, True],
                     [False, False, False, False, True, False, True],
                     [False, False, True, False, True, False, True],
                     [True, False, False, False, False, False, True],
                     [False, False, True, False, True, True, True]])
    solve_gerrymander(grid[-4:, -4:], 4)

Output:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Internal error in graph enqueue

@sambayless
Copy link
Owner

Thank you for continuing to provide small test cases, they are incredibly helpful.

I've reproduced this crash locally, and then reduced this down to the attached 'GNF' format constraints, which are a subset of the concrete constraints generated in MonoSAT by running your script that are sufficient to crash the solver.

Is it ok with you if I include these and similar GNF format constraints generated from your examples as part of the monosat regression test suite, https://github.com/sambayless/monosat_tests, to catch similar errors in the future? These wont include your python code, just the generated constraints. They would need to be MIT licensed.

gerrymander_crash_reduced.zip

@sambayless
Copy link
Owner

I have a candidate fix that you can try (in the connected_components branch).

@jeremysalwen
Copy link
Author

Yes, I release this code and any derived files under the MIT license (copyright assigned to Google, not me though).

@jeremysalwen
Copy link
Author

Tried out the new changes, it seems like it's working successfully now with the max flow constraints!

I also found that adding an additional constraint (that if two adjacent nodes are reachable from each other, then they must be connected directly) helped to speed up the solving. It was able to solve a 7x7 grid in a minute or less, but it still hasn't been able to solve any of the larger grid sizes yet.

I think an interesting thing about this encoding (using max flow to encode district size and vote counts, with a "fully connected districts" constraint) is that there are no unnecessary degrees of freedom, i.e. each satisfying assignment corresponds to exactly one possible districting. However, I am still not so happy that there are such a huge number of max flow constraints, and I wonder if there is a better way to encode it with a smaller number of constraints (although maybe that doesn't really matter for solving performance?)

Thanks again for all the help!

@sambayless
Copy link
Owner

Each graph predicate is relatively expensive. In general, you want to minimize the number of graph predciates wherever possible; and after having removed as many as you can, try to force as many of the remaining ones to constant 'true' or 'false' as you can get away with. MonoSAT is pretty heavily optimized for the case where there is exactly one graph predicate, and it is assigned to constant 'true' or 'false'.

Do you expect your constraints to be SAT or UNSAT? Sounds like 'SAT' is the expected outcome. In that case, try setting the '-decide-theories' option, which enables theory specific decision heuristics. These can cause a huge speed up on satisfiable instances (for reachability and maxflow predicates), but can also cause pathologically bad behavior for some common use cases, so they aren't on by default.

Have you considered trying a CEGAR loop ("counter-example guided abstraction refinement"). CEGAR is the goto technique for improving scalability in constraint solvers. When you have a 'large' constraint system that you want to solve, but which is too expensive to solve all at once, then you can start with an initially underconstrained (and easy to solve) version of the constraints, and then, each time the solver produces a solution, bring in additional constraints as needed to refute the SAT solvers solution (if that solution does not actually satisfy your requirements).

MonoSAT is an incremental solver, which allows it to implement CEGAR loops efficiently. From python, what you can do is (roughly):

  1. Build the full graph, with all edges and nodes defined up front.
  2. Initially enforce some but not all constraints. You have to use some judgement here. One option would be to enforce everything except the graph predicates. Another option would be to enforce the graph constraints, and leave the non-graph constraints unimplemented.
  3. Use CEGAR loop to gradually bring in the remaining constraints lazily:
while (solve()):
    if (solutionIsValid()):
          return; // we're done
    else:
        // find one or more constraints from the 'full' set of constraints that are sufficient to block the current solution
         strengthenConstraints()

That strengthenConstraints() implementation can be easy or hard, depending on how challenging it is to automatically identify which additional constraints would block the current solution.

@jeremysalwen
Copy link
Author

Thanks for the suggestions!

I am definitely expecting "SAT" as the outcome. I have tried using -decide theories in a few different contexts, but it doesn't seem to speed things up in general.

I have also tried a few different variations of a CEGAR loop, without much success. In the process, I also learned that even solving a single graph constraint (that is trivially solvable by hand, like max flow to some node == 7) can be slow for monosat, probably related to the limitations of the structure of the constraint you mentioned above.

Also is it possible that graph constraints are not solved incrementally? it seems like adding a constraint that is already satisfied causes monosat to generate a completely new solution usually.

@sambayless
Copy link
Owner

sambayless commented Dec 8, 2020 via email

@sambayless
Copy link
Owner

I'm going to close this issue since we addressed the original correctness problem, but I'd love to keep brainstorming approaches to modelling this gerrymandering problem. If you want to chat more about it, could you open a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants