
Irv replications #1

Open · wants to merge 4 commits into main
Conversation

jgibson517 (Member)

This PR has updated code to run the IRV simulations for the Massachusetts work with VoteKit.

The Python script that handles the actual election runs is simulate.py. I made some tweaks to the .sh files from the Portland files to run things on the cluster, since there are fewer parameters that vary. I think I updated everything correctly, but let me know if not!

@cdonnay left a comment

Just some minor changes here or there. @peterrrock2 and I think that this is going to be a very common data pipeline, so we would like to refactor the simulate.py file so that the CLI allows for a more diverse range of inputs, but that isn't necessary for this PR to be approved.
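For concreteness, a minimal sketch of the kind of CLI that refactor might end up with; every flag name and default below is illustrative, not a spec for the eventual simulate.py:

# Hypothetical CLI for a more general simulate.py; flag names and
# defaults are placeholders, not the agreed-upon interface.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Run ranked-choice election simulations.")
    parser.add_argument("--num_districts", type=int, required=True)
    parser.add_argument("--num_seats", type=int, required=True)
    parser.add_argument("--cand_split", type=str, default="even",
                        help="How candidates are split across slates.")
    parser.add_argument("--ballot_model", type=str, default="sPL",
                        help="Which ballot generator to use (placeholder choices).")
    parser.add_argument("--output_dir", type=str, default="output")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(vars(args))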

--output="${log_file}" \
--error="${log_file}" \
$running_script_name \
"$num_seats" \

These inputs to the running script name do not match the order you unpack them in mass_irv.sh

#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G

Have you run one of these to confirm that the run time and memory usage are accurate?


echo --num_districts "$num_districts"
echo --num_seats "$num_seats"
echo --cand_split "$even_split"

should be $cand_split

zone_data[modelname] = []
zone_data[modelname].append(
count_winners(winners["ranking"], "D", args.num_seats)
)

There is a winners method for ElectionState. This will make lines 122 through 132 a lot clearer. This may require editing the count_winners function.

)

# save ranking vector from elections
zone_data["raw_outputs"].append({modelname: winners["ranking"]})

I would just use the rankings method of ElectionState here.
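Taken together with the winners suggestion above, the block might reduce to something like the sketch below. This assumes election_state is the ElectionState produced by the election run and that its winners() and rankings() methods return the elected candidates and the full ranking as the reviewers describe; the exact return shapes and any corresponding change to count_winners are assumptions.

# Hypothetical rewrite of the winner-counting / raw-output block using the
# ElectionState methods mentioned in the review; count_winners would need to
# accept the output of winners() instead of a slice of the ranking vector.
zone_data[modelname].append(
    count_winners(election_state.winners(), "D", args.num_seats)
)

# save the ranking vector from the election
zone_data["raw_outputs"].append({modelname: election_state.rankings()})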

with open('data/good_seed_3.json', 'r') as f:
assignment = json.loads(f.read())

clean_assign = {}

Can you leave a comment as to why this clean_assign is necessary?

@peterrrock2 left a comment

Thank you for getting this done! It looks like most of this will work with the exception of a couple of syntax errors. I also included a couple of comments on things that can improve the readability of this code. @cdonnay also reviewed most of the python sections of this code with me, but I elected to let him comment on the VoteKit stuff since he knows that codebase better. Let me know if you have any questions!


num_districts_array = ("40" "160" "8" "32")

num_seats_array = ("1" "1", "5", "5")


I don't think the syntax on this is correct. Bash delimits array elements with spaces, not commas, so this should be

("1" "1" "5" "5")

You should also be able to just treat these as ints, making it

(1 1 5 5)

since everywhere else you use these you are either passing them to another script or interpolating them into a formatted string.

output_dir="mass_output"
log_dir="mass_logs"



Can you add a comment here about how these arrays relate to each other, so that when we share the code people can identify what is going on? (Maybe just a line noting that 8 × 5 = 40 and 32 × 5 = 160, and how these values will be used.)

@@ -0,0 +1,34 @@
#!/bin/bash

#SBATCH --time=05:00:00


Will this take 5 hours? It might be good to run an example to check the time first

#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G
#SBATCH --mail-type=NONE
#SBATCH --mail-user=NONE


It's useful to add your email here so that you get pinged when the jobs finish or fail. You don't have to, but it does save a lot of headache if something goes wrong. I also add a filter on my email for stuff from SLURM so that it does not consume my inbox

#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G


This looks like a lot of memory to request for this job

for node in ma_graph.nodes:
geo_id = ma_graph.nodes[node]['GEOID20']
for election in elections:
ma_graph.nodes[node][election] = ma[ma['GEOID20'] == geo_id][election].iloc[0]


So it looks like you are taking the election information stored in the shapefile and adding it to the graph here, but this is going to run every time we call this file. That is fine if we only call it once (which I think we do), but it is not optimal. It would be better for reproducibility/auditing to build the dual graph JSON file with the election data in a separate script and then just load that file here.
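For reference, a hedged sketch of that one-time preprocessing step, assuming the same shapefile and GEOID20 column used above; the paths, the election list, and the output filename are placeholders:

# One-off script: attach the election columns to the dual graph and save it,
# so the simulation script can just load the enriched JSON.
# Paths, the election list, and column names are assumptions.
import geopandas as gpd
from gerrychain import Graph

elections = ["SEN18", "PRES20"]  # placeholder election columns

ma = gpd.read_file("data/ma_blocks.shp").set_index("GEOID20")  # placeholder path
ma_graph = Graph.from_file("data/ma_blocks.shp")

for node in ma_graph.nodes:
    geo_id = ma_graph.nodes[node]["GEOID20"]
    for election in elections:
        # cast to float so the graph stays JSON-serializable
        ma_graph.nodes[node][election] = float(ma.loc[geo_id, election])

# Save once; simulate.py then loads it with Graph.from_json(...)
ma_graph.to_json("data/ma_graph_with_elections.json")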
