
Possible 85-95% reduction in stage 2 reweighting time using drop-in-replacement solver in solver.jl #381

Open · donboyd5 opened this issue May 19, 2021 · 11 comments


donboyd5 commented May 19, 2021

I don't know if others have found reweighting to be as slow as I have, but stage 2 reweighting for the CPS for 17 years took more than 14 hours on an older but still reasonably fast CPU. That's more time than I can ordinarily tie a computer up for.

As some of us have discussed, this really shouldn't take more than half a minute or so a year with an NLP solver and slight problem restructuring.

But in this case I was looking for a pure LP solution that required no restructuring.

After looking over the list of available JuMP LP solvers, I decided to investigate Tulip (here and here).
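
For anyone following along, installing Tulip from the Julia REPL is a one-liner (Tulip.jl is a registered package):

using Pkg; Pkg.add("Tulip")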

After installing, it can be dropped into solver.jl with just the following code changes:

# Add Tulip to the packages used
using JuMP, Cbc, NPZ, Tulip  # drop Cbc if done testing
...

# Replace these next 2 lines:
# model = Model(Cbc.Optimizer)
# set_optimizer_attribute(model, "logLevel", 1)

# With these 3 lines:
model = Model(Tulip.Optimizer)  # djb
set_optimizer_attribute(model, "OutputLevel", 1)  # 0=disable output (default), 1=show iterations	
set_optimizer_attribute(model, "IPM_IterationsLimit", 500)  # default 100
...

# Add these 2 lines to see termination status:
st = termination_status(model)
println("Termination status: $st")
...

Here are summary results for 2014. As you can see, Cbc takes about a quarter-million iterations. For Tulip, I raised the default limit on the number of iterations from 100 to 500 so that the solution values are almost identical to those from Cbc for 2014:

Objective function:
  Cbc (taxdata):     34,003.997
  Tulip (500 iter):  34,004.8231
  Tulip (100 iter):  34,027.7230

Iterations:
  Cbc (taxdata):     253,373
  Tulip (500 iter):  500
  Tulip (100 iter):  100

Solution time (minutes):
  Cbc (taxdata):     48.8
  Tulip (500 iter):  6.4
  Tulip (100 iter):  1.3

Quantiles of the key result (ratio of new weight to old weight), at (0, .1, .25, .5, .75, .9, 1):
  Cbc:              [0.3, 1., 1., 1., 1., 1.7]
  Tulip (500 iter): [0.3, 0.99999999, 1., 1., 1., 1.7]
  Tulip (100 iter): [0.30000021, 0.99998908, 0.99999819, 1., 1.00000274, 1.69999996]

Correlations of the key result (ratio of new weight to old weight):
Cbc compared to Tulip with 500 iterations:
array([[1. , 0.9987413],
[0.9987413, 1. ]])

Cbc compared to Tulip with 100 iterations:
array([[1. , 0.99372129],
[0.99372129, 1. ]])

I did this for 2015, also, and results were similar. I have not looked at other years.

Tulip does not use a lot of memory, so that will not be a consideration on most machines.

Conclusions
It seems pretty clear to me that you can get a massive speedup with a near-drop-in replacement, so it might be worth considering. Tulip did not reach optimality at 500 iterations, but its objective function was only 0.002% higher than Cbc's. You might be able to get Tulip closer to the Cbc optimum by varying one or more options, but it doesn't seem worth much effort. Personally, I'd consider the 100-iteration Tulip result great for testing purposes and would run more iterations only when creating a production file.

NOTE: When I say Tulip did not reach optimality, I mean that it did not think it had reached the minimum of the objective function -- the sum of the absolute deviations of each new-to-old weight ratio from 1. More important, it satisfied the constraints very early on, within its tolerances -- that is, the constraints for wages by income range, etc., all were met (and FAR sooner than Cbc met them).
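
For reference, the LP being solved, as I read it from solver.jl (r_i and s_i are the nonnegative upward and downward deviations of each record's weight ratio, so the new-to-old weight ratio is x_i = 1 + r_i - s_i):

$$
\min_{r,\,s \,\ge\, 0} \sum_{i=1}^{N} (r_i + s_i)
\quad \text{s.t.} \quad A_1 r + A_2 s = b, \qquad r_i + s_i \le \mathrm{tol} \;\; \forall i
$$

At an optimum at most one of r_i, s_i is positive, so the objective equals \sum_i |x_i - 1|.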

You might be tempted to run Cbc for fewer iterations but looking at the output log, its results seem far from acceptable even at 200,000 iterations. It is not until it gets close to 250,000 iterations that the solution looks really good. (Obviously this varies from year to year, but the general pattern remains.)

I am a newcomer to Julia, JuMP, and Tulip. There may well be ways to speed it up further, but it is pretty good out of the box.

Obviously I would not recommend it as a replacement for Cbc without further testing. Sometimes one solver finds a particular problem harder than another does, and it's always possible that Tulip will not be robust enough. Still, based on my checking so far, it looks like a much better solver for taxdata purposes than Cbc, and I'd recommend testing and probably deploying it. I plan to use it in my own work with taxdata.

@andersonfrailey
Collaborator

Thanks for doing the research on this, @donboyd5. I'm also annoyed with how long it takes the solver to finish running. I'll test out dropping this in later today!

@andersonfrailey
Collaborator

@donboyd5, when I try to test your suggested changes I get this error:

ERROR: LoadError: type Parameters has no field IPM_IterationsLimit

Did you run into this at any point when you were working with Tulip? I haven't been able to find a solution for it yet.

@andersonfrailey
Collaborator

@donboyd5 can you also tell me what Julia version you're using? I've been having trouble getting things running and I think it might have to do with the Julia/package versions.

@donboyd5
Author

@andersonfrailey, thanks for checking this out.

I have not run into the specific error you show.

As for versions, first I used Julia 1.4.1 and it worked fine; today I upgraded to 1.6.1 and both Tulip and Cbc work fine in stand-alone Julia programs. I have not yet gone back to make sure that everything works in the taxdata context but will do so. Also, I don't rule out that I made some sort of typo in what I posted above, but I don't think so. After I rerun cps or puf using Tulip, I'll post full working solver.jl code. But in the interim, I created a small entirely self-contained Julia-only example below. You should be able to copy it into a .jl file and run it, adjusting the solver = ... line to run either Tulip or Cbc. It should work for both - it does for me on Julia 1.6.1.

I'll make sure everything works in the taxdata context and after doing that I'll post the solver.jl code. But I think the example below should help determine where the problem lies.

## imports
using JuMP
using Tulip, Cbc

# choose the solver by leaving exactly one of the following lines uncommented
solver = "Tulip"
# solver = "Cbc"

## Instantiate JuMP model
if solver == "Tulip"
    println("Using Tulip...")
    model = Model(Tulip.Optimizer)
    set_optimizer_attribute(model, "IPM_IterationsLimit", 500)  # default 100
    set_optimizer_attribute(model, "OutputLevel", 1)  # 0=disable output, 1=show iterations
elseif solver == "Cbc"
    println("Using Cbc...")
    model = Model(Cbc.Optimizer)
    set_optimizer_attribute(model, "logLevel", 1)  # 0=disable output
else
    error("solver must be \"Tulip\" or \"Cbc\"")
end

## Create variables
@variable(model, x >= 0)
@variable(model, y >= 0)

## Add constraints
@constraint(model, row1, x - y >= -2)
@constraint(model, row2, 2*x - y <= 4)
@constraint(model, row3, x + 2*y <= 7)

## Set the objective
@objective(model, Min, -2*x - y)

## Solve the problem
println("solving...")
optimize!(model)

## Check termination status
println("\nTermination status: ", termination_status(model), "\n")

## Print results
println("Objective value: ", objective_value(model))
println("x: ", value(x))
println("y: ", value(y))

## what solver was used?
println("\nSolver: ", solver_name(model), "\n")

@donboyd5
Author

donboyd5 commented May 20, 2021

@andersonfrailey the code below is a full drop-in replacement for solver.jl in the cps_stage2 folder.

On my computer, with Julia 1.6.1, it works without trouble. I am 99.9% sure the same is true of Julia 1.4.1, but I no longer have it installed to check.

Depending on whether you set solver="Tulip" or solver="Cbc", it will use the corresponding solver. You can set Tulip iterations with the variable Tulip_max_iter (I have not figured out how to set max iterations for Cbc; left alone it needs about a quarter-million iterations). I did try two other solvers (Clp and CSDP) but neither is as good as Tulip. I read about several other JuMP-available solvers, but the ones I looked at require Matlab or have non-open-source licenses. I may look at the few other options over the next week or two, but I am going to guess that Tulip will be as good as it gets with a linear programming approach. Further speedups are possible with other approaches, but it's probably not worth the effort because the Tulip speed is pretty good and other approaches would require more substantial code changes.

using NPZ, JuMP, Cbc, Tulip

function Solve_func(year, tol)

	println("\nSolving weights for $year ...\n")

	solver = "Tulip"  # Tulip, Cbc
	Tulip_max_iter = 100  # 100 default, 500 seems good enough

	println("Using solver: ", solver)
	if solver == "Tulip"
		model = Model(Tulip.Optimizer)
		set_optimizer_attribute(model, "OutputLevel", 1)  # 0=disable output (default), 1=show iterations
		set_optimizer_attribute(model, "IPM_IterationsLimit", Tulip_max_iter)  # default 100
	elseif solver == "Cbc"
		model = Model(Cbc.Optimizer)
		set_optimizer_attribute(model, "logLevel", 1)
		# I have not figured out option to limit iterations
	else
		println("ERROR! Solver must be Tulip or Cbc.")
	end

	array = npzread(string(year, "_input.npz"))

	A1 = array["A1"]
	A2 = array["A2"]
	b = array["b"]

	N = size(A1)[2]

	@variable(model, r[1:N] >= 0)
	@variable(model, s[1:N] >= 0)

	@objective(model, Min, sum(r[i] + s[i] for i in 1:N))

	# bound on top by tolerance
	@constraint(model, [i in 1:N], r[i] + s[i] <= tol)

	# Ax = b
	@constraint(model, [i in 1:length(b)], sum(A1[i,j] * r[j] + A2[i,j] * s[j]
		                          for j in 1:N) == b[i])


	optimize!(model)
	println("Termination status: ", termination_status(model))
	println("Objective: ", objective_value(model))
	println("\nSolver used was: ", solver_name(model), "\n")

	r_vec = value.(r)
	s_vec = value.(s)

	npzwrite(string(year, "_output.npz"), Dict("r" => r_vec, "s" => s_vec))

	println("\n")

end



year_list = [x for x in 2014:2030]
tol = 0.70

# Run solver function for all years and tolerances (in order)
for i in year_list
	Solve_func(i, tol)
end

@donboyd5
Author

@andersonfrailey I think that if you use Tulip and stop it before optimality -- which I would recommend, given how close the 500-iteration and even 100-iteration runs come to the optimum as Cbc defines it -- you would want to put in a check that all the constraints have been satisfied within a reasonable tolerance. That has always been the case in the dozens of solves I have looked at -- usually the constraints seem to be satisfied after a few tens of iterations -- but I'd definitely put in a check.
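
A minimal sketch of such a check, tacked onto the end of Solve_func above (the variable names follow that code; the 1e-6 tolerance is an arbitrary choice of mine):

# post-solve sanity check (sketch): verify A1*r + A2*s = b within a relative tolerance
r_vec = value.(r)
s_vec = value.(s)
b_calc = A1 * r_vec + A2 * s_vec
rel_err = maximum(abs.(b_calc ./ b .- 1.0))
println("max relative constraint violation: ", rel_err)
rel_err <= 1e-6 || @warn "constraints not satisfied within tolerance" rel_err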

@andersonfrailey
Collaborator

Thanks, @donboyd5! I was able to get things going with a bit of work. It was just an issue with the Project.toml and Manifest.toml files. Running now.

@andersonfrailey
Collaborator

I've been running the current solver for ~24 hours to create new PUF weights and it still isn't finished so I'd like to say that I'm 100% in favor of moving to a different solver. I don't remember it being this slow when we initially switched...

@donboyd5
Author

Agree wholeheartedly.

If you look at old puf stage2 results in the repo (I don't know how old), done with Clp, you'll see that they regularly took about 30k iterations (see the screenshot of 2012 results from the repo, below). By contrast, the current solver, Cbc (which I thought called Clp, but I am not sure), regularly requires a quarter-million iterations.

[screenshot: Clp stage 2 results for 2012 from the repo]

While 30k was awful, 250k is simply unbearable. As I mentioned, the CPS took 14 hours on my reasonably fast machine.

I have since used Tulip some more and find that 500 iterations is sufficient even though it does not achieve optimality. Although tinkering might get us all the way there, returns diminish well before 500 iterations, and 100 is fine for testing (about 1.5 minutes per year). As noted, I'd put in a check to be sure the solution is "good enough" at 500 iterations. I can help with that if you want.

@donboyd5
Author

donboyd5 commented May 24, 2021

@andersonfrailey,

I have a version that solves a single puf year (2012; not tested on other years), reaching optimality in about 25 seconds and 46 iterations. It satisfies the constraints and gets a better objective function value than the current taxdata setup gets after roughly an hour and 250 thousand iterations.

It would be easy to make a simple drop-in replacement for solver.jl that does this.

To accomplish this, I made the following 3 changes to solver.jl, while keeping the overall structure the same:

  1. replaced the solver Cbc with the solver Tulip, as discussed above, using options appropriate for Tulip (which are slightly different from those for Cbc)
  2. replaced an inefficient approach taken in solver.jl with a far more efficient one. The code had been written so that the tolerances around each weight adjustment (typically ~+/- 0.40 -- the r and s values) were coded as constraints, which adds ~252k constraints to the problem. However, both Cbc and Tulip let you place bounds on the variables, so you can force each variable into the [0, 0.4] range (or whatever you want). Mathematically this is the same as the constraint approach, but the solver is designed to handle bounds efficiently. I replaced the r, s constraints with bounds on the r, s variables. This cuts the time by about 50% versus the notes above.
  3. scaled the problem as follows:
    (a) multiplied each value in each of the 27 rows (one per constraint) of the A matrices by a row-specific constant: 1,000 divided by the sum of the absolute values in that row (roughly 252k values in a row -- one value per taxpayer or nonfiler). Thus the sum of the absolute values of each row of my scaled A matrices is 1,000. The point is to reduce accumulated roundoff error, which is a particular concern when different constraints have very different magnitudes.
    (b) multiplied each target (each element of b) by the corresponding scale value from step (a), i.e., the value for its row in an A matrix.
    This is mathematically equivalent to the problem taxdata is trying to solve, but it has much better numerical stability, yielding a better solution (even better than that of Cbc run to optimality), in fewer iterations (only 46 for 2012), and in much less time (25 seconds). Scaling is very important in optimization problems. A minimal sketch of changes 2 and 3 follows below.
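
A minimal sketch of changes 2 and 3, using the variable names from the solver.jl code posted earlier (this is not the final code; the full replacement solver.jl is in the next comment):

# change 3: row-wise scaling so each row of A1/A2 sums to 1,000 in absolute value
scale = 1000.0 ./ sum(abs.(A1), dims=2)  # one scale factor per constraint row
A1s = scale .* A1
A2s = scale .* A2
bs = scale .* b   # scale the targets by the same factors

# change 2: tolerances become variable bounds instead of ~252k explicit constraints
@variable(model, 0 <= r[1:N] <= tol)
@variable(model, 0 <= s[1:N] <= tol)

@objective(model, Min, sum(r[i] + s[i] for i in 1:N))

# equality constraints on the scaled problem
@constraint(model, [i in 1:length(bs)],
    sum(A1s[i,j] * r[j] + A2s[i,j] * s[j] for j in 1:N) == bs[i])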

Tail end of output shown below:

  • The first screenshot is the full proposed approach incorporating items 1-3. As you can see, the objective function value is 6,608.2708, it took 46 iterations and total time of 25.54 seconds, and Tulip considered the result optimal (within its tolerances).
  • The 2nd screenshot uses Cbc and implements the 2nd item above (but not 1 or 3), so it is faster than what is currently implemented in taxdata. The objective function value is 6,608.2714 (not meaningfully different from Tulip, albeit slightly worse). It took 112,279 iterations and 936 seconds, and Cbc considered it optimal. I think it would have taken about twice as many iterations and twice as much time without improvement 2 above (i.e., as it is currently coded in taxdata).

Happy to help you implement this if you want.

Don

[screenshot 1: tail of Tulip output with changes 1-3]

[screenshot 2: tail of Cbc output with change 2 only]

@donboyd5
Author

donboyd5 commented May 25, 2021

@andersonfrailey:

Copied below is drop-in-replacement code for puf stage2 solver.jl. It implements all 3 changes noted above.

I don't think I left any residue from experiments in the file (three residue lines are commented out) so you should be able to simply replace puf stage2 solver.jl with this code and try it out.

I made minor changes to scaling relative to what I described above: the scaling factor for each constraint is now chosen so that, when multiplied elementwise by the "A" matrices, the sum of absolute values in the row is the number of records divided by 1000 (252.8k records / 1000, or about 252.8). (Each A matrix has 27 rows (1 per constraint) and 252.8k columns (1 per taxpayer/nonfiler)). The general idea is to keep numbers from getting too large and causing numerical difficulty. I'm sure this is not optimal scaling but it is a LOT better than no scaling. (I don't know a cookbook approach to scaling the problem optimally.)

It prints a short report after each year showing:

  • Quantiles of the ratio of calculated constraint values to the given constraints
  • Quantiles of the ratio of new weights to the initial weights

I ran it on all years of the puf:

  • Every year solved to optimality
  • It took <= 63 iterations in every year, which was less than the max iterations option of 100 that I had set (can be changed in code if needed)
  • In the most-difficult year (2030), it took 63 iterations (should be the same on your machine) and 36 seconds. Most years were substantially fewer iterations and substantially less time.
  • For all years, as implied by optimality, all constraints were met: the ratio of calculated constraints to given constraints was 1 (within very small numerical tolerances).
  • For all years, as also implied by optimality, the ratio of new weights to initial weights was within the desired range.

I copy below the final part of the summary for 2012, in case you want to run it on a single year and see if you get the same results. You should get the same values for everything but runtime, I think. You can see that:

  • It took 45 iterations (you should get the same), and 26.7 seconds on my machine.
  • Result was optimal
  • With objective of 6608.2708. If you check, you'll see that this is ever so slightly better than what Cbc gets, without scaling, for 2012.
  • Ratios of calculated to intended targets are all infinitesimally different from 1
  • Ratios of new weights to initial weights range from (rounded) 0.60 to 1.40. This is consistent with the tolerance that you set for 2012, which is +/- 0.40.

44 +6.6082216e+03 +6.6082212e+03 7.10e-08 7.27e-06 6.65e-05 2.8e-09 26.17
45 +6.6082708e+03 +6.6082708e+03 7.71e-10 7.87e-08 7.15e-07 3.0e-11 26.71
Solver exited with status Trm_Optimal
Termination status: OPTIMAL
Objective = 6608.2708

Quantiles used below: (0, 0.1, 0.25, 0.5, 0.75, 0.9, 1)

Quantiles of ratio of calculated targets to intended targets:
(0.9999999998108967, 0.9999999998113364, 0.9999999998119191, 0.9999999998128851, 0.9999999998130706, 0.9999999998144502, 0.9999999998262367)

Quantiles of ratio of new weight to initial weight:
(0.5999999996933609, 0.9999999931059017, 0.9999999985655862, 0.9999999997713837, 1.000000000066358, 1.0000000009277101, 1.4000000005639284)

using JuMP, NPZ, Tulip
using Printf
using Statistics
using LinearAlgebra

function Solve_func(year, tol)

	println("\nSolving weights for $year ...\n\n")

	array = npzread(string(year, "_input.npz"))

	# ddir = "/home/donboyd/Documents/python_projects/taxdata/puf_stage2/"
	# array = npzread(string(ddir, year, "_input.npz"))

	A1 = array["A1"]
	A2 = array["A2"]
	b = array["b"]

	N = size(A1)[2]

	# scaling: determine a scaling vector with one value per constraint
	#  - the goal is to keep coefficients reasonably near 1.0
	#  - multiply each row of A1 and A2 by its specific scaling constant
	#  - multiply each element of the b target vector by its scaling constant
	#  - current approach: choose scale factors so that the sum of absolute values in each row of
	#    A1 and of A2 will equal the total number of records / 1000; maybe we can improve on this

	scale = (N / 1000.) ./ sum(abs.(A1), dims=2)

	A1s = scale .* A1
	A2s = scale .* A2
	bs = scale .* b

	model = Model(Tulip.Optimizer)
	set_optimizer_attribute(model, "OutputLevel", 1)  # 0=disable output (default), 1=show iterations
	set_optimizer_attribute(model, "IPM_IterationsLimit", 100)  # default 100 seems to be enough

	# r and s must each fall between 0 and the tolerance
	@variable(model, 0 <= r[1:N] <= tol)
	@variable(model, 0 <= s[1:N] <= tol)

	@objective(model, Min, sum(r[i] + s[i] for i in 1:N))

	# Ax = b  - use the scaled matrices and vector
	@constraint(model, [i in 1:length(bs)], sum(A1s[i,j] * r[j] + A2s[i,j] * s[j]
		                          for j in 1:N) == bs[i])

	optimize!(model)

	println("Termination status: ", termination_status(model))
	@printf "Objective = %.4f\n" objective_value(model)

	r_vec = value.(r)
	s_vec = value.(s)

	npzwrite(string(year, "_output.npz"), Dict("r" => r_vec, "s" => s_vec))

	println("\n")

	# quick checks on results

	# Did we satisfy constraints?
	rs = r_vec - s_vec
	b_calc = sum(rs' .* A1, dims=2)
	check = vec(b_calc) ./ b

	q = (0, .1, .25, .5, .75, .9, 1)
	println("Quantiles used below: ", q)

	println("\nQuantiles of ratio of calculated targets to intended targets: ")
	println(quantile!(check, q))

	# Are the ratios of new weights to old weights in bounds (within tolerances)?
	x = 1.0 .+ r_vec - s_vec  # note the .+
	println("\nQuantiles of ratio of new weight to initial weight: ")
	println(quantile!(x, q))

end


year_list = [x for x in 2012:2030]
tol_list = [0.40, 0.38, 0.35, 0.33, 0.30,
 			0.45, 0.45, 0.45, 0.45, 0.45,
			0.45, 0.45, 0.45, 0.45,	0.45,
			0.45, 0.45, 0.45, 0.45]

# Run solver function for all years and tolerances (in order)
for i in zip(year_list, tol_list)
	Solve_func(i[1], i[2])
end

# Solve_func(2013, 0.38)
