feat: implementation of bounded LBFGS algorithm #1290

Open · wants to merge 4 commits into main
Conversation


@mikeevmm mikeevmm commented Apr 4, 2021

This commit implements the bounded version of the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS-B) optimization algorithm, per [1], in lbfgsb.py. It is lacking committed unit tests (tests have been run outside of the committed changes) because I don't understand how to properly implement them alongside, e.g., the existing LBFGS unit tests. The following unit tests were run:

  • Converges to the minimum of a 60-dimensional quadratic bowl
  • Converges to the correct minimum of the Rosenbrock surface when given sufficiently far-away bounds
  • Converges to [5., 5.] on the Rosenbrock surface when given bounds [(5, 20), (-10, 5)], matching the original Fortran implementation (the objective is sketched below for reference)
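For reference, the Rosenbrock objective behind the last two checks, written from its standard textbook definition (not copied from this PR's tests):

import tensorflow as tf

def rosenbrock(x):
  """Classic 2-D Rosenbrock: (1 - x0)**2 + 100 * (x1 - x0**2)**2."""
  return (1. - x[..., 0])**2 + 100. * (x[..., 1] - x[..., 0]**2)**2

# The unconstrained minimum is [1., 1.] with value 0. With bounds
# x0 in [5, 20] and x1 in [-10, 5], the minimizer moves to the corner
# [5., 5.], the point checked above.
print(rosenbrock(tf.constant([1., 1.])).numpy())  # 0.0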

I now need help making sure the code conforms to the style guide, per issue #1273.

[1] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, & Ciyou Zhu (1995).
A Limited Memory Algorithm for Bound Constrained Optimization.
SIAM Journal on Scientific Computing, 16(5), 1190–1208.
https://doi.org/10.1137/0916069

@googlebot commented Apr 4, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.




@googlebot added the cla: no label (user has not signed the CLA) on Apr 4, 2021
mikeevmm (Author) commented Apr 4, 2021

@googlebot I signed it!

google-cla bot commented Apr 4, 2021

CLAs look good, thanks!



@google-cla bot added the cla: yes label (user has signed the CLA) and removed the cla: no label on Apr 4, 2021
davmre (Contributor) commented Apr 8, 2021

Hi Miguel, thanks very much for this PR. This is a really nice contribution, and it's great to see such thorough and well-documented code. I made a quick pass with a few minor style comments, but overall this looks to be in quite good shape.

I do have a high-level concern around the use of ragged Tensors. As you note, they can be pretty finicky, they make it tricky to work with batches, and they raise compatibility issues with, e.g., our JAX backend. It's not a dealbreaker, but if you're willing, I'd like to try to help think through whether another approach could work. I haven't fully worked through the code yet; to help me understand, can you explain what the ragged Tensors are being used for, and why they need to be ragged?

Before checking this in, we'll need to add some tests, along with corresponding entries in the BUILD file (you can use the entries for lbfgs and lbfgs_test as a model). I don't think the tests need to interact with the LBFGS tests, necessarily (although feel free to copy any that seem relevant), but at minimum we need a file lbfgsb_test.py that follows the pattern of lbfgs_test and other tests (a minimal skeleton is sketched after the list below):

  1. It should define a test class that inherits from tensorflow_probability.python.internal.test_util.TestCase
  2. This class should be decorated with @test_util.test_all_tf_execution_regimes (i.e., runs tests in graph and eager modes)
  3. The bottom of the file should have the magic lines if __name__ == '__main__': tf.test.main().
  4. Unit tests are implemented as methods of the test class whose names start with test.
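
For example, a minimal skeleton along these lines (the test body is just a placeholder, and the import of the new module is commented out since its final path isn't settled):

import tensorflow.compat.v2 as tf

from tensorflow_probability.python.internal import test_util
# from tensorflow_probability.python.optimizer import lbfgsb  # path assumed


@test_util.test_all_tf_execution_regimes
class LBfgsBTest(test_util.TestCase):

  def test_converges_on_quadratic_bowl(self):
    # Placeholder body: a real test would run the new optimizer on a
    # quadratic objective and assert convergence to the known minimum.
    self.assertAllClose(1., 1.)


if __name__ == '__main__':
  tf.test.main()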

It sounds like you already have some tests, so if you could adapt them to add to this PR that'd be a great start. Is there anything in particular you're not sure about? I'm happy to try to help figure out the best way to set things up.

  • Dave

w_b),
state.ddf)
# NOTE: See lbfgsb.f (l. 1649)
# TODO: How to get machine epsilon?
davmre (Contributor) commented on the diff:
dtype_util.eps(ddf.dtype) should do it
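
For example (dtype_util here is tensorflow_probability.python.internal.dtype_util; the printed values are the usual float machine epsilons):

import tensorflow as tf
from tensorflow_probability.python.internal import dtype_util

# Machine epsilon for a given dtype, replacing any hard-coded epsilon
# such as the one referenced at lbfgsb.f (l. 1649).
print(dtype_util.eps(tf.float32))  # ~1.19e-07
print(dtype_util.eps(tf.float64))  # ~2.22e-16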

# The number of correction pairs that have been collected so far.
#num_elements = ps.minimum(
# state.num_iterations, # TODO(b/162733947): Change loop state -> closure.
# ps.shape(state.position_deltas)[0])
davmre (Contributor) commented on the diff:
It seems like the commented code can be removed here

grad_tolerance, f_relative_tolerance, x_tolerance,
stopping_condition, max_iterations, free_mask, cauchy_point):
"""Performs the line search in given direction, backtracking in direction to the cauchy point,
and clamping actively contrained variables to the cauchy point."""
davmre (Contributor) commented on the diff:
Nit: our linter requires docstrings to be either in the specific short form

"""One-line docstring of at most 80 chars, ending with punctuation."""

or the long form, including descriptions of args:

"""One-line summary of at most 80 chars, ending with punctuation.

One or more paragraphs of additional details.

Args:
arg1: ...
arg2: ...
Returns:
return_value:
"""

Assuming that you don't want to write long docstrings for these internal methods, can you please try to fit the one-line docstring format? E.g., here you could maybe just say

"""Performs the line search in the given direction."""

and move the points about backtracking and clamping to comments for the relevant code.

(also applies elsewhere to other methods)

state.breakpoint_min_old)

# Find b
breakpoint_min_idx, breakpoint_min = \
davmre (Contributor) commented on the diff:
I don't think this is in the TFP style guide specifically, because it's more generally covered by Google Python style, but we prefer parentheses over backslashes for writing multiline expressions, e.g.:

breakpoint_min_idx, breakpoint_min = (
    _cauchy_get_breakpoint_min(
        state.breakpoints,
        free_vars_idx))

though in this particular case you can probably just put the call on the same line.

(likely applies elsewhere)

mikeevmm (Author) commented Apr 14, 2021

Hello,

Thank you for the kind words. I am again a bit swamped, so I will return to this when I can, but I have read all of your comments and have no objections.

Re. ragged tensors: they were more in use (and a greater source of problems) previously, as I was keeping track of free_vars_idx and other free-variable-related tensors with ragged arrays. I eventually replaced those with dense tensors carrying flag (sentinel) values, so it should no longer be a problem; a small sketch of that idea follows.
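
For concreteness, here's a minimal sketch of the flag-value replacement (illustrative names, not the actual ones from the code):

import tensorflow as tf

# Ragged form: each batch member has a different number of free variables.
free_vars_ragged = tf.ragged.constant([[0, 2], [1]])

# Dense form with a sentinel flag (-1) marking unused slots, so every
# batch member shares one static shape.
free_vars_dense = free_vars_ragged.to_tensor(default_value=-1)  # [[0, 2], [1, -1]]
mask = tf.not_equal(free_vars_dense, -1)  # which entries are real indices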

Right now, as far as I can tell and remember, ragged tensors are only in use in lines 576-590, to transpose differently between batches, and since these are immediately converted back with to_tensor, I don't expect them to be a problem (but really I wouldn't know).

I have also found that the current approach evaluates the cost function far more often than the Fortran implementation (about 10 times as often in my use case). Depending on how costly the objective is, this can be a real bottleneck (it turned out to be one for me). I am working on implementing the direct primal method to circumvent the double use of the Hager-Zhang line search.

srvasude (Member) commented

Hi!
I was curious if you had time to investigate addressing / implementing these issues for LBFGS-B. We have some internal use cases for it, so I think we would be excited to have it! I am also happy to help / use this PR as a foundation for checking in an implementation for LBFGS-B.

Thank you!
Srinivas

mikeevmm (Author) commented Jun 25, 2021

Hello @srvasude,

I've uploaded the more recent changes I've made to the code.

It remains to write unit tests and to modify the code to accept more batch dimensions. The modifications in commit 41b2267 should help a lot with the latter point, because the queue no longer has the batch dimensions "in the middle"; they are now leading. This means a lot of the code that uses einsum and '...' (ellipsis) indexing should already work out of the box with multiple batch dimensions; see the sketch below. Regardless, the code still needs some adapting for that to work.
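
To illustrate the leading-batch-dimensions point (a generic sketch, not code from the PR):

import tensorflow as tf

# A queue of m correction pairs, each of dimension n, with arbitrary
# leading batch dimensions: shape [..., m, n].
deltas = tf.random.normal([4, 3, 5, 7])  # batch dims (4, 3), m=5, n=7

# Pairwise inner products along the last axis; the '...' absorbs however
# many leading batch dimensions are present.
inner = tf.einsum('...mn,...kn->...mk', deltas, deltas)
print(inner.shape)  # (4, 3, 5, 5)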

Commit 64d30a4 is meant to be a starting point for writing the unit tests. However, it may be best to implement support for multiple batch dimensions before writing the tests.

Miguel
