
Changes to gradients #417

Open
marcoct wants to merge 24 commits into master from 20210512-marcoct-gradopts

Conversation

marcoct
Collaborator

@marcoct marcoct commented May 18, 2021

Note: This PR includes breaking API changes, and its documentation is not yet complete. It is not ready for review.

Update: This PR is now ready for review.

  • Introduces "parameter contexts" and "parameter stores", so that different traces of the same generative function can use different sets of parameters. The built-in parameter store type is JuliaParameterStore, but there can be other types of parameter stores, including ones that hold the states of TensorFlow and PyTorch parameters. There is a default parameter context, which references a default JuliaParameterStore. The GFI methods simulate, generate, propose, and assess take an optional parameter context argument (illustrated in the sketch after this list).

  • Makes accumulating gradients for JuliaParameterStores a thread-safe operation, so that multiple traces can concurrently accumulate gradients for the same parameters in the same parameter store. This is useful for stochastic gradient training, where the gradients for each trace in a mini-batch can be computed concurrently.

  • Reduces the amount of re-allocation of parameter-sized arrays during gradient accumulation and parameter updates by using in-place updates of gradient accumulators in various places.

  • It is now possible to construct optimizers that automatically apply to all parameters used by a generative function, including in nested calls to other generative functions. This uses a new get_parameters(gen_fn, parameter_context) function, which is implemented for the SML by statically walking the call tree, and for the DML via a user-provided list of all parameters registered with the new function register_parameters! (also shown in the sketch after this list).

  • Many other changes to the API for parameter updates.

  • Some code quality improvements.

  • Adds optional multithreaded flags to several inference functions, including importance_sampling, black_box_vi!, and black_box_vimco!.
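
To make the moving parts concrete, here is a rough Julia sketch of how they could fit together. Only `JuliaParameterStore`, `register_parameters!`, `get_parameters(gen_fn, parameter_context)`, and the optional parameter-context argument to the GFI methods are named by this PR; the `init_parameter!` call, the `register_parameters!` signature, the context layout, and the argument positions are assumptions made purely for illustration, not the actual API.

```julia
using Gen

# A DML generative function with one trainable parameter
# (`@param` is the existing DML syntax for declaring trainable parameters).
@gen function model(x)
    @param theta::Float64
    y ~ normal(theta * x, 1.0)
    return y
end

# Per this PR, a DML function declares the full list of parameters it uses.
# The exact signature of `register_parameters!` is assumed here.
register_parameters!(model, [:theta])

# Parameter values live in a store, not inside the generative function itself.
store = JuliaParameterStore()

# Hypothetical initializer and context layout: the PR only says that a context
# references one store per runtime system, and that the default context
# references a default JuliaParameterStore.
init_parameter!(store, (model, :theta), 0.0)
context = Dict(:julia => store)

# GFI methods such as `simulate` accept an optional parameter context
# (argument position assumed); omitting it falls back to the default context.
trace = simulate(model, (1.0,), context)

# Reports every parameter `model` uses under `context`, including parameters of
# generative functions it calls, so an optimizer can be applied to all of them.
params = get_parameters(model, context)
```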

@marcoct marcoct marked this pull request as draft May 18, 2021 04:14
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from b173841 to f832a15 on May 18, 2021 16:52
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from f832a15 to c045614 on May 18, 2021 17:48
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from 4030288 to 64ee5e0 on May 18, 2021 22:33
@marcoct marcoct marked this pull request as ready for review May 19, 2021 20:54
@marcoct
Collaborator Author

marcoct commented May 19, 2021

Some of the changes in this PR (in particular, the added support for multithreaded gradient accumulation and the use of more in-place array operations) were motivated by this example, which applies reweighted wake-sleep to learn a generative model of binarized MNIST digits.

The other changes aim to reduce technical debt and improve extensibility.

@marcoct marcoct requested a review from alex-lew May 19, 2021 21:03
Contributor

@alex-lew alex-lew left a comment


@marcoct This is awesome! It'll be really great to have multithreaded SGD. LGTM, for the next breaking-changes release.

Some questions:

  1. It seems that dynamic DSL traces store both a parameter store and a parameter context. Why? Isn't the parameter store just a particular field of the parameter context? Is this just to avoid looking it up each time you want to use it?

  2. What do the keys of the parameter context represent, intuitively? It seems like maybe the answer is "parameter groups" -- a GF expects to be able to find a store for each parameter group it uses, and for each parameter group, it expects the store to support certain functionality. Is that right?

  3. Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

  4. If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended? It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

  5. Do all parameter stores need to support multithreading, or do we want a way for a parameter store to mark itself as not supporting multithreading?

  6. For now, I think this is fine, but I wonder about the design choice to have only one parameter store for each key? What if I want the top-level GF to use one Julia parameter store, but further down, a different one? (Not sure why I'd want this, to be fair. But I can see it being one pattern of implementation for Torch / TF GF's -- where each GF uses a different parameter store.) More generally, a parameter context is weird in that it is one of the few Gen data structures that is not hierarchically organized according to a GF's call tree. Is this intentional long-term, or a short-term simplification?

  7. The name register_parameters! makes me think that calling it multiple times will register all the parameters that I pass in. But in fact, it will register only the most recent set of parameters. Maybe we can either change the behavior or the name? ("set_parameters!"?)

  8. I wonder if there is some way to avoid users having to register all parameters that a generative function might use? Can we do something akin to AllSelection? (I guess the user has to initialize all the parameters anyway... so maybe this doesn't help much.)

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> It seems that dynamic DSL traces store both a parameter store and a parameter context. Why? Isn't the parameter store just a particular field of the parameter context? Is this just to avoid looking it up each time you want to use it?

Yes.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> What do the keys of the parameter context represent, intuitively? It seems like maybe the answer is "parameter groups" -- a GF expects to be able to find a store for each parameter group it uses, and for each parameter group, it expects the store to support certain functionality. Is that right?

The original motivation is that a given generative function may use parameters that are managed by different runtime systems (e.g. Julia in-memory, PyTorch, etc.), and that each runtime system exposes a parameter store. The same functionality could potentially be used for other purposes, but I'm not sure.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

@alex-lew Do you mean, for each optimizer configuration, storing some symbolic expression representing the mathematical update and having each parameter store generate its code from that expression?

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> Do all parameter stores need to support multithreading, or do we want a way for a parameter store to mark itself as not supporting multithreading?

No, not all parameter stores need to support multithreading. We could define a supports_multithreading method for each parameter store type.
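
A minimal sketch of that suggestion, assuming a trait-style function (`supports_multithreading` does not exist in this PR; the method names and semantics are purely illustrative):

```julia
# Hypothetical trait: a parameter store type opts in to concurrent gradient
# accumulation; inference code would check it before spawning tasks.
supports_multithreading(::JuliaParameterStore) = true   # thread-safe per this PR
supports_multithreading(::Any) = false                   # conservative default
```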

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended?

This new interface allows for a generalization of the previous behavior. Previously, each generative function was tied to a single parameter store. It is possible to reproduce the previous behavior by creating a separate parameter store for each generative function (although I don't see why one would want to do this). Note that this new more flexible interface lets you have two copies of the generative function that use different parameters.
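
For instance, "two copies of the generative function that use different parameters" could look roughly like the following, reusing the hypothetical initializer and context layout from the sketch in the PR description above:

```julia
# Two independent stores, each holding its own value for the same parameter of `model`.
store_a = JuliaParameterStore()
store_b = JuliaParameterStore()
init_parameter!(store_a, (model, :theta), 0.0)   # hypothetical initializer
init_parameter!(store_b, (model, :theta), 5.0)

# Traces of the same generative function drawn under different parameter contexts.
trace_a = simulate(model, (1.0,), Dict(:julia => store_a))   # context layout assumed
trace_b = simulate(model, (1.0,), Dict(:julia => store_b))
```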

> It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

Can we start with there always being a single PyTorchParameterStore instance (automatically constructed by PyTorch) for the PyTorch runtime that is running in the process, and have it store references to the parameters of the form (gen_fn, id)?

> Or is it explicitly a goal of this PR that generative functions should never themselves store parameters?

The goals of this aspect of the PR were to allow for the possibility of there being multiple versions of a single parameter value, to allow two generative functions to directly use the same parameter, and to make it easier to add checkpoint-and-load functionality for parameters. If a parameter store chooses to have each generative function store its own copy of the parameters, then those goals might be hard to achieve, but it's okay if GenPyTorch doesn't support the same flexibility as JuliaParameterStore.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> The name register_parameters! makes me think that calling it multiple times will register all the parameters that I pass in. But in fact, it will register only the most recent set of parameters. Maybe we can either change the behavior or the name? ("set_parameters!"?)

@alex-lew Right, thanks. I think it's reasonable for it to either (i) error if you call it more than once, or (ii) only ever add parameters and not reset the dictionary. I think set_parameters! sounds like it is setting their values.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> I wonder if there is some way to avoid users having to register all parameters that a generative function might use? Can we do something akin to AllSelection? (I guess the user has to initialize all the parameters anyway... so maybe this doesn't help much.)

Right. Users already had to initialize all the parameters, so this PR doesn't introduce a significant new burden on users. But we could avoid making users register parameters at all by using a static analysis of the DML function.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

@alex-lew To be clear, I think we should implement the minimal changes to GenPyTorch to make it compatible with this PR, or maybe change this PR if that's not possible for some reason, before merging this PR into master.

@alex-lew
Contributor

> Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

> @alex-lew Do you mean, for each optimizer configuration, storing some symbolic expression representing the mathematical update and having each parameter store generate its code from that expression?

That could be one way of doing it -- i.e., when implementing a parameter store, you can choose to look up a list of optimizer expressions and implement the necessary methods (perhaps in a loop, using @eval).

@alex-lew
Contributor

> If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended?

> This new interface allows for a generalization of the previous behavior. Previously, each generative function was tied to a single parameter store. It is possible to reproduce the previous behavior by creating a separate parameter store for each generative function (although I don't see why one would want to do this). Note that this new more flexible interface lets you have two copies of the generative function that use different parameters.

I am asking specifically about the fact that

  • The default parameter context has only one key (for a Julia parameter store)
  • Some generative functions (e.g. PyTorch generative functions) do not use parameters from Julia parameter stores

So, if called without a custom parameter context, where should a PyTorch generative function look up its parameters?

Is the idea that a package like GenPyTorch should mutate the default parameter context, extending it with a new key? Or is it an error to call a GenPyTorch GF without providing a custom parameter context?

> It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

> Can we start with there always being a single PyTorchParameterStore instance (automatically constructed by PyTorch) for the PyTorch runtime that is running in the process, and have it store references to the parameters of the form (gen_fn, id)?

This seems plausible, but I would need to think about it. Currently, the user constructs a PyTorch model object, which stores some parameters, and this model object is stored within the generative function. Off the top of my head, it's unclear how to run the model object with 'different parameter objects' accessed inside the PyTorchParameterStore. That is, if the PyTorchParameterStore is not just a dummy struct, but actually stores references to parameters, I am not sure how to make the execution of the PyTorch GF actually depend on the references that are stored there. Presumably this is possible -- I'd just need to do some reading of the PyTorch docs.

@alex-lew
Contributor

> @alex-lew To be clear, I think we should implement the minimal changes to GenPyTorch to make it compatible with this PR, or maybe change this PR if that's not possible for some reason, before merging this PR into master.

Got it! I'm finishing some conference reviews this week and may not have time to do this until after vacation, maybe first week of August? Sorry for the delay!

Labels: None yet
Projects: Gen Summer 2021 Hackathon (Waiting for Review)
2 participants