
Changes to gradients #417

Open
marcoct wants to merge 24 commits into master from 20210512-marcoct-gradopts

Conversation

marcoct
Collaborator

@marcoct marcoct commented May 18, 2021

Note: This PR includes breaking API changes, and its documentation is not yet complete. It is not ready for review.

Update: This PR is now ready for review.

  • Introduces "parameter contexts" and "parameter stores", so that different traces of the same generative function can use different sets of parameters. The built-in parameter store type is JuliaParameterStore, but there can be other types of parameter stores, including ones that hold the states of TensorFlow and PyTorch parameters. There is a default parameter context, which references a default JuliaParameterStore. The GFI methods simulate, generate, propose, and assess take an optional parameter context argument (illustrated in the sketch after this list).

  • Makes accumulating gradients for JuliaParameterStores a thread-safe operation, so that multiple traces can concurrently accumulate gradients for the same parameters in the same parameter store. This is useful for stochastic gradient training, where the gradients for each trace in a mini-batch can be computed concurrently.

  • Reduces the amount of re-allocation of parameter-sized arrays during gradient accumulation and parameter updates by using in-place updates of gradient accumulators in various places.

  • It is now possible to construct optimizers that automatically apply to all parameters used by a generative function, including in nested calls to other generative functions. This uses a new get_parameters(gen_fn, parameter_context) function, which is implemented for the SML by statically walking the call tree, and for the DML via a user-provided list of all parameters registered with the new function register_parameters! (also shown in the sketch after this list).

  • Many other changes to the API for parameter updates.

  • Some code quality improvements.

  • Adds optional multithreaded flags to several inference functions, including importance_sampling, black_box_vi!, and black_box_vimco!.
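
To make the moving parts concrete, here is a rough Julia sketch of how they could fit together. Only `JuliaParameterStore`, `register_parameters!`, `get_parameters(gen_fn, parameter_context)`, and the optional parameter-context argument to the GFI methods are named by this PR; the `init_parameter!` call, the `register_parameters!` signature, the context layout, and the argument positions are assumptions made purely for illustration, not the actual API.

```julia
using Gen

# A DML generative function with one trainable parameter
# (`@param` is the existing DML syntax for declaring trainable parameters).
@gen function model(x)
    @param theta::Float64
    y ~ normal(theta * x, 1.0)
    return y
end

# Per this PR, a DML function declares the full list of parameters it uses.
# The exact signature of `register_parameters!` is assumed here.
register_parameters!(model, [:theta])

# Parameter values live in a store, not inside the generative function itself.
store = JuliaParameterStore()

# Hypothetical initializer and context layout: the PR only says that a context
# references one store per runtime system, and that the default context
# references a default JuliaParameterStore.
init_parameter!(store, (model, :theta), 0.0)
context = Dict(:julia => store)

# GFI methods such as `simulate` accept an optional parameter context
# (argument position assumed); omitting it falls back to the default context.
trace = simulate(model, (1.0,), context)

# Reports every parameter `model` uses under `context`, including parameters of
# generative functions it calls, so an optimizer can be applied to all of them.
params = get_parameters(model, context)
```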

@marcoct marcoct marked this pull request as draft May 18, 2021 04:14
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from b173841 to f832a15 on May 18, 2021 16:52
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from f832a15 to c045614 on May 18, 2021 17:48
@marcoct marcoct force-pushed the 20210512-marcoct-gradopts branch from 4030288 to 64ee5e0 on May 18, 2021 22:33
@marcoct marcoct marked this pull request as ready for review May 19, 2021 20:54
@marcoct
Collaborator Author

marcoct commented May 19, 2021

Some of the changes in this PR (in particular, the added support for multithreaded gradient accumulation and the use of more in-place array operations) were motivated by this example, which applies reweighted wake-sleep to learn a generative model of binarized MNIST digits.

The other changes aim to reduce technical debt and improve extensibility.

@marcoct marcoct requested a review from alex-lew May 19, 2021 21:03
Contributor

@alex-lew alex-lew left a comment


@marcoct This is awesome! It'll be really great to have multithreaded SGD. LGTM, for the next breaking-changes release.

Some questions:

  1. It seems that dynamic DSL traces store both a parameter store and a parameter context. Why? Isn't the parameter store just a particular field of the parameter context? Is this just to avoid looking it up each time you want to use it?

  2. What do the keys of the parameter context represent, intuitively? It seems like maybe the answer is "parameter groups" -- a GF expects to be able to find a store for each parameter group it uses, and for each parameter group, it expects the store to support certain functionality. Is that right?

  3. Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

  4. If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended? It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

  5. Do all parameter stores need to support multithreading, or do we want a way for a parameter store to mark itself as not supporting multithreading?

  6. For now, I think this is fine, but I wonder about the design choice to have only one parameter store for each key? What if I want the top-level GF to use one Julia parameter store, but further down, a different one? (Not sure why I'd want this, to be fair. But I can see it being one pattern of implementation for Torch / TF GF's -- where each GF uses a different parameter store.) More generally, a parameter context is weird in that it is one of the few Gen data structures that is not hierarchically organized according to a GF's call tree. Is this intentional long-term, or a short-term simplification?

  7. The name register_parameters! makes me think that calling it multiple times will register all the parameters that I pass in. But in fact, it will register only the most recent set of parameters. Maybe we can either change the behavior or the name? ("set_parameters!"?)

  8. I wonder if there is some way to avoid users having to register all parameters that a generative function might use? Can we do something akin to AllSelection? (I guess the user has to initialize all the parameters anyway... so maybe this doesn't help much.)

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> It seems that dynamic DSL traces store both a parameter store and a parameter context. Why? Isn't the parameter store just a particular field of the parameter context? Is this just to avoid looking it up each time you want to use it?

Yes.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> What do the keys of the parameter context represent, intuitively? It seems like maybe the answer is "parameter groups" -- a GF expects to be able to find a store for each parameter group it uses, and for each parameter group, it expects the store to support certain functionality. Is that right?

The original motivation is that a given generative function may use parameters that are managed by different runtime systems (e.g. Julia in-memory, PyTorch, etc.), and that each runtime system exposes a parameter store. The same functionality could potentially be used for other purposes, but I'm not sure.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

@alex-lew Do you mean, for each optimizer configuration, storing some symbolic expression representing the mathematical update and having each parameter store generate its code from that expression?

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> Do all parameter stores need to support multithreading, or do we want a way for a parameter store to mark itself as not supporting multithreading?

No, not all parameter stores need to support multithreading. We could define a supports_multithreading method for each parameter store type.
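
A minimal sketch of that suggestion, assuming a trait-style function (`supports_multithreading` does not exist in this PR; the method names and semantics are purely illustrative):

```julia
# Hypothetical trait: a parameter store type opts in to concurrent gradient
# accumulation; inference code would check it before spawning tasks.
supports_multithreading(::JuliaParameterStore) = true   # thread-safe per this PR
supports_multithreading(::Any) = false                   # conservative default
```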

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended?

This new interface allows for a generalization of the previous behavior. Previously, each generative function was tied to a single parameter store. It is possible to reproduce the previous behavior by creating a separate parameter store for each generative function (although I don't see why one would want to do this). Note that this new more flexible interface lets you have two copies of the generative function that use different parameters.
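
For instance, "two copies of the generative function that use different parameters" could look roughly like the following, reusing the hypothetical initializer and context layout from the sketch in the PR description above:

```julia
# Two independent stores, each holding its own value for the same parameter of `model`.
store_a = JuliaParameterStore()
store_b = JuliaParameterStore()
init_parameter!(store_a, (model, :theta), 0.0)   # hypothetical initializer
init_parameter!(store_b, (model, :theta), 5.0)

# Traces of the same generative function drawn under different parameter contexts.
trace_a = simulate(model, (1.0,), Dict(:julia => store_a))   # context layout assumed
trace_b = simulate(model, (1.0,), Dict(:julia => store_b))
```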

> It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

Can we start with there always being a single PyTorchParameterStore instance (automatically constructed by PyTorch) for the PyTorch runtime that is running in the process, and have it store references to the parameters of the form (gen_fn, id)?

> Or is it explicitly a goal of this PR that generative functions should never themselves store parameters?

The goals of this aspect of the PR were to allow for the possibility of there being multiple versions of a single parameter value, to allow two generative functions to directly use the same parameter, and to make it easier to add checkpoint-and-load functionality for parameters. If a parameter store chooses to have each generative function store its own copy of the parameters, then those goals might be hard to achieve, but it's okay if GenPyTorch doesn't support the same flexibility as JuliaParameterStore.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> The name register_parameters! makes me think that calling it multiple times will register all the parameters that I pass in. But in fact, it will register only the most recent set of parameters. Maybe we can either change the behavior or the name? ("set_parameters!"?)

@alex-lew Right, thanks. I think it's reasonable for it to either (i) error if you call it more than once, or (ii) only ever add parameters and not reset the dictionary. I think set_parameters! sounds like it is setting their values.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

> I wonder if there is some way to avoid users having to register all parameters that a generative function might use? Can we do something akin to AllSelection? (I guess the user has to initialize all the parameters anyway... so maybe this doesn't help much.)

Right. Users already had to initialize all the parameters, so this PR doesn't introduce a significant new burden on users. But we could avoid making users register parameters at all by using a static analysis of the DML function.

@marcoct
Collaborator Author

marcoct commented Jul 14, 2021

@alex-lew To be clear, I think we should implement the minimal changes to GenPyTorch to make it compatible with this PR, or maybe change this PR if that's not possible for some reason, before merging this PR into master.

@alex-lew
Contributor

> Is there some way of factoring an optimizer's logic (the OptimConf) from its implementation for each parameter store? I suppose an optimizer could be parametric in the type of parameter store -- but this would require some general interface for parameter stores, which it seems like is not the goal here.

> @alex-lew Do you mean, for each optimizer configuration, storing some symbolic expression representing the mathematical update and having each parameter store generate its code from that expression?

That could be one way of doing it -- i.e., when implementing a parameter store, you can choose to look up a list of optimizer expressions and implement the necessary methods (perhaps in a loop, using @eval).

@alex-lew
Contributor

> If a GF uses at some point parameters that are not stored in the default Julia parameter store, does the user always need to pass in a parameter context providing them? Or is there some way the default parameter context can be extended?

> This new interface allows for a generalization of the previous behavior. Previously, each generative function was tied to a single parameter store. It is possible to reproduce the previous behavior by creating a separate parameter store for each generative function (although I don't see why one would want to do this). Note that this new more flexible interface lets you have two copies of the generative function that use different parameters.

I am asking specifically about the fact that

  • The default parameter context has only one key (for a Julia parameter store)
  • Some generative functions (e.g. PyTorch generative functions) do not use parameters from Julia parameter stores

So, if called without a custom parameter context, where should a PyTorch generative function look up its parameters?

Is the idea that a package like GenPyTorch should mutate the default parameter context, extending it with a new key? Or is it an error to call a GenPyTorch GF without providing a custom parameter context?

> It's currently unclear to me exactly how we'd adapt Torch generative functions to this interface. One option is to have a dummy struct TorchParameterStore that actually stores nothing, and just looks up (gen_fn, id) inside the gen_fn object itself. Or is it explicitly a goal of this PR that generative functions should never themselves store parameters? If so, an alternative is to figure out a different way of constructing the Torch models so that the parameters really do live in separate PyObjects, and can be swapped out at will; presumably there is a Torch way to do this, but it may make it less natural to construct Torch models.

> Can we start with there always being a single PyTorchParameterStore instance (automatically constructed by PyTorch) for the PyTorch runtime that is running in the process, and have it store references to the parameters of the form (gen_fn, id)?

This seems plausible, but I would need to think about it. Currently, the user constructs a PyTorch model object, which stores some parameters, and this model object is stored within the generative function. Off the top of my head, it's unclear how to run the model object with 'different parameter objects' accessed inside the PyTorchParameterStore. That is, if the PyTorchParameterStore is not just a dummy struct, but actually stores references to parameters, I am not sure how to make the execution of the PyTorch GF actually depend on the references that are stored there. Presumably this is possible -- I'd just need to do some reading of the PyTorch docs.

@alex-lew
Contributor

> @alex-lew To be clear, I think we should implement the minimal changes to GenPyTorch to make it compatible with this PR, or maybe change this PR if that's not possible for some reason, before merging this PR into master.

Got it! I'm finishing some conference reviews this week and may not have time to do this until after vacation, maybe first week of August? Sorry for the delay!

Labels: None yet
Projects: Gen Summer 2021 Hackathon (Waiting for Review)
2 participants