
Remove n_outcomes restriction #66

Open · ihincks opened this issue Jun 15, 2016 · 16 comments

@ihincks (Collaborator) commented Jun 15, 2016

Just want to put this here, not necessarily because it is urgent, but because it has come up as a problem for a few people.

Models have a method called n_outcomes which specifies the number of outcomes for each experiment in an array of expparams. Sometimes models do not have a finite number of outcomes. For example, measurement of NV centers involves drawing Poisson random variates, which have a discrete but countably infinite set of outcomes. Or, measurements in NMR are continuous-valued.
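For concreteness, here is a toy sketch of the Poisson case (not QInfer code; the class name and the t_int field are made up for illustration). There is simply no finite value for n_outcomes to report:

```python
import numpy as np

class PoissonPhotonCounts(object):
    """Toy NV-style photon-counting simulator. Outcomes are Poisson
    variates, so the outcome set is countably infinite."""

    def simulate_experiment(self, modelparams, expparams, repeat=1):
        # modelparams: 1-D array of count rates (toy simplification).
        # Mean count = rate times the integration time stored in the
        # hypothetical 't_int' field of expparams.
        means = modelparams[:, np.newaxis] * expparams['t_int']
        return np.random.poisson(means, size=(repeat,) + means.shape)

    def n_outcomes(self, expparams):
        # Any finite value returned here would be a lie.
        raise NotImplementedError("Poisson outcomes cannot be enumerated.")
```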

Correct me if I am wrong, but searching all source files as of 2c86c94 for the string n_outcomes, the only file in which it is used rather than defined is the updater code in smc.py.

@ihincks (Collaborator, Author) commented Jun 15, 2016

I guess it also comes up in risk and information gain, but every use still appears to be in smc.py.

@cgranade (Collaborator)

Agreed, this is a major concern moving forward, and one that we should definitely address. At the moment, I think the only place that n_outcomes is explicitly used is, as you say, in calculating the Bayes risk and information gain for online experiment design. I think it should be fairly straightforward to come up with a better design for how utility functions should take expectation values; the primary difficulty is making that kind of design work with existing models that define n_outcomes and is_n_outcomes_constant.
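To spell out the pattern in question (paraphrased, not the actual smc.py source; marginal_probability and utility_given_outcome are hypothetical stand-ins for the real computations):

```python
import numpy as np

def expected_utility(model, updater, expparams):
    # The finite-outcome pattern: enumerate every outcome and take the
    # expectation against its marginal probability. The n_outcomes
    # call is exactly what breaks for infinite or continuous outcomes.
    outcomes = np.arange(model.n_outcomes(expparams))
    return sum(
        marginal_probability(updater, o, expparams)      # hypothetical
        * utility_given_outcome(updater, o, expparams)   # hypothetical
        for o in outcomes
    )
```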

@ihincks (Collaborator, Author) commented Jun 15, 2016

Talking to Thomas today, it seems that he has basically already solved this issue in his branch continuous-outcomes. His solution is to subclass Model into ContinuousModel (which should probably be renamed). This new model has properties that specify how many samples should be taken whenever an expectation value over data needs to be estimated. Whenever such an expectation is calculated, the model is tested for whether it has a finite number of outcomes, and the two cases are handled differently.

The alternative is to always calculate expectation values this way, even when there are only a few outcomes.
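A minimal sketch of that always-sample alternative (function and parameter names are hypothetical; only a simulate_experiment-style interface is assumed):

```python
import numpy as np

def sampled_expectation(model, modelparams, weights, expparams, utility,
                        n_samples=1000):
    # Monte Carlo estimate of E[utility(outcome)] over the predictive
    # distribution: draw particles according to their weights, simulate
    # one outcome per draw, and average. This works whether the outcome
    # set is finite, countably infinite, or continuous.
    idxs = np.random.choice(len(weights), size=n_samples, p=weights)
    outcomes = model.simulate_experiment(modelparams[idxs], expparams)
    return np.mean([utility(o) for o in np.ravel(outcomes)])
```

When there are only a few outcomes, exact enumeration is both cheaper and exact, which is presumably why the branch tests for finiteness first.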

@cgranade (Collaborator)

I'll take a more detailed look at @whitewhim2718's branch soon, then. In the meantime, tagging @jarthurgross into this Issue, as the n_outcomes problem is relevant to his continuous-measurement filtering work as well.

@cgranade (Collaborator) commented Jun 15, 2016

For concreteness, I can think of three main use cases that should be addressed by a new design:

  • Continuous-valued outcomes
  • Multinomial (vector-valued but discrete)
  • Poisson (semi-infinite but discrete)

@ihincks (Collaborator, Author) commented Jun 15, 2016

And don't forget discrete valued but with infinitely many outcomes, such as Poisson! This is mainly what I am interested in at the moment.

@cgranade (Collaborator)

Very good point, I've added that to the list as well, then.

@taalexander (Contributor)

I'd like to say that while I think I have a solution, my branch Continuous-Outcomes is very experimental right now. One glaring issue is that I added a callback feature in the SMC update to reselect the sampled outcomes at every update, but somehow I neglected to add it to the abstract continuous model, only to my implementation. I will clean this all up, but I may not have the time until next week. I also want to say that I am very open-minded about how to implement these enhancements, and am open to changes.
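For readers following along, the reselection hook described above might look roughly like this (an entirely hypothetical sketch; the actual branch may differ). The mixin assumes it is combined with a class providing simulate_experiment:

```python
import numpy as np

class SampledOutcomeMixin(object):
    """Hypothetical sketch: the SMC updater calls reselect_outcomes
    after each update, so the outcome sample used for expectation
    estimates tracks the current posterior instead of going stale."""

    def __init__(self, n_outcome_samples=100):
        self.n_outcome_samples = n_outcome_samples
        self._outcome_sample = None

    def reselect_outcomes(self, modelparams, weights, expparams):
        idxs = np.random.choice(len(weights),
                                size=self.n_outcome_samples, p=weights)
        self._outcome_sample = self.simulate_experiment(
            modelparams[idxs], expparams)
```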

Additionally, I have done some other development in that branch on the ExperimentDesigner class (among other things) that is most likely not relevant to this issue and should probably be split out.

@cgranade (Collaborator)

Somewhat tangential, but what do people think about which version we should target with this? Personally, I think it makes sense to target a new QInfer version 1.1, so that we can get a stable version out there to build off of while working on ensuring that the new design suits all of our needs and is extensible enough not to cause another problem down the road. Pushing to 1.1 would also fit well with some ongoing work on offline experiment design, as v1.1 could then focus on generalized outcomes and experiment design as the two big new features. Thoughts?

@taalexander (Contributor)

I agree that the release of v1.0 should be prioritized over this feature. I think it is going to take some discussion and consensus to implement, and realistically that will take at least a month. It should be possible to implement these changes without breaking anything, or at least very little (my current implementation is backwards compatible, but maybe not the best way to go if we think long term). I know I haven't been participating in the recent flurry of work (I haven't gotten to code at all in the last month), but I am looking forward to v1.0 to build off of.

@scasagrande (Collaborator) commented Jun 16, 2016

Here's what I'd do, depending on what course of action you want for the implementation:

  • Implemented without breaking anything else: minor version bump, prioritise 1.0
  • Implemented with breaking other stuff, one of:
    • Release version <1.0, such as 0.9, do all your breaking changes, release 1.0
    • Release 1.0, do your breaking changes, release 2.0 (but that's a relatively quick turnaround from 1 to 2, depending on how you feel about that kind of thing).
    • Release nothing, do your breaking changes, release 1.0

@cgranade (Collaborator)

There is one other place that the assumption of integer outcomes gets used, and that's in the perf_testing module. While not strictly related, much of the generalization discussed here would also need to take that into account.

@ihincks (Collaborator, Author) commented Jun 16, 2016

I am interested in this issue because my research currently requires Poisson outcomes; I am writing the model as this discussion takes place. However, I am calculating the risk in another language, so I don't currently need the risk or information gain functions, and I think in this case I can just put nonsense in for n_outcomes and it should work. For this reason, I don't really care which version it ends up in, and for the sake of getting on with it and not dragging things down, it makes sense to leave it out of v1.0.

In terms of breaking changes, after having thought about it a bit, I think (but you never know until you try) that this issue can be fixed without breaking anything. The only thing I would be tempted to break right now is the name of the Model class, renaming it to something like FiniteOutcomeModel, but this is pretty superficial.

@ihincks (Collaborator, Author) commented Jul 6, 2016

Given a discussion with @cgranade yesterday, these are the proposed structural changes to abstract_class (some of them breaking) for version 1, intended to make future integration of feature-generalized-outcomes less invasive:

(I am writing this here so that any criticisms can be made before someone puts the much greater effort into a PR)

  • Simulatable gets a new property domain, which is an instance of a class inheriting from a new abstract class Domain. In particular, a Domain will have an np.dtype, which will allow for fancy output types, like lists, and will be able to enumerate all possible outcomes for systems with finite numbers of outcomes. Note that dtype will not be allowed to depend on expparam in this scheme, unless modified -- this should perhaps be discussed.
  • The property n_outcomes will not depend on the domain, but will rather specify how many outcomes will result when a to-be-implemented method marginalized_outcomes is called. This will typically be equal to the total number of outcomes if there are a tractable number of them.
  • Model will inherit from Simulatable, as it currently does, and introduce the notion of the likelihood function.
  • FiniteOutcomeModel will inherit from Model and will differ from it in that it will implement the marginalized_outcomes method partly by enumerating over all possible outcomes.
  • DifferentiableModel will use abc.ABCMeta as its metaclass and inherit from Model; a concrete subclass can optionally inherit from FiniteOutcomeModel as well.

The idea is that functions like the Bayes risk in smc.py can now depend only on marginalized_outcomes, thereby removing the need to enumerate all possible outcomes and call the likelihood on them. Indeed, a system only needs to be simulatable to estimate the Bayes risk with a Monte Carlo scheme.
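To make the hierarchy concrete, here is a rough sketch (method bodies are illustrative only; this is not the PR, and the exact signatures should be part of the discussion):

```python
import abc
import numpy as np

class Domain(abc.ABC):
    """Describes the set of possible outcomes of a model."""

    @property
    @abc.abstractmethod
    def dtype(self):
        """np.dtype of a single outcome (possibly structured)."""

    @property
    @abc.abstractmethod
    def is_finite(self):
        """True iff the outcome set can be enumerated."""

    def values(self):
        """All possible outcomes; only meaningful when is_finite."""
        raise NotImplementedError

class Simulatable(abc.ABC):
    @property
    @abc.abstractmethod
    def domain(self):
        """The Domain describing this model's outcomes."""

    @abc.abstractmethod
    def simulate_experiment(self, modelparams, expparams):
        pass

    def marginalized_outcomes(self, modelparams, weights, expparams):
        # Default: sample outcomes from the predictive distribution;
        # n_outcomes is reinterpreted as the number of samples, per
        # the second bullet above.
        n = self.n_outcomes(expparams)
        idxs = np.random.choice(len(weights), size=n, p=weights)
        outcomes = self.simulate_experiment(modelparams[idxs], expparams)
        return outcomes, np.full(n, 1.0 / n)

class Model(Simulatable):
    @abc.abstractmethod
    def likelihood(self, outcomes, modelparams, expparams):
        pass

class FiniteOutcomeModel(Model):
    def marginalized_outcomes(self, modelparams, weights, expparams):
        # Enumerate the whole domain; weight each outcome by its
        # likelihood marginalized over the particles. A single
        # experiment is assumed here, so L has shape
        # (n_outcomes, n_particles).
        outcomes = self.domain.values()
        L = self.likelihood(outcomes, modelparams, expparams)
        return outcomes, np.dot(L, weights)
```

With this split, the Bayes risk only ever touches marginalized_outcomes, so it never has to enumerate outcomes itself.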

Prototypical scenarios to consider (add more if you have them), with possible outcome dtypes sketched after the list:

  • Gaussian model, where an outcome can be any real number
  • Multinomial model, where an outcome is a tuple of non-negative integers with a constrained sum (np.dtype([('k', np.int, self.n_elements)]); as an aside, this type seems to require an excessive number of brackets and parens.)
  • Poisson model, where an outcome is a non-negative integer, but there are infinitely many possibilities
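For each scenario, the outcome dtype that the Domain might carry (illustrative only; n_elements is hypothetical):

```python
import numpy as np

# Gaussian model: one real-valued outcome; is_finite would be False.
gaussian_dtype = np.dtype(float)

# Multinomial model: a length-n_elements vector of counts under a
# single field name (the bracket-heavy form noted above).
n_elements = 3
multinomial_dtype = np.dtype([('k', int, (n_elements,))])

# Poisson model: a single non-negative integer; countably infinite,
# so is_finite would again be False.
poisson_dtype = np.dtype(int)
```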

@taalexander (Contributor) commented Jul 6, 2016

I agree with everything said here. As for my branch containing my current work on this issue, not many changes will be needed to bring it in line with the above.

edit - I was mistaken about the scope of this issue. I had not realized that these changes were in preparation for the later branch @ihincks and I hope to submit, and not the work itself. Regardless, my original statement above still stands.

@ihincks (Collaborator, Author) commented Sep 16, 2016

#71 resolves the preparation step.
