Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented poisson distribution and added in the code files #266

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from
Open

Implemented poisson distribution and added in the code files #266

wants to merge 15 commits into from

Conversation

Simardeep27
Copy link
Contributor

@Simardeep27 Simardeep27 commented Mar 20, 2021

Implemented poisson distribution as discussed in the issue (#264).
@psesh @ascillitoe it would be helpful if you can look at the implementation of the files and guide me for any further changes to be committed to this file repository.
Changes made in source files :

  1. equadratures.distributions.__init__.py
  2. equadratures.distributions.poisson.py
  3. equadratures.parameters.py
  4. equadratures.tests.test_distributions.py/code>

Thank you for your time.

@discourse-eq
Copy link

This pull request has been mentioned on Discourse — equadratures. There might be relevant details there:

https://discourse.equadratures.org/t/google-summer-of-code-2021-projects/108/25

@ncywong
Copy link
Member

ncywong commented Mar 22, 2021

Hi @Simardeep27 . Thanks for contributing to the code.

Currently, our code features continuous distributions, mostly based on importing from scipy.stats, as you have done in your PR. For continuous distributions, the method get_pdf(), which returns evaluations of the parameter's probability density functions at specific points, is well defined and is inherited from the distribution's scipy implementation. However, for discrete distributions, this is not the case: one can only get the probability mass function instead.

To accommodate this, we need to open up another method in distributions.py and parameter.py. However, to generate the recurrence coefficients for computing quadrature rules for integration---another important feature of the EQ code---we would most likely need a method that is different from that used for continuous distributions.

With this said, at the current stage, we cannot merge this PR. However, this is an excellent opportunity for further development. Literature pertaining to "discrete orthogonal polynomials" is sparse, especially with regards to its potential applications. If the case for polynomials orthogonal to discrete measures (such as the Poisson measure) and their applications can be made, it would be a great addition to our code, on top of a contribution to the literature, potentially in the form of a paper. Supposing that you join us for GSoC for the project on universal quadrature rules, I see this as a great potential topic for exploration.

Please do let us know about any thoughts and ideas on this. Thanks again for your contribution!

@Simardeep27
Copy link
Contributor Author

Hello @ncywong ,
Firstly thanks a lot for the clarification. I will learn more about discrete orthogonal polynomials and try to create a separate method for the same. I had just one query , the new methods have to be created taking develop branch or the main branch as the base?

Thanks again for the guidance

@ascillitoe
Copy link
Member

Hi @Simardeep27, we generally try to follow a workflow like the one described here:

https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow

Essentially this means all new features should be implemented and PR'd into develop. We only merge urgent fixes into master. Periodically we then merge develop into master and update the version number. If the new features are more complex, you can create a new feature branch so the community can test and contribute to the feature, and then we can merge into develop once we're happy with it. For example, if the work on distributions gets more involved we can create a new feature_distributions branch (based on develop) for you to PR too.

@ascillitoe
Copy link
Member

ascillitoe commented Mar 22, 2021

P.s. another thought... since the implementation of discrete distributions is likely to require a bit more thought, it might be an idea to implement more continuous distributions first. For example, see the scipy ones below:

https://docs.scipy.org/doc/scipy/reference/stats.html

Implementing all of the less common ones would be quite tedious and messy. Perhaps a better approach is to add a generic distribution class, where you pass in a user defined scipy distribution object? For example, something that allows us to do:

import equadratures as eq
from scipy.stats import truncnorm
mydist = truncnorm(0.1, 2)
myparam = eq.Parameter(order=1, distribution=mydist)

This would allow the user to use continuous distributions we haven't added, without requiring us to hard code every single distribution into the code. @ncywong can you foresee any complications with something like this?

@ncywong
Copy link
Member

ncywong commented Mar 22, 2021

@ascillitoe Not many complications I'd imagine. One thing that came to mind was the need to define the bounds, useful for quadrature rules with endpoints. scipy does not seem to be able to give us that information automatically.

@ncywong
Copy link
Member

ncywong commented Mar 22, 2021

Hello @ncywong ,
Firstly thanks a lot for the clarification. I will learn more about discrete orthogonal polynomials and try to create a separate method for the same. I had just one query , the new methods have to be created taking develop branch or the main branch as the base?

Thanks again for the guidance

@Simardeep27 I would recommend starting from the develop branch, and since I think this could be a rather large endeavour, opening up a feature branch would be appropriate. That being said, I'd recommend starting with some continuous distributions as @ascillitoe mentioned, and trying to add the "user-defined" scipy object definition capability. Cheers.

@Simardeep27
Copy link
Contributor Author

@Simardeep27 I would recommend starting from the develop branch, and since I think this could be a rather large endeavour, opening up a feature branch would be appropriate. That being said, I'd recommend starting with some continuous distributions as @ascillitoe mentioned, and trying to add the "user-defined" scipy object definition capability. Cheers.

Sure , I will try to implement this and make a PR.
Thanks.

@Simardeep27
Copy link
Contributor Author

Hey @ascillitoe @ncywong,

I had a doubt, I was creating a new class for the user defined scipy distributions , When we pass a scipy function to a variable
For example:

import equadratures as eq
from scipy.stats import truncnorm
mydist = truncnorm(0.1, 2)

the variable becomes a 'rv_frozen' and from scipy documentation we could directly call functions for it like , pdf(),cdf(),stats() etc, So would it be feasible to create separate functions for them to call them by our defined class rather than calling them by direct methods defined by scipy.

@ascillitoe
Copy link
Member

Hi @Simardeep27, if I understand you correctly you're saying should we create "wrappers" around the scipy methods .pdf() etc? If so, yes! I think that's exactly what I had in mind. Essentially, all the functionality that is required of the other equadratures distributions classes e.g. get_cdf() get_pdf(), get_samples() etc, will need to be implemented in this new custom distribution class. This is so the other equadratures classes such as Poly() etc can use this new distribution class. For lots of the methods, such as get_pdf() I think this will be quite easy as it'll simply be a wrapper around the scipy .pdf(), and get_samples() can use use the scipy .rvs() etc.

The only one that might need a bit more thought is the .get_recurrence_coefficients(). @ncywong any thoughts on this one? I assume this would just be inherited from the base Distribution() class in template.py, and it would just be a case of ensuring we have self.x_range_for_pdf defined?

@Simardeep27
Copy link
Contributor Author

Actually I have created the wrapper functions for .pdf(),.cdf(),.stats() and they are working fine , I still need to work on the jacobi coefficients and the recurrence coefficients.

I want to ask whether we need just the upper bound and lower bound of the distribution to find x_range_for_pdf(), scipy provides a function for finding the endpoints of a distribution interval(alpha). So , can we use this to find the x_range_for_pdf and then the recurrence-coefficients?

@ncywong
Copy link
Member

ncywong commented Mar 24, 2021

Hi both, sorry for the late response.

The only one that might need a bit more thought is the .get_recurrence_coefficients(). @ncywong any thoughts on this one? I assume this would just be inherited from the base Distribution() class in template.py, and it would just be a case of ensuring we have self.x_range_for_pdf defined?

That's what I think too. And as @Simardeep27 correctly mentioned, we can use interval from scipy to get the range using a "probability 1 confidence interval". However, I would be a bit careful about its behaviour for infinite (e.g. Gaussian)/semi-infinite (e.g. Exponential) distributions. It's worth playing around a bit to see what suits best. Other than that, custom_recurrence_coefficients should be able to sort the recurrence coefficients out.

A further step would be to ensure that the custom recurrence coefficients are doing their job, by verifying that the orthogonal polynomials generated are indeed orthogonal. Basically, for different distributions, try to evaluate the inner product of the polynomial basis functions with respect to the orthogonal probability distribution. Do let me know if you need help on this.

@ascillitoe
Copy link
Member

Hi both, I looked into it a bit more and I think we can get everything we need from rv.support() and rv.mean()? Or even just rv.a and rv.b to get the lower and upper bounds directly?

@ncywong re the recurrence coefficients, would we just need a bit of logic to decide which method in recurrence_utils.py to call depending on the type of distribution? I'm guessing we would need to check if rv.support() or rv.a and rv.b are np.inf or not? Or as you said are we can always use custom_recurrence_coefficients but then we need a bit of logic to set x_range_for_pdf to something sensible?

@ncywong
Copy link
Member

ncywong commented Mar 24, 2021

would we just need a bit of logic to decide which method in recurrence_utils.py to call depending on the type of distribution?

Yes, from what I gathered, we just need to use the custom_recurrence_coefficients method. For infinite distributions, it seems that covering most of the mass/domain of the pdf is sufficient. Again, would be informative to have some verification on this!

@Simardeep27
Copy link
Contributor Author

Hi both, sorry for the late reply
I tried using the custom_recurrence_coefficients method, I had a doubt that for this method we require the x_range_for_pdf for example. cauchy distribution uses scale for the x_range_for_pdf. Applying custom_reccurence_coefficients provides just nan values. So, how to move forward with this?

P.s. Are there any research papers you could refer for recurrence coefficients, it would help me to get a better understanding of the issue.

Thanks

@ascillitoe
Copy link
Member

Hi @Simardeep27, for the plotting additions, would you be able to merge this into the latest develop branch please? and then PR back into that? It will get confusing if the plotting commits are merged in under this PR for the distributions.

Re your work on distributions, sorry for the delay in getting back to you about this, we are still having a think about how we would be able to validate the above for all the different continuous distrubtions scipy offers.

Thanks!
Ashley

@Simardeep27
Copy link
Contributor Author

Hey @ascillitoe, sorry for the confusion , I will add it in the different PR.

Oh, I will try to find a solution as soon as possible.

@ascillitoe
Copy link
Member

Great thanks!

Re validating the distributions, we need to think a bit more about this. If you'd like to have a go at some other stuff in the meantime, there are plenty more plotting utilities that would be useful to implement (e.g. see here), and happy to discuss other ones not yet mentioned on the discourse post!

@Simardeep27
Copy link
Contributor Author

Sure, I would keep this on sidelines for the moment.

@discourse-eq
Copy link

This pull request has been mentioned on Discourse — equadratures. There might be relevant details there:

https://discourse.equadratures.org/t/triangular-distribution/148/3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants