A Solid Foundation for Statistics in Python with SciPy #26

Closed
mdhaber opened this issue Jul 20, 2020 · 17 comments

@mdhaber
Owner

mdhaber commented Jul 20, 2020

Overview of "A Solid Foundation for Statistics in Python with SciPy".

Expand tools for the analysis of variance

New Statistical Tests

Improve Existing Tests

Fitting Probability Distributions to Data

New Probability Distributions

Improve underlying code for PDF and CDF calculations

Decrease Open Statistics Issues
By the end of the project, we want the number of open stats issues to be below 282 (number of open stats issues on 3/18/2020), and preferably under 261 (number of open stats issues on 3/18/2020 created before project start date 2/1/2020). This is @mdhaber's list of issues to watch/fix; none need to be closed to finish the project, but it would be great to make a dent.

Outreach Event

Other

@rlucas7

rlucas7 commented Sep 13, 2020

@mdhaber you can check off the box for 'check for NaN in spearmanrho' now.

@mdhaber
Owner Author

mdhaber commented Sep 14, 2020

Yup, thanks!

@rlucas7

rlucas7 commented Oct 11, 2020

@mdhaber you can check off the box for the multivariate t distribution now too.

@WarrenWeckesser

scipy#11119 was merged, so you can check off the new cramervonmises test.
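
For readers unfamiliar with the test, here is a minimal standard-library sketch of the one-sample Cramér-von Mises statistic. It is not the code merged in scipy#11119; the helper name `cvm_statistic` is hypothetical, and the sketch computes only the statistic, not a p-value.

```python
from statistics import NormalDist


def cvm_statistic(sample, cdf):
    """One-sample Cramér-von Mises statistic of `sample` against `cdf`.

    Hypothetical helper for illustration only.
    """
    x = sorted(sample)
    n = len(x)
    if n == 0:
        raise ValueError("sample must not be empty")
    # W^2 = 1/(12n) + sum_i ((2i - 1)/(2n) - F(x_(i)))^2 for i = 1..n
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - cdf(xi)) ** 2
        for i, xi in enumerate(x, start=1)
    )


# Made-up data compared against the standard normal CDF.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
stat = cvm_statistic(sample, NormalDist().cdf)
print(stat)
```

The statistic reaches its minimum, 1/(12n), when the CDF values of the sorted sample fall exactly on the midpoints (2i - 1)/(2n).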

@WarrenWeckesser

PR for the relative risk: scipy#13048
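
As background, the relative risk and a confidence interval can be sketched by hand as follows. This is an independent illustration, not the implementation in scipy#13048; the function name `relative_risk_sketch`, its argument names, and the choice of the Katz log interval are assumptions, and the counts are made up.

```python
import math
from statistics import NormalDist


def relative_risk_sketch(exposed_cases, exposed_total,
                         control_cases, control_total,
                         confidence_level=0.95):
    """Return (relative risk, (low, high)) using the Katz log interval.

    Hypothetical helper for illustration only.
    """
    if min(exposed_cases, control_cases) <= 0:
        raise ValueError("this sketch needs at least one case in each group")
    rr = (exposed_cases / exposed_total) / (control_cases / control_total)
    # Standard error of log(RR) under the usual large-sample approximation.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / control_cases - 1 / control_total)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))


# Made-up counts: 27 cases among 122 exposed, 44 cases among 487 controls.
rr, (low, high) = relative_risk_sketch(27, 122, 44, 487)
print(rr, low, high)
```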

@mdhaber
Owner Author

mdhaber commented Nov 8, 2020

@WarrenWeckesser Two more weeks left in the quarter...

@rlucas7

rlucas7 commented Nov 21, 2020

@mdhaber you can check off the multivariate hypergeometric box:

 multivariate hypergeometric distribution - scipy#12585, scipy#12839 (@mdhaber)
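
For context, the probability mass function of the multivariate hypergeometric distribution is short enough to sketch with the standard library. This is not the code from scipy#12585 or scipy#12839; the helper name `mv_hypergeom_pmf` and its argument order are hypothetical.

```python
from math import comb, prod


def mv_hypergeom_pmf(x, m):
    """P(X = x) when sum(x) items are drawn without replacement from an urn
    holding m[i] items of type i. Hypothetical helper for illustration only.
    """
    if len(x) != len(m):
        raise ValueError("x and m must have the same length")
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0  # impossible draw
    # Ways to pick x[i] of each type, divided by ways to pick sum(x) overall.
    return prod(comb(mi, xi) for mi, xi in zip(m, x)) / comb(sum(m), sum(x))


# Urn with 5 items of one type and 3 of another; draw 3 and ask for (2, 1).
p = mv_hypergeom_pmf([2, 1], [5, 3])  # 10 * 3 / 56
print(p)
```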

@caos21

caos21 commented Dec 22, 2020

I apologize if this is not the appropriate channel to open this discussion.

Following the issues covered in scipy#11477, I would like to share my findings related to dcdflib, which is used to evaluate the cumulative distribution function (CDF).

  • dcdflib was added to SciPy's Subversion repository on February 23, 2002 (4423ed55), possibly from SciPy's CVS repository, which I couldn't find online.
  • There are at least three versions of this library: a 1994 version, v1.1, and the version uploaded to SciPy, which is superseded by v1.1. dcdflib v1.1 is available from netlib random here.
  • The author of v1.1 also offers a FORTRAN95 version with improvements, called cdflib90.
  • A FORTRAN90 version is available here with all the functions in one file.

What can be done?

  • I know that one possibility is to replace the existing dcdflib with the Boost Math Toolkit.
  • Another possibility is to update the current version to v1.1.
  • A third is to use the F90 or F95 implementations.

In the last two cases, the modifications previously made to SciPy's copy of the code would have to be tracked and reapplied.

Why?

  • The current code lacks precision and inline comments compared to the others.
  • It could close a couple of issues.
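
To illustrate in general terms why precision in CDF code matters, here is a standard-library sketch using the normal distribution. It is a generic example of floating-point cancellation in an upper tail, not a claim about where dcdflib specifically loses precision; both function names are hypothetical.

```python
import math


def normal_sf_naive(x):
    """Upper tail computed as 1 - CDF; cancels badly for large x."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - cdf


def normal_sf_erfc(x):
    """Upper tail computed directly from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


for x in (1.0, 5.0, 10.0):
    print(x, normal_sf_naive(x), normal_sf_erfc(x))
```

At x = 10 the naive form returns exactly zero in double precision, while the `erfc` form still returns a small positive tail probability. Libraries that compute tails directly avoid this class of error.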

I would be happy to help in any direction you decide.

@mdhaber
Owner Author

mdhaber commented Dec 22, 2020

Hi @caos21, thanks for mentioning this. @mckib2 is actually working on replacing parts of SciPy.stats with the Boost versions in #48. Would you be interested in taking a look at that? We're not going to change everything at once; this first PR will only actually replace SciPy's beta, binom, and nbinom distributions. The idea is to get all the machinery in place so that it will be easy to take things from Boost as needed in future PRs. Would this make it easy to replace cdflib with Boost's tools?

@caos21

caos21 commented Dec 23, 2020

Hi and thank you, @mdhaber. I think that is reasonable. But first, I would like to inspect how heavily cdflib is used throughout SciPy.
In the meantime, I can update cdflib to v1.1 and reapply all the patches and modifications made in the past. That way, I hope nothing breaks.

Should we move this discussion to #48?

@mdhaber
Owner Author

mdhaber commented Dec 23, 2020

In that case, it would probably be better to open an issue or PR on the main repo, or maybe email the mailing list to get wider attention. Only a few of us are working here now.

@caos21

caos21 commented Dec 23, 2020

Perfect, will do. After that, I will jump into the Boost work in #48 to see how I can be of use.

@mckib2

mckib2 commented Feb 12, 2021

@mdhaber Don't know if we're interested in still keeping this list up to date:

@mdhaber
Owner Author

mdhaber commented Feb 12, 2021

Thanks @mckib2. We've been working from Monday.com recently, but it is still good to check these off.

@mdhaber
Owner Author

mdhaber commented Mar 10, 2022

@mdhaber
Owner Author

mdhaber commented Mar 15, 2022

@tupui At a glance, these are the issues and PRs we have open for multivariate distributions. Multivariate distributions represent ~1/5 of the open issues and PRs with the scipy.stats label.

Multivariate Distributions - 30 of the 187 issues with scipy.stats label, 9 out of 52 PRs with scipy.stats label as of 3/14/2022.
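
As a quick check of those proportions, the counts above can be turned into shares. The variable names are illustrative only; the numbers are the ones quoted in the line above.

```python
# Counts quoted above, as of 3/14/2022.
issues_mv, issues_total = 30, 187
prs_mv, prs_total = 9, 52

issue_share = issues_mv / issues_total
pr_share = prs_mv / prs_total
combined_share = (issues_mv + prs_mv) / (issues_total + prs_total)

print(f"issues:   {issue_share:.1%}")
print(f"PRs:      {pr_share:.1%}")
print(f"combined: {combined_share:.1%}")
```

Each share comes out between 16% and 18%, slightly under the rough one-fifth figure.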

Multivariate Normal - (Fewer than 150 lines of real code account for all these issues and PRs.)

PRs

Issues

New Distribution

PRs

Issues

Other

PRs

Issues

@mdhaber
Owner Author

mdhaber commented Mar 15, 2022

mdhaber closed this as completed Oct 9, 2023

5 participants