A Solid Foundation for Statistics in Python with SciPy #26
Comments
@mdhaber you can check off the box for 'check for NaN in spearmanrho' now.
Yup, thanks!
@mdhaber you can check off the box for the
scipy#11119 was merged, so you can check off the new
PR for the relative risk: scipy#13048
@WarrenWeckesser Two more weeks left in the quarter...
@mdhaber you can check off the multivariate hypergeometric box:
I apologize if this is not the appropriate channel to open this discussion. Following the issues covered in scipy#11477, I would like to share my findings related to
What can be done?
In the last two cases, a follow-up of the code's modification has to be done. Why?
I would be happy to help in any direction you decide.
Hi @caos21, thanks for mentioning this. @mckib2 is actually working on replacing parts of scipy.stats with the Boost versions in #48. Would you be interested in taking a look at that? We're not going to change everything at once; this first PR will only actually replace SciPy's
Hi, and thank you @mdhaber. I think that is reasonable. But first, I would like to inspect how involved is
Should we move this discussion to #48?
In that case, it would probably be better to open an issue or PR on the main repo, or maybe email the mailing list to get wider attention. Only a few of us are working here now.
Perfect, I will do that, and afterwards I will jump into Boost #48 to see how I can be of use.
@mdhaber Don't know if we're interested in still keeping this list up to date:
Thanks @mckib2. We've been working from Monday.com recently, but it is still good to check these off.
Functions/distributions we might want to borrow from Boost:
IIRC, several bugs involving constant input (i.e. all elements of a slice equal) have been reported. I'll collect them here as I run across them.
scipy#13254 tried to address this for some functions, but I suspect it is a widespread problem.
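To make the constant-input problem concrete, here is a minimal plain-Python sketch. It is not SciPy's implementation: `naive_zscore` and `is_constant` are hypothetical names used only for illustration, and the input is assumed to be a non-empty list of floats. With constant input the standard deviation is zero, so every z-score is 0/0; the sketch checks for that case up front instead of relying on the computed standard deviation coming out as exactly zero, which roundoff may prevent.

```python
import math


def is_constant(values):
    """Return True if every element equals the first one (hypothetical helper)."""
    return all(v == values[0] for v in values)


def naive_zscore(values):
    """Plain-Python z-score for illustration only; assumes a non-empty list."""
    n = len(values)
    if is_constant(values):
        # Zero spread: (v - mean) / std would be 0/0, so report NaN explicitly.
        return [math.nan] * n
    mean = sum(values) / n
    # Population variance (divide by n).
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]


print(naive_zscore([1.0, 2.0, 3.0]))  # symmetric scores around 0
print(naive_zscore([2.0, 2.0, 2.0]))  # all NaN
```

Whether NaN, zero, or an error is the right result for constant input is a design decision for each function; the sketch only shows where the special case arises.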
Overview of "A Solid Foundation for Statistics in Python with SciPy".
Expand tools for the analysis of variance
New Statistical Tests
Improve Existing Tests
Confidence intervals - ENH: stats: add bootstrap for estimating confidence interval and standard error of an n-sample statistic scipy/scipy#13371 (@mdhaber)
Binomial Test - ENH: stats: Add binomtest to replace binom_test. scipy/scipy#12603 (@WarrenWeckesser)
pearsonr - ENH: stats: Add 'alternative' and confidence interval to pearsonr. scipy/scipy#12609
Options for one-sided p-values - ENH: stats: one-sided p-values for statistical tests scipy/scipy#12506 (@DominicChm)
ttests - ENH: Add single-sided p-values to t-tests scipy/scipy#12597
skewtest / kurtosistest / ranksums - ENH: stats: add 'alternative' keyword to some normality tests. scipy/scipy#13549
spearmanr / linregress - ENH: Add single-sided p-values to remaining spearmanr and linregress scipy/scipy#12801
mood - ENH: Add 'alternative' to functions using normal CDF for p-value scipy/scipy#13008
ansari - ENH: stats: add 'alternative' parameter to ansari scipy/scipy#13650
pearsonr - ENH: stats: Add 'alternative' and confidence interval to pearsonr. scipy/scipy#12609
Enhanced results for 2 x 2 contingency tables
Conditional maximum likelihood odds ratio - ENH: stats: Add the function odds_ratio. scipy/scipy#13340 (@WarrenWeckesser)
Relative risk - ENH: stats: Add a function that computes the relative risk. scipy/scipy#13048 (@WarrenWeckesser)
Fitting Probability Distributions to Data
rv_continuous.fit scipy/scipy#11695 (@mdhaber)
fit methods where possible - ENH: stats: more analytical formulas for fitting distributions to data scipy/scipy#11782 (@swallan)
laplace - ENH: Override fit method of Laplace distribution with Maximum Likelihood Estimates scipy/scipy#11988
pareto - ENH: stats: override stats.pareto.fit with analytical MLE scipy/scipy#12457
rayleigh - ENH: stats: override stats.rayleigh.fit with analytical MLE scipy/scipy#12097
invgauss - ENH: stats: override stats.invgauss.fit with analytical MLE scipy/scipy#12514
logistic - ENH: stats: override stats.logistic.fit with system of equations for MLE scipy/scipy#12738
gumbel - ENH: stats: override stats.gumbel_r.fit and stats.gumbel_l.fit with system of equations for MLE scipy/scipy#12737
New Probability Distributions
Improve underlying code for PDF and CDF calculations
Decrease Open Statistics Issues
By the end of the project, we want the number of open stats issues to be below 282 (number of open stats issues on 3/18/2020), and preferably under 261 (number of open stats issues on 3/18/2020 created before project start date 2/1/2020). This is @mdhaber's list of issues to watch/fix; none need to be closed to finish the project, but it would be great to make a dent.
differential_entropy - ENH: add a stats.differential_entropy function scipy/scipy#13631
a parameter for johnsonsu/johnsonsb scipy/scipy#13444
barnard_exact test to scipy.stats. scipy/scipy#13441
nakagami_gen.fit scipy/scipy#13396
mannwhitneyu - stats.mannwhitneyu could support arrays scipy/scipy#12837, inconsistent result from ttest_ind and mannwhitneyu when used with groupby and apply scipy/scipy#11113
moment method for input arrays - BUG: fix moments method to support arrays and list scipy/scipy#12197
stats.binom.cdf issue - scipy.stats.binom_test / binom.sf return incorrect values for large x and n scipy/scipy#13079 (watching)
anderson_ksamp - too many critical values returned in documentation examples scipy/scipy#11140
scipy.stats.skew roundoff error - scipy.stats.skew doesn't work correctly for float point numbers scipy/scipy#11086 - @WarrenWeckesser will re-review/merge
stats.zscore roundoff error - stats.zscore inconsistent behavior when all values are the same scipy/scipy#12815
gausshyper distribution accepts invalid parameters - gausshyper distribution accepts invalid parameters scipy/scipy#10134
rayleigh.fit issue - Parameter estimation for : Left skewed distributions scipy/scipy#13071
binned_statistic_dd - binned_statistic_dd does not respect masked arrays scipy/scipy#12898
stats.lognorm - What are the input arguments “s” and “scale” defined in scipy.stats.lognorm? (scipy.stats.lognorm documentation error) scipy/scipy#12844
pearsonr - feature request: make scipy.stats.pearsonr accept 2-D arrays scipy/scipy#9307
scipy.special.bdtrik incorrect - Wrong confidence interval for binomial distribution with p=0 scipy/scipy#11134
scipy.special.btdtri inaccurate - [Bug] The result of stats.beta.isf is inconsistent with stats.beta.sf scipy/scipy#12794, scipy.stats.beta.ppf gives unexpected results scipy/scipy#12635
weightedtau documentation mistake - Confusing documentation of scipy.stats.weightedtau scipy/scipy#12778
rv_histogram PR - scipy.stats: rv_histogram fit method scipy/scipy#12759 - @WarrenWeckesser will close
axis in scipy.stats.tmean scipy/scipy#12143, tmean does not work with multiple axis and limits option scipy/scipy#9770 - @WarrenWeckesser will reply
rv_continuous assumes scalar parameters - rv_continuous assumes shape parameters are scalars and this is not documented scipy/scipy#10661
circstd - Add literature reference for circstd (and circvar?) scipy/scipy#10096
interval method description - DOC: Correction for scipy.stats.poisson - interval documentation scipy/scipy#9706
fitting several distributions - stats distributions fit problems (Trac #1359) scipy/scipy#1884
stats.ansari scipy/scipy#5012
expect termination condition - stats expect for discrete, termination condition scipy/scipy#2983
test_continuous_basic - TST: stats: check_sample_var should be two-sided (Trac #1546) scipy/scipy#2071
Outreach Event
Other