
added "one-sided" alternative for proportion_confint #9249

Conversation

@mcmrc (Contributor) commented May 19, 2024

I added the 'side' parameter to proportion_confint to enable the calculation of a one-sided CI. The parameter has three options: two-sided (default), upper, and lower. When upper or lower is selected, the unused bound (ci_low or ci_upp) is returned as np.nan.

@pep8speaks commented May 19, 2024

Hello @mcmrc! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-05-25 14:34:36 UTC

assert_equal(ci_upp, 1.0)

ci_low, ci_upp = proportion_confint(100, 200)
assert_almost_equal(ci_low, 0.4307048087825161)
Review comment (Member) on the code above:

Where do the comparison numbers come from?

We now mainly use assert_allclose with atol and/or rtol (the usage of assert_almost_equal mostly predates numpy's assert_allclose).

Reply (Contributor Author):

These numbers come from the current version of statsmodels.stats.proportion.proportion_confint. Because I did not alter the calculation logic, those numbers should not change.
Should I change all assert_almost_equal to assert_allclose?

Reply (Member):

Usually we add comparison numbers from other packages, mainly from R for stats.
A # comment should mention the source; mark them as regression tests if the numbers come from an earlier statsmodels version.
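A minimal sketch of that test style (the reference numbers below are the R binom.test values quoted later in this thread, and the atol is only illustrative):

    from numpy.testing import assert_allclose
    from statsmodels.stats.proportion import proportion_confint

    # reference: R 4.3.3, binom.test(100, 200)$conf.int
    ci_low, ci_upp = proportion_confint(100, 200, method="beta")
    assert_allclose([ci_low, ci_upp], [0.4286584, 0.5713416], atol=1e-6)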

@josef-pkt (Member):

About binomtest confidence intervals:

scipy.stats now also has confint and alternative options (which it did not have when we added those to statsmodels).

AFAICS, the one-sided binom_test confints are the same as the beta ones (although scipy does root finding also in the one-sided case).
Unit tests for the binomtest confidence interval in scipy, with reference numbers from R:
https://github.com/scipy/scipy/blob/main/scipy/stats/tests/test_morestats.py#L920

notation/definition
We use the "alternative" keyword also for confints, with larger or smaller for the outside/alternative region.

boundary points
The one-sided intervals should have 0 or 1 for the "open" side instead of nan.
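A quick sketch of the equivalence mentioned above (assuming scipy >= 1.7, which provides BinomTestResult.proportion_ci; the count/nobs values are only illustrative):

    from scipy import stats

    count, nobs, alpha = 100, 200, 0.05

    # one-sided "greater" confidence interval from scipy's binomtest (exact method)
    res = stats.binomtest(count, nobs, alternative="greater")
    ci = res.proportion_ci(confidence_level=1 - alpha, method="exact")

    # lower bound of the one-sided Clopper-Pearson ("beta") interval with all of alpha in one tail
    ci_low_beta = stats.beta.ppf(alpha, count, nobs - count + 1)

    print(ci.low, ci_low_beta)  # expected to agree; ci.high is 1.0 on the open side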

@josef-pkt (Member):

Thanks for the PR

@mcmrc (Contributor Author) commented May 19, 2024

Thank you so much for all the comments.
I will:

  • calculate reference numbers from R
  • change the argument name to alternative and the options to larger and smaller
  • use 1 or 0 for the open side

@josef-pkt (Member):

Great.

And add the one-sided intervals to the binom_test method.
Based on a quick check, I get the same results in scipy as using one of the sides of the beta intervals.
(AFAIR, those are exact in the one-sample case.)

jeffreys looks fine to me; it could be replaced by isf and ppf in the two one-sided cases, but your current code using interval just does a little bit of redundant work.
(scipy's distribution interval method is "central", an equal-tail interval, so the standard pattern with doubling alpha still holds.)
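A minimal sketch of the jeffreys variant with ppf/isf in the one-sided cases (jeffreys_confint is a hypothetical name, and the keyword mapping assumes the convention settled later in the thread, where "larger" yields a lower confidence bound):

    from scipy import stats

    def jeffreys_confint(count, nobs, alpha=0.05, alternative="two-sided"):
        # Jeffreys interval: quantiles of the Beta(count + 0.5, nobs - count + 0.5) posterior
        dist = stats.beta(count + 0.5, nobs - count + 0.5)
        if alternative == "two-sided":
            # equal-tail interval with alpha / 2 in each tail
            return dist.ppf(alpha / 2), dist.isf(alpha / 2)
        elif alternative == "larger":
            # lower confidence bound, upper side open
            return dist.ppf(alpha), 1.0
        elif alternative == "smaller":
            # upper confidence bound, lower side open
            return 0.0, dist.isf(alpha)
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")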

@mcmrc (Contributor Author) commented May 19, 2024

It seems that the outputs from the current version of statsmodels and from R 4.3.3 do not match:

statsmodels: proportion_confint(100, 200, method='binom_test') -> (0.42988371451156804, 0.5701162853923397)
R: binom.test(100, 200, conf.level=1-0.05)$conf.int -> [1] 0.4286584 0.5713416

Do you have any ideas?

@josef-pkt (Member):

The problem is base R: it computes the central, equal-tail confidence interval even though the hypothesis test is minlike.

proportion_confint(100, 200, method='beta')
(0.42865843090683226, 0.5713415690931678)

There is an article that points this out and an R package that provides the matching confidence interval.
After looking for it: it's exactci by Fay, https://cran.r-project.org/web/packages/exactci/exactci.pdf
See e.g. #8233.

For the binom_test method you need a unit test separate from the other methods, because you cannot check that the one-sided interval can be obtained by doubling alpha.

@mcmrc (Contributor Author) commented May 20, 2024

Thank you, Josef. Understood: scipy.stats.binomtest's two-sided interval is not an equal-tail construction.
The past issue is very instructive. I'll keep working on it.

@josef-pkt (Member):

minlike and central/equal-tail are based on the same distribution and loglikelihood.
The equal-tail version requires prob <= alpha / 2 in each tail;
minlike provides the shortest interval (in unimodal distributions) but only controls overall coverage, not coverage in each tail.

In symmetric distributions like the normal and t, both methods are the same, but not in the asymmetric distributions we have for proportions and rates (Binomial and Poisson).

(Coverage in approximate methods can still be slightly off-center because of the approximation; the actual tail probabilities are only approximately alpha / 2.)
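A minimal sketch of the distinction (scipy's binomtest two-sided p-value is minlike; the root brackets and tolerances below are only illustrative, and the commented numbers are the values quoted earlier in this thread):

    from scipy import stats
    from scipy.optimize import brentq

    count, nobs, alpha = 100, 200, 0.05

    # central / equal-tail (Clopper-Pearson "beta") interval: alpha / 2 in each tail
    ci_central = (stats.beta.ppf(alpha / 2, count, nobs - count + 1),
                  stats.beta.isf(alpha / 2, count + 1, nobs - count))

    # minlike interval: invert the two-sided binomtest, i.e. find p where the p-value hits alpha
    def pval_minus_alpha(p):
        return stats.binomtest(count, nobs, p=p).pvalue - alpha

    phat = count / nobs
    ci_minlike = (brentq(pval_minus_alpha, 1e-6, phat),
                  brentq(pval_minus_alpha, phat, 1 - 1e-6))

    print(ci_central)  # ~ (0.4287, 0.5713), matching base R's binom.test
    print(ci_minlike)  # ~ (0.4299, 0.5701), matching proportion_confint(..., method="binom_test")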

@mcmrc (Contributor Author) commented May 20, 2024

I committed new code:

  • calculated reference numbers from R (BinomCI, binom.test and exactci)
  • changed the argument name to alternative and the options to smaller and larger
  • used 0 or 1 for the open side
  • updated the binom_test method with alternative="two-sided"
  • diverted the method from beta to the one-sided binom_test
  • simplified the jeffreys method using ppf and isf
  • refactored the test file

@mcmrc (Contributor Author) commented May 20, 2024

Sorry, I forgot to remove some redundant parts. Please review the latest commit.

if alternative == 'two-sided':
    print(qi, stats.binomtest(count, nobs, p=qi).pvalue - alpha)
    return stats.binomtest(count, nobs, p=qi).pvalue - alpha
elif alternative == 'smaller':
Review comment (Member) on the code above:

AFAICS, these will never be used because of the if above, so we don't need them.

Reply (Contributor Author):

I removed it from the latest commit.



if alternative == 'two-sided':
    if method != "binom_test":
Review comment (Member) on the code above:

Why not for binom_test?

Reply (Contributor Author):

Because scipy.stats.binomtest uses "two-sided" as the default alternative, the original alpha should be used instead of alpha / 2.0.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binomtest.html

@josef-pkt (Member):

Looks good overall.
alternative maybe has larger and smaller switched.
I need to go over some details later (they require thinking, which would distract me from my own PR).

@mcmrc (Contributor Author) commented May 20, 2024

Thank you so much for your time.
It is really confusing which side is "larger" or "smaller"; sorry for my mistake.
I read these docs and treated "smaller" as equivalent to "less."

(screenshot of the referenced documentation)

@josef-pkt (Member):

However, in your example the alternative is that the true probability is less than 0.5, but values less than 0.5 are inside the confidence interval, not in the rejection region for that alternative.

I will have to think about this again later. I also always get confused.

Something like: at the upper limit of the left (lower) interval we reject the null hypothesis in favor of the alternative hypothesis that the proportion is larger than that value, with p-value = alpha (in the continuous case; a weak inequality in the discrete case).
But I need to verify this.

@josef-pkt (Member):

AFAICS, my intuition was correct;
for example https://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm

In our (my) definition in statsmodels, the one-sided alternative for a confint also corresponds to the one-sided alternative in the hypothesis test.
If the null hypothesis value is the true value, then we reject it in favor of the alternative (i.e., it falls outside the confidence interval) with probability at most alpha in repeated sampling.

This means the rejection region is in the direction of the alternative:
For the lower-tail confidence interval, the rejection region is in the direction of the "larger" alternative.
For the upper-tail confidence interval, the rejection region is in the direction of the "smaller" or "less" alternative.
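A small numerical check of that correspondence (a sketch only, using the exact Clopper-Pearson bound and scipy's binomtest for illustration):

    from scipy import stats

    count, nobs, alpha = 100, 200, 0.05

    # alternative "larger" (H1: p > p0): one-sided lower confidence bound, upper side open
    ci_low = stats.beta.ppf(alpha, count, nobs - count + 1)  # exact (Clopper-Pearson) bound

    # testing H0: p = ci_low against the "greater"/"larger" alternative rejects exactly at level alpha
    pval = stats.binomtest(count, nobs, p=ci_low, alternative="greater").pvalue
    print(ci_low, pval)  # pval equals alpha up to numerical precision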

@mcmrc (Contributor Author) commented May 21, 2024

I see. So the word 'alternative' corresponds to the rejection region.

@josef-pkt (Member):

Can you squash this? The earlier commits are not relevant.

The final version looks good to me and can be merged.

Thanks for the PR.

  • switched "larger" and "smaller"
  • updated binom_test
  • format styles
  • added "one-sided" alternative for proportion_confint

@mcmrc force-pushed the feature/one-sided-option-for-proportion-confint branch from a666bda to c828cab (May 25, 2024 14:34)
@mcmrc (Contributor Author) commented May 25, 2024

statsmodels has always helped me in my research.
Thank you so much for your time and effort, Josef.

@josef-pkt (Member):

Unrelated unit test failure; merging.

@josef-pkt merged commit 7146193 into statsmodels:main on May 25, 2024 (2 of 3 checks passed).
@josef-pkt added this to the 0.15 milestone on May 25, 2024.
@josef-pkt (Member):

If you are interested, you could add alternative in the same way to the Poisson rates:
https://www.statsmodels.org/dev/generated/statsmodels.stats.rates.confint_poisson.html

Those are all central, so the unit tests could just compare against the two-sided interval.
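A sketch of the kind of check that suggests (the confint_poisson "exact-c" method name and the chi2 quantile form are assumptions here, used only for illustration):

    from scipy import stats
    from statsmodels.stats.rates import confint_poisson

    count, exposure, alpha = 10, 100, 0.05

    # central two-sided exact interval with alpha doubled
    low2, upp2 = confint_poisson(count, exposure, method="exact-c", alpha=2 * alpha)

    # one-sided exact lower bound computed directly (chi2 form of the gamma quantile)
    low1 = stats.chi2.ppf(alpha, 2 * count) / 2 / exposure

    print(low2, low1)  # for a central method these should agree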

@mcmrc (Contributor Author) commented May 25, 2024

Let me take that on.
I created a new issue (#9254) and will make a PR.

Merging this pull request may close: ENH: proportion_confint only has two-sided confidence intervals, no "alternative" option