Value error in histplot with binwidth smaller than half the data range #3646

Open
jhncls opened this issue Mar 1, 2024 · 2 comments


jhncls commented Mar 1, 2024

sns.histplot([1, 2, 3], binwidth=7) crashes with

"ValueError: bins must be positive, when an integer"

It is related to #2721, which is marked as solved, but the error also happens in the dev version.

The cause seems to be line 136 in counting.py, bins = int(round((stop - start) / binwidth)), which evaluates to 0 whenever the data range is less than half the binwidth (here (3 - 1) / 7 ≈ 0.29 rounds to 0). Changing it to bins = max(1, int(round((stop - start) / binwidth))) would probably solve it adequately.
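A minimal reproduction of the arithmetic, assuming (as described above) that the bin count is computed with that expression and then handed to numpy as an integer `bins`. The helpers `n_bins_original` and `n_bins_patched` are hypothetical; they only mirror the quoted expression and the suggested `max(1, ...)` fix, and seaborn itself is not called.

```python
import numpy as np


def n_bins_original(start, stop, binwidth):
    # Mirrors the expression quoted from counting.py
    return int(round((stop - start) / binwidth))


def n_bins_patched(start, stop, binwidth):
    # Suggested fix: never ask numpy for fewer than one bin
    return max(1, int(round((stop - start) / binwidth)))


data = np.array([1, 2, 3])
start, stop = data.min(), data.max()

# (3 - 1) / 7 is about 0.29, which rounds to 0 bins
print(n_bins_original(start, stop, 7))
try:
    np.histogram_bin_edges(data, bins=n_bins_original(start, stop, 7))
except ValueError as err:
    print("ValueError:", err)

# With at least one bin, numpy returns the edges of a single bin over [1, 3]
edges = np.histogram_bin_edges(data, bins=n_bins_patched(start, stop, 7))
print(edges)
```

Note that the patched count avoids the crash but does not honor the requested width: with bins=1, numpy spans exactly the data range, so the single bin is 2 wide rather than 7.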

(The code also fails when binwidth=0, through division by zero, or when it is negative, which makes bins negative and causes numpy to reject it.)
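A small sketch of those two failure modes, assuming plain Python floats for start and stop. With numpy float scalars the zero division behaves differently: it returns inf with a RuntimeWarning, and converting that to int then raises OverflowError. The values are illustrative only.

```python
import numpy as np

start, stop = 1.0, 3.0  # data range of [1, 2, 3] as plain Python floats

# binwidth=0: the division fails before numpy is ever involved
try:
    int(round((stop - start) / 0))
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)

# Negative binwidth: the bin count comes out negative and numpy rejects it
bins = int(round((stop - start) / -0.5))
print(bins)
try:
    np.histogram_bin_edges([1, 2, 3], bins=bins)
except ValueError as err:
    print("ValueError:", err)
```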

By the way, binwidth is ignored when discrete=True. As an example, without discrete=True, sns.histplot(titanic[titanic['who'] == 'woman'], x='age', binwidth=5) has some bins covering 5 ages and others covering 6. Of course, people who really care can provide their own custom bin edges.
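One mechanism that produces bins holding different numbers of integer values, sketched with a hypothetical integer sample rather than the titanic data (the exact edges histplot picks for that example are not reproduced here). numpy treats every bin as half-open except the last, which is closed on the right, so an edge that lands exactly on the maximum pulls one extra integer into the final bin.

```python
import numpy as np

# Hypothetical integer-valued sample: every age from 15 to 35
ages = np.arange(15, 36)

# Integer-aligned edges 5 apart; the last edge coincides with the maximum age
edges = np.array([15, 20, 25, 30, 35])

counts, _ = np.histogram(ages, bins=edges)
print(counts)  # [5 5 5 6]
```

Edges placed at half-integers avoid this, since no integer can then sit on a bin boundary.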

mwaskom (Owner) commented Mar 1, 2024

Should probably just reject unless 0 < binwidth <= binrange. Not sure any other parametrization makes sense.
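A minimal sketch of that rule as an up-front check. The function name `check_binwidth` is hypothetical (not part of seaborn), and `binrange` is assumed to be a `(start, stop)` pair, as with the histplot parameter of that name.

```python
def check_binwidth(binwidth, binrange):
    """Hypothetical validator: accept binwidth only if 0 < binwidth <= stop - start."""
    start, stop = binrange
    span = stop - start
    if not 0 < binwidth <= span:
        raise ValueError(
            f"binwidth must satisfy 0 < binwidth <= {span}, got {binwidth}"
        )
    return binwidth


check_binwidth(0.5, (1, 3))  # accepted: 4 bins over [1, 3]

for bad in (7, 0, -1):
    try:
        check_binwidth(bad, (1, 3))
    except ValueError as err:
        print(err)
```

One side effect of this rule: when all values coincide (for example a single data point), stop - start is 0 and every binwidth is rejected.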

> By the way, binwidth is ignored when discrete=True.

Yeah, the parameter is documented as

> If True, default to binwidth=1 ...

Maybe "default to" implies that you can override it, and the wording should be "set binwidth=1".

jhncls (Author) commented Mar 2, 2024

> Should probably just reject unless 0 < binwidth <= binrange. Not sure any other parametrization makes sense.

When the data isn't known in advance, in some situations only 0 or 1 data points turn up. Without setting binwidth, an empty plot is currently returned for 0 data points, and a single symmetric bin 1 unit wide for one data point. With binwidth set, the code could behave the same way, just using that binwidth. A plot showing something reasonable would be friendlier than an error message.
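A sketch of that behavior under stated assumptions: the helper `bin_edges_with_fallback` is hypothetical, edges start at the data minimum and step by exactly `binwidth`, empty input yields no edges, and a single distinct value gets one bin of the requested width centered on it, mirroring the symmetric bin used when binwidth is not set.

```python
import numpy as np


def bin_edges_with_fallback(values, binwidth):
    """Hypothetical helper: honor binwidth even for empty or tiny inputs."""
    if binwidth <= 0:
        raise ValueError("binwidth must be positive")
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return np.array([])  # nothing to bin: caller can draw an empty plot
    start, stop = values.min(), values.max()
    if start == stop:
        # One distinct value: a single bin of the requested width, centered on it
        return np.array([start - binwidth / 2, start + binwidth / 2])
    # At least one bin, each exactly binwidth wide, starting at the minimum
    n_bins = max(1, int(np.ceil((stop - start) / binwidth)))
    edges = start + binwidth * np.arange(n_bins + 1)
    if edges[-1] < stop:
        # Guard against floating-point roundoff leaving the maximum uncovered
        edges = np.append(edges, edges[-1] + binwidth)
    return edges


print(bin_edges_with_fallback([1, 2, 3], 7))  # one bin, from 1 to 8
print(bin_edges_with_fallback([5], 2))        # one bin, from 4 to 6
print(bin_edges_with_fallback([], 2))         # no edges
```

Anchoring the first edge at the minimum is only one choice; centering the padded range on the data would be an equally reasonable design.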

> If True, default to binwidth=1 ...

Well, my comment was meant as a suggestion to also allow other integer binwidths for discrete data, with the edges shifted by a half. (But maybe this would complicate things too much. Next, people will be asking for discrete units measured in fractions, e.g. tenths.)
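A sketch of that suggestion, assuming integer-valued data: edges sit at half-integers and step by an integer `binwidth`, so every bin covers exactly `binwidth` consecutive integers. The function `discrete_bin_edges` is hypothetical; with `binwidth=1` it reduces to unit bins centered on each integer, matching the documented binwidth=1 default for discrete=True.

```python
import numpy as np


def discrete_bin_edges(values, binwidth=1):
    """Hypothetical: half-integer edges so each bin spans `binwidth` integers."""
    if binwidth < 1 or int(binwidth) != binwidth:
        raise ValueError("binwidth must be a positive integer for discrete data")
    values = np.asarray(values)
    start, stop = int(values.min()), int(values.max())
    # np.arange excludes its stop value, so pass one full step beyond
    # stop + 0.5 to make sure the final edge covers the maximum
    return np.arange(start - 0.5, stop + 0.5 + binwidth, binwidth)


ages = np.arange(15, 36)  # hypothetical integer ages 15..35
edges = discrete_bin_edges(ages, binwidth=5)
counts, _ = np.histogram(ages, bins=edges)
print(edges)   # 14.5, 19.5, ..., 39.5
print(counts)  # [5 5 5 5 1]
```

The last bin is only partly filled because the data stop at 35; since no integer ever lands on a half-integer boundary, the closed last bin cannot pick up an extra value.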

jhncls changed the title from "Value error in histplot with binwidth larger than the double of data range" to "Value error in histplot with binwidth smaller than half the data range" on Mar 2, 2024.