DOC: FutureWarning from string promotion #19078
Comments
Hmmm, I honestly thought that pandas wouldn't notice it much, but I guess that was only if the warning was raised and then ignored... and now that isn't the case anymore. I have to think about it, maybe we can quickly add a special I have to think about it tomorrow a bit/in the next few days.
@jbrockmendel How is it that you don't know what type you want?
Because we're dealing with user input, e.g.
@jbrockmendel Hmm, that is going to be tricky to handle, especially if we ever manage to stop automatic creation of object arrays. We would like to do that at some point. Do you also handle ragged arrays this way?
Pretty much anything that is passed to one of the pandas constructors, if it doesn't already have a .dtype (and no dtype kwarg is specified) will get passed to In principle we could track down all the places where we call np.array/np.asarray without specifying a dtype, explicitly pass dtype=object, and then do inference. I expect the performance impact of that would be pretty ugly.
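For illustration only, the "pass dtype=object, then do inference" route mentioned above might be sketched as follows. This is not pandas' actual inference code: `infer_array` is a hypothetical name, and the rules (all ints become int64, ints and floats become float64, everything else stays object) are simplifying assumptions.

```python
import numpy as np


def infer_array(data):
    """Hypothetical helper: build an object array first, then narrow the dtype.

    Not a NumPy or pandas API; the inference rules are deliberately simple.
    """
    # Requesting dtype=object up front avoids any int/str promotion.
    arr = np.array(data, dtype=object)
    if arr.size == 0:
        return arr

    values = list(arr.flat)

    def is_int(x):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(x, (int, np.integer)) and not isinstance(x, bool)

    def is_real(x):
        return is_int(x) or isinstance(x, (float, np.floating))

    if all(is_int(x) for x in values):
        return arr.astype(np.int64)
    if all(is_real(x) for x in values):
        return arr.astype(np.float64)
    # Mixed content (e.g. ints and strings) stays as object.
    return arr


print(infer_array([1, 2, 3]).dtype)
print(infer_array([1, 2, "3"]).dtype)
```

The element-by-element Python loop is where the performance concern comes from: every value is boxed as a Python object and inspected individually before the array can be narrowed.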
So, I am right to assume that pandas either cannot or does not want to force users to be explicit about I would much prefer avoiding this warning context manager. But going back to making The new
I'm not aware of that possibility ever having been discussed. I can imagine requiring it in the constructors, but there are also places like
In that case we actually do determine the dtype before doing the transpose, but you've got the right idea.
The non-warning place this is breaking our tests is https://github.com/pandas-dev/pandas/pull/41652/checks?check_run_id=2666093126#step:7:1211 that raises inside np.rec.fromarrays
@jbrockmendel that one looks like gh-19085, I am looking at it now. Sorry about that.
Thanks. FWIW I think the relevant cases here involve non-empty strings
Yeah, the empty string problems should not be too difficult to fix (I still have to figure out the right place to fix it, though). You are right: In any case, the problem is the string promotion with numbers. I still don't have an idea beyond hoping that a special flag to ignore promotion problems will work. We would have to expose that as public API for pandas, I guess. And users mixing strings and integers would have to use an up-to-date pandas. (Maybe that is OK, hopefully few users actually do that kind of thing :/). xref #5353 and #6070 (I am very sure there is one more that I didn't find)
It seems likely that pandas will have to implement some kind of shim for a while. I've taken a shot at this in pandas-dev/pandas#41665, will see if I can get it passing the CI.
Thanks, you are running a bit ahead of me. I still have a little hope I can reorganize it so we can get the warning but not force too much warning-related acrobatics on downstream/pandas.
It appeared in statsmodels pre-release testing when using code like Not sure I have the example correct, but we definitely saw this as well.
@bashtage but in that case if the array was strings, it would raise an error? It may be that for some functions there is a warning and then later also an error, though. Otherwise, do you have a full example in case it is more of a problem? Right now, I am trying to allow work-arounds for libraries like pandas only.
It was just warning about a future error. It only errored on pytest because we were not allowing this warning.
I am exploring the possibility of explicitly allowing an "object fallback" if promotion fails here: gh-19101 But we would have to settle on the API at the very least.
Need to give this another shot soon... I wonder how much work-around we really need for the masked constant... But the issue here is outdated, so closing.
xref #18999 (comment)
In the future, `np.array([1, 2, "3"])` will raise, and we should use `np.array([1, 2, "3"], dtype=object)` instead.

This poses a problem if we don't know whether we have `data = [1, 2, 3]` vs `data = [1, 2, "3"]`. In the former case we want an ndarray[int64], and in the latter we want an ndarray[object]. A solution we could implement on our end might look like:

This is all kinds of not-great. Is there a better way to achieve this?
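A minimal sketch of a warning-based fallback of the kind discussed in this thread, not the snippet from the original report. It assumes the promotion problem surfaces as a `FutureWarning` (or, once enforced, as a `TypeError` or `ValueError`); `array_with_object_fallback` is a hypothetical name, not a NumPy or pandas API.

```python
import warnings

import numpy as np


def array_with_object_fallback(data):
    """Hypothetical shim: try default inference, fall back to dtype=object."""
    with warnings.catch_warnings():
        # Escalate FutureWarning to an exception so that a promotion
        # warning (assumed here) can be caught like an error.
        warnings.simplefilter("error", FutureWarning)
        try:
            return np.array(data)
        except (FutureWarning, TypeError, ValueError):
            # Assumption: any of these signals that default inference
            # failed, so build an object array instead.
            return np.array(data, dtype=object)


print(array_with_object_fallback([1, 2, 3]).dtype)
print(array_with_object_fallback([1, 2, "3"]).dtype)
```

On NumPy versions that silently promote mixed ints and strings to a string dtype, the first branch succeeds and the object fallback is never reached, so the result for `[1, 2, "3"]` depends on the installed NumPy. The context manager also has to wrap every untyped `np.array` call, which is part of why this approach is unattractive.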