DOC: FutureWarning from string promotion #19078

jbrockmendel · 2021-05-24T01:33:33Z

In the future, np.array([1, 2, "3"]) will raise, and we should use np.array([1, 2, "3"], dtype=object) instead.
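For illustration, a minimal sketch of the explicit form: with dtype=object no promotion takes place, so each element keeps its Python type. The variable names are only illustrative.

```python
import numpy as np

# Explicit object dtype: NumPy stores the Python objects as given
# and never tries to promote the integers to strings.
arr = np.array([1, 2, "3"], dtype=object)
kinds = [type(x) for x in arr]

print(arr.dtype)  # object
```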

This poses a problem if we don't know whether we have data = [1, 2, 3] vs data = [1, 2, "3"]. In the former case we want ndarray[int64], and in the latter we want ndarray[object]. A solution we could implement on our end might look like:

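import warnings

import numpy as np

def call_np_array(obj):
    # np_version_gt_whatever: placeholder flag, True once NumPy raises instead of warning
    if np_version_gt_whatever:
        try:
            return np.array(obj)
        except ValueError:
            # Promotion failed (e.g. int with str): fall back to object dtype
            return np.array(obj, dtype=object)
    else:
        with warnings.catch_warnings():
            # Older NumPy only warns; silence the FutureWarning locally
            warnings.filterwarnings("ignore", category=FutureWarning)
            return np.array(obj)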

This is all kinds of not-great. Is there a better way to achieve this?
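One possible variant, sketched here only for illustration, drops the version check by escalating FutureWarning to an error inside a local catch_warnings block, so that a warning and an eventual exception both take the same fallback path. The name array_with_object_fallback is hypothetical, and whether any warning or error is actually produced for mixed int/str input depends on the installed NumPy version.

```python
import warnings

import numpy as np

def array_with_object_fallback(obj):
    # Hypothetical helper, not a NumPy or pandas API.
    with warnings.catch_warnings():
        # Treat a promotion FutureWarning like the future error.
        warnings.simplefilter("error", FutureWarning)
        try:
            return np.array(obj)
        except (FutureWarning, ValueError, TypeError):
            # Promotion warned or failed: keep the elements as objects.
            return np.array(obj, dtype=object)
```

On NumPy versions that promote silently, the first call succeeds and the fallback is never taken, so this does not by itself guarantee object dtype for mixed input.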


seberg · 2021-05-24T02:04:28Z

Hmmm, I honestly thought that pandas wouldn't notice it much, but I guess that was only if the warning was raised and then ignored... and now that isn't the case anymore. I have to think about it; maybe we can quickly add a special dtype="allow_object", but then you would have to change the code depending on the NumPy version as well.

I'll have to think about it a bit tomorrow or over the next few days.

charris · 2021-05-24T02:40:50Z

@jbrockmendel How is it that you don't know what type you want?

jbrockmendel · 2021-05-24T02:59:56Z

> How is it that you don't know what type you want?

Because we're dealing with user input, e.g. pd.Series([1, 2, 3]) is going to call np.array([1, 2, 3])

charris · 2021-05-24T03:25:02Z

@jbrockmendel Hmm, that is going to be tricky to handle, especially if we ever manage to stop automatic creation of object arrays. We would like to do that at some point. Do you also handle ragged arrays this way?

jbrockmendel · 2021-05-24T04:42:32Z

> Do you also handle ragged arrays this way?

Pretty much anything that is passed to one of the pandas constructors, if it doesn't already have a .dtype (and no dtype kwarg is specified), will get passed to np.array. If that comes back as object dtype then we'll do inference of our own (I've been trying to make that inference consistent between Series/DataFrame/Index). There's really no special treatment of (or even checking for) ragged inputs.

In principle we could track down all the places where we call np.array/np.asarray without specifying a dtype, explicitly pass dtype=object, and then do inference. I expect the performance impact of that would be pretty ugly.
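To make the "pass dtype=object, then infer" idea concrete, here is a rough sketch. The helper array_then_infer and its inference rules are hypothetical and far simpler than what pandas actually does; they only handle flat lists of Python ints and floats.

```python
import numpy as np

def array_then_infer(values):
    # Hypothetical helper: build an object array first, so NumPy never
    # attempts int/str promotion, then narrow the dtype by hand.
    arr = np.array(values, dtype=object)
    if arr.ndim != 1 or arr.size == 0:
        return arr  # leave nested or empty input as object
    kinds = {type(v) for v in arr}
    if kinds <= {int}:
        return arr.astype(np.int64)
    if kinds <= {int, float}:
        return arr.astype(np.float64)
    return arr  # mixed content such as int and str stays object

print(array_then_infer([1, 2, 3]).dtype)    # int64
print(array_then_infer([1, 2, "3"]).dtype)  # object
```

The Python-level pass over every element hints at why the performance cost of this approach is a concern.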

seberg · 2021-05-25T00:51:27Z

So, I am right to assume that pandas either cannot or does not want to force users to be explicit about dtype=object in these cases? Also, I guess it might be pretty darn inconvenient, since object columns are not all that unusual when you transpose a DataFrame?

I would much prefer avoiding this warning context manager. But going back to making warnings.filterwarnings("error", category=FutureWarning) the "opt-in" to the object dtype behaviour doesn't really help with that.
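As a small illustration of the mechanism being discussed, the sketch below escalates FutureWarning to an exception inside a local catch_warnings block. The function legacy_promotion is a hypothetical stand-in that merely emits a FutureWarning; it is not NumPy code.

```python
import warnings

def legacy_promotion():
    # Hypothetical stand-in for a call that warns about a future error.
    warnings.warn("promotion will fail in the future", FutureWarning)
    return "legacy result"

def call_strict(func):
    # Inside this block a FutureWarning is raised as an exception.
    with warnings.catch_warnings():
        warnings.filterwarnings("error", category=FutureWarning)
        try:
            return func()
        except FutureWarning:
            return None  # the caller opted in to the future behaviour
```

Here call_strict(legacy_promotion) returns None, while calling legacy_promotion() directly still returns its legacy result and only warns.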

The new "allow_object" flag/dtype would allow it, though even that is a bit tricky. (I can probably assume that no promotion should ever go to object unless one of the inputs is already object. By doing that, I should be able to move the warning generation into the Promote API instead of the DType side API itself, side-stepping the problem that we must not set the warning when the flag is passed.)

jbrockmendel · 2021-05-25T01:25:55Z

> So, I am right to assume that pandas either cannot or does not want to force users to be explicit about dtype=object in these cases?

I'm not aware of that possibility ever having been discussed. I can imagine requiring it in the constructors, but there are also places like Index.searchsorted where we accept listlikes that don't have a dtype keyword. I'm trying to narrow down the places where we call np.array on unknown inputs and it's pretty daunting.

> Also, I guess it might be pretty darn inconvenient, since object columns are not all that unusual when you transpose a DataFrame?

In that case we actually do determine the dtype before doing the transpose, but you've got the right idea.

jbrockmendel · 2021-05-25T17:45:07Z

The non-warning place this is breaking our tests is https://github.com/pandas-dev/pandas/pull/41652/checks?check_run_id=2666093126#step:7:1211 that raises inside np.rec.fromarrays

seberg · 2021-05-25T17:49:22Z

@jbrockmendel that one looks like gh-19085, I am looking at it now. Sorry about that.

jbrockmendel · 2021-05-25T18:01:47Z

Thanks. FWIW I think the relevant cases here involve non-empty strings.

seberg · 2021-05-25T18:34:45Z

Yeah, the empty string problems should not be too difficult to fix (have to still figure out the right place to fix it though).

You are right: in any case, the problem is the string promotion with numbers. I still don't have an idea beyond hoping that a special flag to ignore promotion problems will work. We would have to expose that as public API for pandas, I guess. And users mixing strings and integers would have to use an up-to-date pandas. (Maybe that is OK, hopefully few users actually do that kind of thing :/).

xref #5353 and #6070 (I am very sure there is one more that I didn't find)

jbrockmendel · 2021-05-25T22:58:25Z

> We would have to expose that as public API for pandas, I guess. And users mixing strings and integers would have to use an up-to-date pandas. (Maybe that is OK, hopefully few users actually do that kind of thing :/).

It seems likely that pandas will have to implement some kind of shim for a while. I've taken a shot at this in pandas-dev/pandas#41665, will see if I can get it passing the CI.

seberg · 2021-05-25T23:38:21Z

Thanks, you are running a bit ahead of me. I still have a little hope I can reorganize it so we can get the warning but not force too much warning related acrobatics on downstream/pandas.

bashtage · 2021-05-26T07:49:26Z

It appeared in statsmodels pre-release testing when using code like np.array([True, False]) * 1.0. The array is a user-provided array, and the 1.0 was being used to make sure that the array was either float or complex.

Not sure I have the example correct, but we definitely saw this as well.
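A minimal sketch of the pattern described above, with a hypothetical helper name as_inexact: multiplying by a Python float turns boolean or integer input into float64 while leaving complex input complex.

```python
import numpy as np

def as_inexact(values):
    # Multiplying by a Python float promotes bool/int input to float64,
    # while complex input stays complex.
    return np.asarray(values) * 1.0

print(as_inexact([True, False]).dtype)       # float64
print(as_inexact(np.array([1 + 2j])).dtype)  # complex128
```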

seberg · 2021-05-26T14:59:05Z

@bashtage but in that case if the array was strings, it would raise an error? It may be that for some functions there is a warning and then later also an error, though.

Otherwise, do you have a full example in case it is more of a problem? Right now, I am trying to allow work-arounds for libraries like pandas only.

bashtage · 2021-05-26T15:01:00Z

It was just warning about a future error. It only errored on pytest because we were not allowing this warning.

seberg · 2021-05-26T17:15:40Z

I am exploring the possibility of explicitly allowing an "object fallback" if promotion fails here: gh-19101. But we would have to settle on the API at the very least.

seberg · 2022-11-29T17:07:41Z

Need to give this another shot soon... I wonder how much work-around we really need for the masked constant...

But the issue here is outdated, so closing.

jbrockmendel mentioned this issue May 24, 2021

CI: FutureWarning from string promotion on actions-38-numpydev pandas-dev/pandas#41632

Closed

2 tasks

jbrockmendel mentioned this issue May 26, 2021

CI: fix npdev build pandas-dev/pandas#41665

Closed

4 tasks

seberg mentioned this issue Aug 8, 2022

BUG: np.array(x) converts np.nan to string, if an element in x is a string #22042

Open

seberg closed this as completed Nov 29, 2022
