Better handling of NaNs #15

nicodv · 2016-05-27T17:53:11Z

np.unique is used to encode data values as integers. However, NumPy currently treats every np.NaN as a distinct value, so each NaN becomes its own category. (See: numpy/numpy#2111)

The solution is to check with np.isnan so that all NaNs are ignored when encoding. The encoder then simply assigns all NaNs to the -1 (i.e. "I don't know this value") category.
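As a rough illustration of the idea (a minimal sketch, not the actual code from the fix), the NaNs can be masked out with np.isnan before calling np.unique, and the masked positions filled with -1. The function name `encode_column` is hypothetical, and the sketch assumes a single numeric column:

```python
import numpy as np

def encode_column(values):
    """Encode a 1-D numeric column as integer codes; NaNs become -1.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    is_nan = np.isnan(values)

    # Build the category list from the non-NaN values only, so NaNs
    # can never end up as categories of their own.
    categories = np.unique(values[~is_nan])

    # Start with -1 everywhere, then fill in codes for the known values.
    # np.unique returns sorted values, so searchsorted gives each value's index.
    codes = np.full(values.shape, -1, dtype=int)
    codes[~is_nan] = np.searchsorted(categories, values[~is_nan])
    return codes, categories

codes, categories = encode_column([1.0, np.nan, 3.0, np.nan, 1.0])
print(codes)       # [ 0 -1  1 -1  0]
print(categories)  # [1. 3.]
```

Filtering before np.unique keeps the result independent of how a given NumPy version treats NaNs inside np.unique.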

allefeld · 2019-07-06T14:49:31Z

@nicodv, could you expand on this a bit, or give a code example? How does one "check with np.isnan" when using np.unique? Also, a remark in the documentation of np.unique would be useful.

nicodv · 2019-07-08T17:32:15Z

@allefeld, have a look at the commit that fixes this issue: 6fd7c98

It basically filters out NaNs in the encoding and makes sure they get a -1 value.
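For readers who want something concrete without opening the commit, here is a hedged sketch of the same pattern using a plain dictionary lookup. It is not the code from 6fd7c98; `fit_mapping` and `apply_mapping` are hypothetical names, and treating unseen values the same as NaNs is an assumption based on the "I don't know this value" description above:

```python
import math

def _is_nan(value):
    # Only floats can be NaN; strings and other labels pass through.
    return isinstance(value, float) and math.isnan(value)

def fit_mapping(values):
    """Build a value -> integer code mapping, leaving NaNs out entirely."""
    known = sorted({v for v in values if not _is_nan(v)})
    return {v: i for i, v in enumerate(known)}

def apply_mapping(values, mapping):
    """Encode values; anything missing from the mapping (NaN or unseen) gets -1."""
    return [mapping.get(v, -1) for v in values]

nan = float("nan")
mapping = fit_mapping(["a", nan, "b", "a"])
print(mapping)                                  # {'a': 0, 'b': 1}
print(apply_mapping(["b", nan, "c"], mapping))  # [1, -1, -1]
```

Because NaN is never stored as a key, the lookup falls through to the default of -1 without needing a separate NaN check at encoding time.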

nicodv closed this as completed in 6fd7c98 May 27, 2016
