Storing a dict in a DataFrame fails #17777

Closed
andreas-thomik opened this issue Oct 4, 2017 · 4 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves


andreas-thomik commented Oct 4, 2017

Code Sample, a copy-pastable example if possible

Both of the examples below fail with the same error:

import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

df.loc[0, 'a'] = dict(x=2)
df.iloc[0, 0] = dict(x=2)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-282-62f3ee5ff885> in <module>()
      1 # file_map.loc[file_no, 'Q_step_length'] = dict(a=1)
      2 df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])
----> 3 df.iloc[0, 0] = dict(x=2)
      4 df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
      5 df

...\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

...\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    603 
    604             if isinstance(value, (ABCSeries, dict)):
--> 605                 value = self._align_series(indexer, Series(value))
    606 
    607             elif isinstance(value, ABCDataFrame):

...\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
    743             return ser.reindex(ax)._values
    744 
--> 745         raise ValueError('Incompatible indexer with Series')
    746 
    747     def _align_frame(self, indexer, df):

ValueError: Incompatible indexer with Series
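Judging from the frames above, `_setitem_with_indexer` wraps any dict in a `Series` and passes it to `_align_series`, which tries to line the Series index (the dict's keys) up with an axis of the frame. When the key is a single (row, column) pair there is no axis left to align against, so it ends in the `ValueError`. The snippet below only mimics that alignment step with public API (`Series`, `reindex`); it is a sketch for intuition, not pandas' internal code path:

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# pandas turns the dict into a Series whose index is the dict's keys
ser = pd.Series(dict(x=2))
print(ser.index.tolist())  # ['x']

# Aligning that Series against the frame's column labels finds no 'x',
# so every slot comes back as NaN: the dict's contents would be lost, not stored.
aligned = ser.reindex(df.columns)
print(aligned.isna().all())  # True
```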

This works, but it places a list, not the dict itself, into the DataFrame:

df.loc[0, 'a'] = [dict(x=2)]

It is possible to get the dict itself into the DataFrame with a rather inelegant construct like this:

df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
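A less roundabout option, not raised in this thread, is the scalar accessor `DataFrame.at`, which sets exactly one cell by label. As far as I know it stores the object as-is instead of aligning it, but treat that as an assumption to check on your pandas version. The sketch also passes `dtype=object` explicitly (an addition, not in the original report) so the columns can hold arbitrary Python objects:

```python
import pandas as pd

# explicit object dtype so any Python object can live in a cell
df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'], dtype=object)

# .at addresses a single cell by label, so the dict is stored whole
# rather than being matched against column labels
df.at[0, 'a'] = dict(x=2)

stored = df.at[0, 'a']
```

No list wrapping or `apply` unwrapping is needed afterwards.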

Problem description

Since it is possible to store a dict in a DataFrame, an assignment like the ones above should not fail. I am aware that df.loc[...] = dict(...) assigns the values in the dict to the corresponding columns, if present (is that documented?), and that this has its own issues, but this behaviour should not apply when accessing a single location of the DataFrame.
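The row-wise behaviour mentioned here, where dict keys are matched to column labels, can be seen in a minimal sketch. It assumes a whole row is selected with `df.loc[0, :]`; the values `10` and `20` are arbitrary:

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'], dtype=object)

# With a whole row selected, the dict is aligned to the columns by key,
# so the order of keys in the dict does not matter.
df.loc[0, :] = {'b': 20, 'a': 10}

row0 = (df.loc[0, 'a'], df.loc[0, 'b'])
```

This is the same alignment machinery that, for a single (row, column) key, has nothing to align to.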

Expected Output

A DataFrame with the dict stored at the specified location.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented Oct 4, 2017

This is pretty non-idiomatic, and you are pretty much on your own here. You could do it by just wrapping the dict in a list/tuple:

In [14]: df.loc[0, 'a'] = [dict(x=2)]

In [15]: df
Out[15]: 
            a    b
0  [{'x': 2}]  NaN
1         NaN  NaN
2         NaN  NaN

jreback closed this as completed Oct 4, 2017
jreback added this to the won't fix milestone Oct 4, 2017
jreback added the Indexing label Oct 4, 2017
@aaclayton

I encountered the same issue and had two thoughts:

Storing a dict within a DataFrame is unusual, but there are valid cases where software uses pandas to represent and manipulate arbitrary key/value style data that is indexed in a way that makes sense for a panel representation.

The behavior that location based indexing will update columns based on the keys/values of a provided dictionary was a surprise to me. This is a cool convenience feature that makes sense when an explicit column is not referenced. For example, when providing:

df.loc[row, :] = dict(key1=value1, key2=value2)

It makes sense that the keys of the dictionary might be written as columns and that df.loc[row, key1] == value1. However, when providing an explicit column index, inferring the target columns from a provided dictionary is (to me) counter-intuitive. If I instead supply:

df.loc[row, col] = dict(key=value)

I am explicitly denoting that I want to store the entire value in the col column, and I would expect the dictionary to be inserted as-is.
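The semantics proposed here can be approximated today with a small wrapper. `set_loc` below is a hypothetical helper, not a pandas API: when both the row and column keys are scalars it writes through `DataFrame.at` (storing the value as-is), and otherwise it defers to `.loc`, which keeps the existing dict-to-columns alignment. It uses `pandas.api.types.is_scalar` to tell a single label from a slice or list:

```python
import pandas as pd
from pandas.api.types import is_scalar


def set_loc(df, row, col, value):
    """Hypothetical helper: store `value` unchanged when (row, col) names one
    cell; otherwise fall back to .loc, which aligns dicts to labels."""
    if is_scalar(row) and is_scalar(col):
        # single cell: write the object directly, no key/column alignment
        df.at[row, col] = value
    else:
        # rows/columns selected: keep .loc's usual behaviour
        df.loc[row, col] = value


df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'], dtype=object)
set_loc(df, 0, 'a', dict(key='value'))          # dict kept in one cell
set_loc(df, 1, slice(None), {'b': 2, 'a': 1})   # dict spread across columns
```

The split keeps the convenience feature for whole-row assignment while treating an explicit column as "store this object here".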

Anyway, I agree with @jreback that this is somewhat non-idiomatic, BUT I am sympathetic to the original issue raised by @andreas-thomik. I ran into a problem where storing a dict in a single element of a DataFrame with this syntax made sense, so he isn't entirely on his own with this request.

jreback (Contributor) commented Dec 28, 2017

@aaclayton this is related to #18955. We could/should probably support setting scalars of dicts (and other iterables) better. It's a bit tricky though.

TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018
@varadpatil

@jreback, the behaviour is not consistent here: assigning several dicts at once works, while assigning a single dict doesn't.

This works:

In [18]: df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

In [19]: df
Out[19]: 
     a    b
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN

In [20]: df.loc[slice(None),'a']=[{'x':2}]*3

In [21]: df
Out[21]: 
          a    b
0  {'x': 2}  NaN
1  {'x': 2}  NaN
2  {'x': 2}  NaN

In [22]: df.loc[0]=[{'y':3}]*2

In [23]: df
Out[23]: 
          a         b
0  {'y': 3}  {'y': 3}
1  {'x': 2}       NaN
2  {'x': 2}       NaN

whereas assigning a single dict to one cell doesn't:

In [25]: df.loc[0,'a']={'z':4}
ValueError: Incompatible indexer with Series

This is confusing. @jreback, could you please consider fixing it in an upcoming version?

5 participants