.loc[...] = value returns SettingWithCopyWarning #17476
Comments
@NadiaRom Can you provide a full example? It's hard to say for sure, but I suspect that
In [8]: df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [4, 5]})
In [9]: df1 = df[['A', 'B']]
In [10]: df1.loc[0, 'A'] = 5
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/indexing.py:180: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
/Users/taugspurger/Envs/pandas-dev/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#!/Users/taugspurger/Envs/pandas-dev/bin/python3.6
So we're updating
@TomAugspurger Here is the code. In general, I never assign values in pandas without .loc.
df = pd.read_csv('df_unicities.tsv', sep='\t')
df.replace({'|': '--'}, inplace=True)
df_c = df.loc[df.encountry == country, : ]
df_c['sort'] = (df_c.encities_ua == 'all').astype(int) # new column
df_c['sort'] += (df_c.encities_foreign == 'all').astype(int)
df_c.sort_values(by='sort', inplace=True)
# ---end of chunk, everything is fine ---
if df_c.encities_foreign.str.contains('all').sum() < len(df_c):
    df_c.loc[df_c.encities_foreign.str.contains('all'), 'encities_foreign'] = 'other'
    df_c.loc[df_c.cities_foreign.str.contains('всі'), 'cities_foreign'] = 'інші'
else:
    df_c.loc[df_c.encities_foreign.str.contains('all'), 'encities_foreign'] = country
    df_c.loc[df_c.cities_foreign.str.contains('всі'), 'cities_foreign'] = df_c.country.iloc[0]
if df_c.encities_ua.str.contains('all').sum() < len(df_c):
    df_c.loc[df_c.encities_ua.str.contains('all'), 'encities_ua'] = 'other'
    df_c.loc[df_c.cities_ua.str.contains('всі'), 'cities_ua'] = 'інші'
else:
    df_c.loc[df_c.encities_ua.str.contains('all'), 'encities_ua'] = 'Ukraine'
    df_c.loc[df_c.cities_ua.str.contains('всі'), 'cities_ua'] = 'Україна'
# Warning after it
Thank you for the rapid answer!
The issue here is that you're slicing your dataframe first with
Pandas isn't 100% sure if you want to assign values to just your
Doing this will fix your error. I'll tack on a brief example to help explain the above, since I've noticed a lot of users get confused by pandas in this respect. Example with made-up data:
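A minimal sketch of this first case, assuming hypothetical column names and values (not the data from the post): assigning through .loc directly on the original dataframe changes it in place.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})

# .loc on the original frame: rows are selected and assigned in one step,
# so there is no intermediate object for pandas to be unsure about
df.loc[df["group"] == "a", "value"] = 0

print(df)
```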
So the above works as we expect! Now let's try an example that mirrors what you attempted to do with your data.
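A sketch of that pattern, again with hypothetical data: the dataframe is sliced into its own variable first, and .loc is then used on the slice. On the pandas versions discussed in this thread, the second assignment is the kind of line that emits SettingWithCopyWarning; newer releases may behave differently.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})

# Step 1: slice the frame and keep the result in a separate variable
df_a = df.loc[df["group"] == "a", :]

# Step 2: assign through .loc on the slice. pandas cannot tell whether
# this is meant to change only df_a or the original df as well.
df_a.loc[df_a["value"] > 1, "value"] = 0

print(df_a)  # the slice holds the new value
print(df)    # the original frame is left as it was
```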
Looks like we hit the same error! But it changed
Let's start back from
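One common way to make the slice an independent dataframe is an explicit .copy() when it is created. A minimal sketch, assuming that is the fix intended here and using the same hypothetical data:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})

# .copy() states explicitly that df_a is its own frame, not a slice of df
df_a = df.loc[df["group"] == "a", :].copy()

# Assigning on the explicit copy is unambiguous: only df_a changes
df_a.loc[df_a["value"] > 1, "value"] = 0

print(df_a)
print(df)
```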
This works without an error because we've told pandas that
If you in fact do want these changes to
@CRiddler Great, thank you!
The documentation is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy and @CRiddler has a nice explanation. You should in general NOT use
@CRiddler Thanks, your answer is better than the ones on Stack Overflow. Could you add when you want to propagate to the initial dataframe, or give an indication of how it is done?
@persep In general I don't like turning issues into Stack Overflow threads for help, but it seems that this issue has gotten a fair bit of attention since my last post, so I'll go ahead and post my method of tackling this type of problem in pandas. I typically do this by not subsetting the dataframe into separate variables. Instead, I turn masks into variables, then combine masks as needed and set values based on those masks, to ensure the changes happen in the original dataframe and not in some copy floating around. Original data:
Remember that creating a temporary dataframe will NOT propagate changes
To my knowledge, there is no way to use the same code as above and force changes to propagate back to the original dataframe. However, if we change our thinking a bit and work with masks instead of full-on subsets we can achieve the desired result. While this isn't necessarily "propagating" changes to the original dataframe from a subset, we are ensuring that any changes we do make happen in the original dataframe
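The mask-based approach can be sketched as follows. The column names, values, and the second mask (large_mask) are hypothetical; only the idea of keeping masks in variables and assigning on the original dataframe comes from the comment above.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "col1": ["Q", "Q", "R", "R"],
    "col2": [10, 40, 20, 50],
    "col3": [0, 0, 0, 0],
})

# Keep the masks, not the subsets, in variables
q_mask = df["col1"] == "Q"
large_mask = df["col2"] > 30

# Combine masks as needed and assign on the original frame
df.loc[q_mask & large_mask, "col3"] = 1

# The subset can still be derived on demand from the same mask
df_q = df.loc[q_mask]

print(df)
print(df_q)
```

Because every assignment goes through df.loc, there is never a second dataframe whose changes would need to be carried back.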
Lastly, if we ever wanted to see what df_q would look like we can always subset it from the original dataframe using our
While this isn't necessarily "propagating" changes from
@CRiddler Thanks, you've been very helpful
The first thing you should understand is that SettingWithCopyWarning is a warning, and not an error. You can safely disable this warning with the following assignment.
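In the pandas versions discussed in this thread, the assignment usually meant here is pd.options.mode.chained_assignment = None, which switches the check off globally. A narrower sketch, assuming you only want to silence one statement, swaps in the standard-library warnings module instead; the data is hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
subset = df.loc[df["A"] > 1, :]

with warnings.catch_warnings():
    # Ignores every warning raised inside this block, not only pandas ones
    warnings.simplefilter("ignore")
    subset.loc[subset["B"] > 5, "B"] = 0

print(subset)
print(df)
```

Silencing the warning does not change what the assignment does: here only subset is modified, and df keeps its original values.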
The real problem behind the warning is that it is generally difficult to predict whether a view or a copy is returned. When filtering pandas DataFrames, it is possible to slice/index a frame and get back either a view or a copy. A "view" is a window onto the original data, so modifying the view may modify the original data. A "copy" is a replication of data from the original: changes made to the copy will not affect the original data, and changes made to the original data will not affect the copy.
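The view/copy distinction is easiest to see on plain NumPy arrays, where the rules are fixed: basic slicing returns a view and boolean indexing returns a copy. This is only an illustrative sketch on NumPy; which pandas operations return a view or a copy has varied between versions.

```python
import numpy as np

original = np.array([1, 2, 3, 4])

view = original[1:3]             # basic slicing: a view on the same memory
copy = original[original > 2]    # boolean indexing: an independent copy

view[0] = 20   # writes through to original
copy[0] = 30   # changes only the copy

print(original)
print(np.shares_memory(original, view))
print(np.shares_memory(original, copy))
```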
@CRiddler thanks for the detailed explanation. What happens if the original dataframe is out of scope? I.e.
def update_values(filtered):
    # Filtered is the result of a 'loc' call
    new_value = result_from_function_body()
    set_indexes = some_computation()
    filtered.loc[set_indexes, 'new_col'] = new_value
Does this mean there is no way for
Code Sample
Problem description
This code in Pandas 20.3 throws SettingWithCopyWarning and suggests to "Try using .loc[row_indexer,col_indexer] = value instead". I am already doing so, so it looks like there is a small bug. I use Jupyter.
Thank you! :)
Output of pd.show_versions()
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None