Not working set_index with drop #13649

Open
VelizarVESSELINOV opened this issue Jul 14, 2016 · 6 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@VelizarVESSELINOV

Code Sample, a copy-pastable example if possible

from io import StringIO
from pandas import read_csv

dtf = read_csv(StringIO("DATE_TIME,A\n2/8/2015  6:00:30,1"))

print(dtf)

dtf.set_index(dtf.DATE_TIME, drop=True, inplace=True)
print(dtf.columns)
print(dtf)

Current output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['DATE_TIME', 'A'], dtype='object')
                           DATE_TIME  A
DATE_TIME                              
2/8/2015  6:00:30  2/8/2015  6:00:30  1

Expected Output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['A'], dtype='object')
                           A
DATE_TIME                              
2/8/2015  6:00:30  1

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 20.6.7
Cython: None
numpy: 1.11.1
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
None
@sinhrks sinhrks added the Bug label Jul 14, 2016
@sinhrks
Member

sinhrks commented Jul 14, 2016

Thanks, this looks like a bug. If the input is a Series sliced from the original frame, the corresponding column should be dropped.

It works fine if we pass the column name:

dtf.set_index('DATE_TIME', drop=True, inplace=True)
dtf.columns
# Index(['A'], dtype='object')
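A minimal sketch contrasting the two calls on the one-row CSV from the report. The variable names are illustrative. The by-Series result (column kept) is the behavior reported in this issue and could change if the bug is ever fixed.

```python
from io import StringIO

import pandas as pd

CSV = "DATE_TIME,A\n2/8/2015  6:00:30,1"

# Key passed as a Series sliced from the frame: drop=True has no column label to remove.
by_series = pd.read_csv(StringIO(CSV))
by_series = by_series.set_index(by_series["DATE_TIME"], drop=True)

# Key passed as a column label: drop=True removes that column.
by_label = pd.read_csv(StringIO(CSV)).set_index("DATE_TIME", drop=True)

print(list(by_series.columns))  # ['DATE_TIME', 'A']
print(list(by_label.columns))   # ['A']
```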

@jreback
Contributor

jreback commented Jul 14, 2016

Not a bug: this violates the guarantees of set_index.

It's not valid to pass an actual column here; it's not the same as actually assigning the index.

@jreback
Contributor

jreback commented Jul 14, 2016

There is a PR that tries to make this work, but it's inherently ambiguous.

I'm not even sure you could warn about this (though I think it IS an error to use inplace and drop).
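One way to see the ambiguity: a Series derived from a column usually keeps that column's name even when its values differ, so matching on the name alone cannot tell whether the caller means "this column". A small sketch with made-up data; it shows that nothing is dropped when the key is a Series, as described in this thread.

```python
import pandas as pd

df = pd.DataFrame({"A": range(3), "B": range(3)})

# Scalar arithmetic keeps the Series name, so `derived` is named "A"
# even though its values no longer match column "A".
derived = df["A"] * 10

out = df.set_index(derived, drop=True)
print(out.index.name)     # A
print(list(out.columns))  # ['A', 'B']  (column "A" is kept)
print(list(out.index))    # [0, 10, 20]
```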

@michaelaye
Contributor

michaelaye commented Oct 12, 2016

not a bug - this violates the guarantees of set_index

Could you elaborate on which guarantee of set_index this violates? I find it confusing that I can explicitly pass drop=True and get no error when, for some reason, dropping is not allowed or possible.
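Until this is resolved, one user-side workaround is to drop the matching column yourself. The sketch below is a hypothetical helper (`set_index_drop_matching` is not a pandas API). It assumes the key is a Series whose `name` identifies the column, drops that column only when its contents are identical to the key, and raises if a same-named column holds different data instead of guessing. If no column has the key's name, nothing is dropped.

```python
import pandas as pd


def set_index_drop_matching(df: pd.DataFrame, key: pd.Series, drop: bool = True) -> pd.DataFrame:
    """Hypothetical helper, not part of pandas.

    Use `key` as the new index. If `drop` is True and `df` has a column with
    the same name as `key`, drop that column, but only when its contents are
    identical to `key`; otherwise raise rather than silently keeping it.
    """
    out = df.set_index(key, drop=False)
    name = key.name
    if drop and name is not None and name in df.columns:
        if not df[name].equals(key):
            raise ValueError(f"column {name!r} differs from the Series passed as key")
        out = out.drop(columns=name)
    return out


df = pd.DataFrame({"DATE_TIME": ["2/8/2015  6:00:30"], "A": [1]})
result = set_index_drop_matching(df, df["DATE_TIME"])
print(list(result.columns))  # ['A']
```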

@jreback
Contributor

jreback commented Oct 12, 2016

@michaelaye

When you pass a list for the keys, it is by definition setting the index from those columns. However, one could plausibly think that [58] is the actual result of [57].

In [55]: df = pd.DataFrame({'A':range(2),'B':range(2),'C':range(2)})

In [56]: df
Out[56]: 
   A  B  C
0  0  0  0
1  1  1  1

In [57]: df.set_index(['A','B'])
Out[57]: 
     C
A B   
0 0  0
1 1  1

In [58]: df.index=['A','B']

In [59]: df
Out[59]: 
   A  B  C
A  0  0  0
B  1  1  1

In [54]: DataFrame.set_index?
Signature: DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
Docstring:
Set the DataFrame index (row labels) using one or more existing
columns. By default yields a new object.

Parameters
----------
keys : column label or list of column labels / arrays
drop : boolean, default True
    Delete columns to be used as the new index
append : boolean, default False
    Whether to append columns to existing index
inplace : boolean, default False
    Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
    Check the new index for duplicates. Otherwise defer the check until
    necessary. Setting to False will improve the performance of this
    method

Examples
--------
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

Returns
-------
dataframe : DataFrame
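The docstring examples assume an existing `df` with columns `A` and `B` and six rows. A runnable sketch with a made-up frame shows how `drop` (default True) interacts with each kind of key: keys given as column labels are removed from the columns, while arrays passed directly leave every column in place.

```python
import pandas as pd

# Made-up 6-row frame so the length-6 arrays from the docstring line up.
df = pd.DataFrame({"A": list("aabbcc"), "B": range(6), "C": range(6)})

indexed_df = df.set_index(["A", "B"])                  # two labels: both columns dropped
indexed_df2 = df.set_index(["A", [0, 1, 2, 0, 1, 2]])  # label + array: only "A" dropped
indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])       # array only: no column dropped

print(list(indexed_df.columns))   # ['C']
print(list(indexed_df2.columns))  # ['B', 'C']
print(list(indexed_df3.columns))  # ['A', 'B', 'C']
```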

@ron819

ron819 commented Nov 27, 2018

Any plans to fix this?

@simonjayhawkins simonjayhawkins added the Error Reporting Incorrect or improved errors from pandas label Apr 24, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Apr 24, 2020
@simonjayhawkins simonjayhawkins added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 24, 2020
@mroeschke mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves and removed Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 1, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Projects
None yet
Development

No branches or pull requests

7 participants