Feature Request: Keep only these columns (vs. dropping all the ones you don't want) #14616

@jakesherman

Apologies if this has been submitted or considered in the past. I searched through the GitHub issues and couldn't find anything pertaining to this.

The idea is that instead of listing every column you wish to delete from a DataFrame via the .drop method, you list the columns you wish to keep through a .keep_cols method, and all other columns are deleted. This would save typing when a DataFrame has many columns and we only want to keep a small subset of them. The prime use case here is method chaining, where indexing with [[ doesn't really work in the middle of many methods being chained together.


A small, complete example of the issue

import pandas as pd

# Create an example DataFrame
data = [
    [1, 'ABC', 4, 10, 6.3],
    [2, 'BCD', 10, 9, 11.6],
    [3, 'CDE', 7, 4, 10.0],
    [4, 'DEF', 7, 10, 5.4],
    [5, 'EFG', 2, 9, 5.3],
]
data = pd.DataFrame(data, 
    columns = ['Id', 'Name', 'Rating1', 'Rating2', 'ThisIsANumber'])

# Just want columns Id and Rating2
new_data = data.drop(['Name', 'Rating1', 'ThisIsANumber'], axis = 1)
new_data.head()

# ** It would be nice to be able to only specify the columns we want 
# ** to keep to save typing - similar to dplyr in R             

def keep_cols(df, keep_these):
    """Keep only the columns [keep_these] in a DataFrame, delete
    all other columns.
    """
    # Every column not listed in keep_these gets dropped
    drop_these = [col for col in df.columns if col not in keep_these]
    return df.drop(drop_these, axis = 1)

new_data = data.pipe(keep_cols, ['Id', 'Rating2'])
new_data.head()

# In this specific example there was not much more typing between
# `.drop` and the `keep_cols` function, but often when a `DataFrame`
# has many columns this is not the case!

In this contrived example I created a keep_cols function as a rough draft of a .keep_cols method on the DataFrame object, and used the .pipe method to apply that function to the DataFrame as if it were a method.

I don't think using [[ cuts it here. Yes, doing data[['Id', 'Rating2']] would work, but when method chaining, people often want to drop columns somewhere in the middle of a chain of methods.
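
To make the chaining case concrete, here is a rough sketch of what keeping columns in the middle of a chain could look like. It reuses the keep_cols helper and example data from above; the row filter, rename, and sort steps are made up purely for illustration and are not part of the proposal.

```python
import pandas as pd


def keep_cols(df, keep_these):
    """Keep only the columns in keep_these; drop all others."""
    drop_these = [col for col in df.columns if col not in keep_these]
    return df.drop(columns=drop_these)


data = pd.DataFrame(
    [
        [1, 'ABC', 4, 10, 6.3],
        [2, 'BCD', 10, 9, 11.6],
        [3, 'CDE', 7, 4, 10.0],
        [4, 'DEF', 7, 10, 5.4],
        [5, 'EFG', 2, 9, 5.3],
    ],
    columns=['Id', 'Name', 'Rating1', 'Rating2', 'ThisIsANumber'],
)

# Hypothetical chain: the steps around .pipe are illustrative only
result = (
    data
    .query('Rating1 >= 7')                  # filter rows first
    .pipe(keep_cols, ['Id', 'Rating2'])     # keep two columns mid-chain
    .rename(columns={'Rating2': 'Rating'})  # keep chaining afterwards
    .sort_values('Rating')
)
```

With a built-in method, the .pipe(keep_cols, ...) line would presumably become something like .keep_cols(['Id', 'Rating2']), and the rest of the chain would stay the same.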

Just in case it's helpful, here's a good article demonstrating the power/beauty of method chaining in Pandas: https://tomaugspurger.github.io/modern-1.html.

Thanks!
