pandas methods
Daisho Komiyama edited this page Jan 7, 2023 · 6 revisions
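import numpy as np  # needed later for np.nan
import pandas as pd
# load the CSV into the DataFrame used throughout this page
data = pd.read_csv(file_path)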
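A minimal sketch of what read_csv does with a column that mixes numbers and a placeholder such as '?'. The CSV text here is made up for illustration and read from memory instead of a file; the point is that such a column is loaded as text, not as numbers.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the real file is not shown on this page.
csv_text = "mpg,horsepower\n18.0,130\n25.0,?\n"

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# mpg is parsed as float, but horsepower stays text because of the '?'.
print(data.dtypes)
```

This is why the horsepower column needs cleaning before any numeric work.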
# drop single column
data.drop('car name', axis='columns')
# same as data.drop('car name', axis=1), 1 == 'columns'
# drop multiple columns
# pass the column names as a list
x = data.drop(columns=['mpg', 'origin_europe'])
axis=1 is equivalent to axis='columns'
axis=0 is equivalent to axis='index'
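A small sketch of the drop calls and the axis equivalence, using a toy frame whose values are invented for illustration. It also shows that drop returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
data = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "cylinders": [8, 8],
    "car name": ["car a", "car b"],
})

# drop() returns a new DataFrame; `data` itself is unchanged.
by_name = data.drop("car name", axis="columns")
by_number = data.drop("car name", axis=1)

# axis=0 (or 'index') drops rows by index label instead.
without_first_row = data.drop(0, axis="index")

print(by_name.equals(by_number))   # both spellings give the same frame
print("car name" in data.columns)  # the original still has the column
```

To keep the change, assign the result back, for example data = data.drop('car name', axis='columns').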
We create three simple true-or-false columns with titles equivalent to "Is this car American?", "Is this car European?" and "Is this car Asian?". These are used as independent variables without imposing any ordering between the three regions.
data = pd.get_dummies(data, columns=['origin'])
The call above replaces the origin column, changing the data from this,
   model year   origin
0          70  america
1          70     asia
2          70  america
to this, where each new column holds a simple True or False (1 or 0) value:
   model year  origin_america  origin_asia  origin_europe
0          70               1            0              0
1          70               0            1              0
2          70               1            0              0
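The encoding can be reproduced on a toy frame, sketched below. The three rows mirror the example above; since none of them is European, no origin_europe column appears here. Passing dtype=int is an addition made for this sketch: in pandas 2.0 and later get_dummies returns True/False by default, and dtype=int gives the 1/0 output shown above.

```python
import pandas as pd

data = pd.DataFrame({
    "model year": [70, 70, 70],
    "origin": ["america", "asia", "america"],
})

# One new column per distinct value of `origin`; the original column is removed.
# dtype=int asks for 1/0 instead of the boolean default of pandas 2.0+.
encoded = pd.get_dummies(data, columns=["origin"], dtype=int)

print(encoded)
```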
hpIsDigit = pd.DataFrame(data.horsepower.str.isdigit())
Print out hpIsDigit (type DataFrame):
print(hpIsDigit.to_string())
horsepower
0 True
1 True
2 True
3 True
4 True
5 False
6 True
7 True
So the item at index 5 (False) is a non-digit value.
data[hpIsDigit['horsepower'] == False]
mpg cylinders displacement horsepower weight acceleration \
32 25.0 4 98.0 ? 2046 19.0
126 21.0 6 200.0 ? 2875 17.0
330 40.9 4 85.0 ? 1835 17.3
336 23.6 4 140.0 ? 2905 14.3
354 34.5 4 100.0 ? 2320 15.8
374 23.0 4 151.0 ? 3035 20.5
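A compact sketch of the same check on made-up data. It uses the boolean Series directly with ~ (logical NOT) in place of == False; this assumes the column has no missing values yet, since str.isdigit returns NaN for those and ~ would then fail.

```python
import pandas as pd

# Toy data; one horsepower entry is the '?' placeholder.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 16.0],
    "horsepower": ["130", "165", "?", "150"],
})

# True where the string consists only of digits.
is_digit = data["horsepower"].str.isdigit()

# Keep the rows where the check failed.
bad_rows = data[~is_digit]

# Listing the distinct offending values shows what needs replacing.
unique_bad = bad_rows["horsepower"].unique()
print(bad_rows)
print(unique_bad)
```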
In this case, replace '?' with NaN, then fill each missing value with the median of its column:
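data = data.replace('?', np.nan)
# horsepower was read as strings, so convert it before taking the median
data['horsepower'] = data['horsepower'].astype('float64')
# fill NaN with the column median (assumes all remaining columns are numeric)
medianFiller = lambda x: x.fillna(x.median())
data = data.apply(medianFiller, axis=0)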
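An alternative sketch of the same cleanup on made-up data, not the method used above: pd.to_numeric with errors="coerce" converts the column and turns '?' into NaN in one step, and the median fill is limited to numeric columns so that text columns (such as a car name, if still present) are left alone.

```python
import pandas as pd

# Toy data; one horsepower entry is the '?' placeholder.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 16.0],
    "horsepower": ["130", "165", "?", "150"],
})

# errors="coerce" turns anything that cannot be parsed (here '?') into NaN.
data["horsepower"] = pd.to_numeric(data["horsepower"], errors="coerce")

# Fill NaN with each column's median, restricted to numeric columns.
numeric_cols = data.select_dtypes(include="number").columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].median())

print(data)
```

Here the median of 130, 165 and 150 is 150, so the missing horsepower becomes 150.0.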