In this repository, I include several tasks that any data scientist needs to know.
This notebook covers how to read four different data files into Python using pandas:
- Text file
- CSV file
- Excel file
- JSON file
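
As a rough illustration of the readers listed above, the sketch below loads tab-delimited text, CSV, and JSON with pandas. The sample data and variable names are hypothetical and are held in memory so the snippet runs without files on disk; the notebook itself may read from actual paths.

```python
import io

import pandas as pd

# Hypothetical sample data (not taken from the notebook).
text_data = "name\tage\nAna\t31\nBen\t45\n"
csv_data = "name,age\nAna,31\nBen,45\n"
json_data = '[{"name": "Ana", "age": 31}, {"name": "Ben", "age": 45}]'

# Text file: read_csv handles any delimiter, here a tab.
df_text = pd.read_csv(io.StringIO(text_data), sep="\t")

# CSV file: a comma is the default delimiter.
df_csv = pd.read_csv(io.StringIO(csv_data))

# JSON file: a list of records becomes one row per record.
df_json = pd.read_json(io.StringIO(json_data))

# Excel file: needs a workbook on disk and an engine such as openpyxl,
# so the call is shown here but not executed. "data.xlsx" is a placeholder.
# df_excel = pd.read_excel("data.xlsx", sheet_name=0)
```

With files on disk, the `io.StringIO(...)` argument would be replaced by a file path.
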
This notebook covers the following tasks:
- Inner join
- Left join
- Right join
- Data stacking
- Subsetting observations based on conditionals
- Replacing values
- Renaming data-frame columns
- Summary statistics of numerical variables
- Deleting columns from a data-frame
- Deleting rows from a data-frame
- Applying a function over all elements of a column
- Applying a function to groups
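
The tasks above map onto a handful of pandas calls. The following is a minimal sketch on two small, made-up frames (`left` and `right` are hypothetical names, not the notebook's data), showing one possible call per task.

```python
import pandas as pd

# Hypothetical example frames sharing an "id" key.
left = pd.DataFrame({"id": [1, 2, 3], "city": ["Lima", "Oslo", "Rome"]})
right = pd.DataFrame({"id": [2, 3, 4], "sales": [10, 20, 30]})

# Joins: only the `how` argument changes.
inner = left.merge(right, on="id", how="inner")       # ids present in both
left_join = left.merge(right, on="id", how="left")    # every id from `left`
right_join = left.merge(right, on="id", how="right")  # every id from `right`

# Data stacking: append rows of frames with the same columns.
stacked = pd.concat([left, left], ignore_index=True)

# Subsetting observations with a conditional.
subset = right[right["sales"] > 15]

# Replacing values and renaming columns.
replaced = left.replace({"city": {"Oslo": "Bergen"}})
renamed = right.rename(columns={"sales": "revenue"})

# Summary statistics of numerical variables.
summary = right.describe()

# Deleting a column and deleting a row (by index label).
no_sales = right.drop(columns=["sales"])
no_first_row = right.drop(index=0)

# Applying a function over all elements of a column, and to groups.
doubled = right["sales"].apply(lambda value: value * 2)
group_means = stacked.groupby("city")["id"].mean()
```

Each call returns a new object and leaves the original frames unchanged.
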
This notebook covers the following tasks:
- Converting strings to date-time object
- Defining time zones
- Subsetting observations based on dates
- Extraction of multiple features from dates
  - Year
  - Month
  - Day
  - Hour
  - Minute
- Difference between dates
- Creating lagged features
- Rolling window calculations
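
The date-time tasks above can be sketched with pandas as follows. The timestamps, column names, and the choice of UTC are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical data: timestamps stored as strings.
df = pd.DataFrame({
    "timestamp": ["2021-01-01 08:30", "2021-01-02 09:45",
                  "2021-01-03 10:15", "2021-01-04 11:00"],
    "value": [10, 20, 30, 40],
})

# Converting strings to date-time objects.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Defining a time zone (UTC is an arbitrary choice here).
df["timestamp_utc"] = df["timestamp"].dt.tz_localize("UTC")

# Subsetting observations based on dates.
recent = df[df["timestamp"] >= "2021-01-03"]

# Extracting features through the .dt accessor.
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day"] = df["timestamp"].dt.day
df["hour"] = df["timestamp"].dt.hour
df["minute"] = df["timestamp"].dt.minute

# Difference between consecutive dates (a Timedelta per row).
df["delta"] = df["timestamp"].diff()

# Lagged feature: the previous row's value.
df["value_lag1"] = df["value"].shift(1)

# Rolling window calculation: mean over the current and previous row.
df["value_roll2"] = df["value"].rolling(window=2).mean()
```

The first row of `delta`, `value_lag1`, and `value_roll2` is missing because there is no earlier observation to compare against.
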
This notebook covers the following tasks:
- Encoding nominal variables
- Encoding ordinal variables
- Encoding dictionaries
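
A minimal, pandas-only sketch of the three encodings is given below. The data and the category order are made up, and the notebook may well use a different tool (for example scikit-learn encoders); this is only one way to do it.

```python
import pandas as pd

# Hypothetical data with one nominal and one ordinal column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["low", "high", "medium"],
})

# Nominal variable: one indicator column per category (one-hot encoding).
nominal = pd.get_dummies(df["color"], prefix="color")

# Ordinal variable: map each level to an integer that preserves the order.
# The order low < medium < high is an assumption of this example.
size_order = {"low": 0, "medium": 1, "high": 2}
df["size_encoded"] = df["size"].map(size_order)

# Dictionaries: each dict becomes a row, each key a column;
# keys missing from a record are filled with 0.
records = [
    {"red": 2, "blue": 4},
    {"red": 4, "blue": 3},
    {"red": 1, "yellow": 2},
]
features = pd.DataFrame(records).fillna(0)
```
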
This notebook covers the following tasks using matplotlib:
- How to create a simple plot
- Histogram
- Subplots
- Bar chart
- Grouped bar chart
- Stacked bar chart
- Stacked percentage bar chart
- Pie chart
- Box plot
- Scatter plot
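
To give a feel for a few of the chart types listed above, here is a short sketch that draws a simple line plot, a grouped bar chart, a stacked bar chart, and a scatter plot on one grid of subplots. The numbers and labels are invented, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for two series across three groups.
groups = ["A", "B", "C"]
first = np.array([3, 5, 2])
second = np.array([4, 1, 6])
x = np.arange(len(groups))
width = 0.4

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Simple plot.
axes[0, 0].plot(x, first, marker="o")
axes[0, 0].set_title("Simple plot")

# Grouped bar chart: shift each series by half a bar width.
axes[0, 1].bar(x - width / 2, first, width, label="first")
axes[0, 1].bar(x + width / 2, second, width, label="second")
axes[0, 1].set_xticks(x)
axes[0, 1].set_xticklabels(groups)
axes[0, 1].legend()
axes[0, 1].set_title("Grouped bar chart")

# Stacked bar chart: the second series starts where the first ends.
axes[1, 0].bar(groups, first, label="first")
axes[1, 0].bar(groups, second, bottom=first, label="second")
axes[1, 0].set_title("Stacked bar chart")

# Scatter plot.
axes[1, 1].scatter(first, second)
axes[1, 1].set_title("Scatter plot")

fig.tight_layout()
```

Dividing each series by `first + second` before plotting would turn the stacked chart into a stacked percentage chart.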