
In this repository, I cover different essential topics that any data scientist needs to know and their implementation in Python.


oscarm524/Python_Programming_for_Data_Science


Python Programming for Data Science

In this repository, I include several tasks that any data scientist needs to know.

Reading Data

This notebook covers how to read four types of data files into Python using pandas:

- Text file
- CSV file
- Excel file
- JSON file
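A minimal sketch of the pandas readers involved, assuming small in-memory stand-ins (`io.StringIO`) in place of real files so it runs anywhere; the file contents and the commented-out `data.xlsx` path are hypothetical. The notebook itself may use different options.

```python
from io import StringIO

import pandas as pd

# In-memory stand-ins for files on disk (hypothetical contents); in practice
# you would pass a path such as "data.txt" instead.
text_data = StringIO("name\tage\nAna\t31\nBen\t25\n")
csv_data = StringIO("name,age\nAna,31\nBen,25\n")
json_data = StringIO('[{"name": "Ana", "age": 31}, {"name": "Ben", "age": 25}]')

# Text file: read_csv handles any delimiter through `sep` (a tab here).
df_text = pd.read_csv(text_data, sep="\t")

# CSV file: a comma is the default delimiter.
df_csv = pd.read_csv(csv_data)

# JSON file: a list of records becomes one row per record.
df_json = pd.read_json(json_data)

# Excel file (requires an engine such as openpyxl; "data.xlsx" is hypothetical):
# df_excel = pd.read_excel("data.xlsx", sheet_name=0)

print(df_csv)
```

Plain text files are read with `read_csv` as well; only the separator changes.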

Data Wrangling

This notebook covers the following tasks:

- Inner join
- Left join
- Right join
- Data stacking 
- Subsetting observations based on conditionals
- Replacing values
- Renaming data-frame columns
- Summary statistics of numerical variables
- Deleting columns from a data-frame
- Deleting rows from a data-frame
- Applying a function over all elements of a column
- Applying a function to groups 
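The tasks above map onto a handful of pandas methods. The sketch below uses two tiny hypothetical data-frames (`customers`, `orders`) and a made-up 18% tax rate purely for illustration; it is not the notebook's own code.

```python
import pandas as pd

# Hypothetical example data.
customers = pd.DataFrame({"id": [1, 2, 3], "city": ["Lima", "Quito", "Bogota"]})
orders = pd.DataFrame({"id": [1, 1, 2, 4], "amount": [10.0, 20.0, 5.0, 7.5]})

# Inner, left, and right joins: `how` selects the join type on the key column.
inner = customers.merge(orders, on="id", how="inner")   # only matching ids
left = customers.merge(orders, on="id", how="left")     # all customers
right = customers.merge(orders, on="id", how="right")   # all orders

# Data stacking: append the rows of data-frames with the same columns.
stacked = pd.concat([customers, customers], ignore_index=True)

# Subsetting observations with a boolean condition.
big_orders = orders[orders["amount"] > 6]

# Replacing values in one column and renaming columns.
customers2 = customers.replace({"city": {"Bogota": "Cusco"}})
renamed = orders.rename(columns={"amount": "order_amount"})

# Summary statistics of a numerical column (count, mean, std, quartiles, ...).
summary = orders["amount"].describe()

# Deleting a column and deleting a row (by index label).
no_city = customers.drop(columns=["city"])
no_first_row = customers.drop(index=0)

# Applying a function to every element of a column (hypothetical 18% tax).
orders["amount_with_tax"] = orders["amount"].apply(lambda x: round(x * 1.18, 2))

# Applying a function to groups: total amount per customer id.
total_by_id = orders.groupby("id")["amount"].sum()
```

Note that the right join keeps order `id` 4 with a missing `city`, while the inner join drops it.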

Dates and Time

This notebook covers the following tasks:

- Converting strings to date-time objects
- Defining time zones
- Subsetting observations based on dates
- Extraction of multiple features from dates
    - Year
    - Month
    - Day
    - Hour
    - Minute
- Difference between dates
- Creating lagged features
- Rolling window calculations
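A compact sketch of these date-time operations with pandas, assuming a small hypothetical sales table and the `America/New_York` time zone chosen only for illustration:

```python
import pandas as pd

# Hypothetical hourly sales records with timestamps stored as strings.
df = pd.DataFrame({
    "timestamp": ["2023-01-01 08:30", "2023-01-02 09:45",
                  "2023-01-03 10:00", "2023-01-04 11:15"],
    "sales": [100, 120, 90, 130],
})

# Convert strings to datetime objects with an explicit format.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M")

# Define a time zone (naive -> aware), then convert to UTC.
df["timestamp"] = df["timestamp"].dt.tz_localize("America/New_York")
df["timestamp_utc"] = df["timestamp"].dt.tz_convert("UTC")

# Subset by date; an aware column must be compared with an aware Timestamp.
cutoff = pd.Timestamp("2023-01-03", tz="America/New_York")
recent = df[df["timestamp"] >= cutoff]

# Extract date features.
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day"] = df["timestamp"].dt.day
df["hour"] = df["timestamp"].dt.hour
df["minute"] = df["timestamp"].dt.minute

# Difference between consecutive dates (a Timedelta column).
df["time_since_prev"] = df["timestamp"].diff()

# Lagged feature: the previous row's sales (first row becomes NaN).
df["sales_lag1"] = df["sales"].shift(1)

# Rolling window: mean of the current and previous row.
df["sales_roll2"] = df["sales"].rolling(window=2).mean()
```

`shift` and `rolling` operate on row order, so the data should be sorted by time before computing lags or windows.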

Handling Categorical Data

This notebook covers the following tasks:

- Encoding nominal variables
- Encoding ordinal variables
- Encoding dictionaries
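A pandas-only sketch of the three encodings, assuming hypothetical `color`, `size`, and city/temperature data. The notebook may instead use scikit-learn encoders (for example `DictVectorizer` for dictionaries); this version swaps in `pd.get_dummies` and `Series.map` to stay dependency-light.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue"],       # nominal: no natural order
    "size": ["small", "large", "medium", "small"],   # ordinal: small < medium < large
})

# Nominal: one indicator column per category (sorted alphabetically).
nominal = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal: map categories to integers that respect their order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# Dictionaries: build a data-frame from a list of feature dicts, one-hot
# encode the string-valued key, and fill missing numeric values with 0.
records = [
    {"city": "Lima", "temperature": 22.0},
    {"city": "Quito", "temperature": 14.0},
    {"city": "Lima"},
]
dict_df = pd.DataFrame(records)
dict_encoded = pd.get_dummies(dict_df, columns=["city"], dtype=int).fillna(0)
```

Integer codes are used only for the ordinal column, since imposing an order on nominal categories would inject a ranking that does not exist.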

NumPy

Data Visualization

This notebook covers the following tasks using matplotlib:

- How to create a simple plot 
- Histogram
- Subplots
- Bar chart
- Grouped bar chart
- Stacked bar chart
- Stacked percentage bar chart
- Pie chart
- Box plot
- Scatter plot
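One way to produce all of these chart types with matplotlib is sketched below on a single 3x3 grid of subplots. The data (random normal values and two hypothetical groups over categories A, B, C) is made up for illustration, and the `Agg` backend is set so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=200)        # hypothetical continuous data
categories = ["A", "B", "C"]
group1 = np.array([3, 5, 2])         # hypothetical counts per category
group2 = np.array([4, 1, 6])
x = np.arange(len(categories))
width = 0.4

fig, axes = plt.subplots(3, 3, figsize=(12, 10))  # subplots: a 3x3 grid
ax = axes.ravel()

ax[0].plot(np.arange(10), np.arange(10) ** 2)     # simple line plot
ax[0].set_title("Simple plot")

ax[1].hist(values, bins=20)                        # histogram
ax[1].set_title("Histogram")

ax[2].bar(categories, group1)                      # bar chart
ax[2].set_title("Bar chart")

ax[3].bar(x - width / 2, group1, width, label="group 1")  # grouped bars
ax[3].bar(x + width / 2, group2, width, label="group 2")
ax[3].set_xticks(x)
ax[3].set_xticklabels(categories)
ax[3].legend()
ax[3].set_title("Grouped bar chart")

ax[4].bar(categories, group1, label="group 1")     # stacked bars
ax[4].bar(categories, group2, bottom=group1, label="group 2")
ax[4].set_title("Stacked bar chart")

total = group1 + group2                            # stacked percentages
pct1 = group1 / total * 100
pct2 = group2 / total * 100
ax[5].bar(categories, pct1)
ax[5].bar(categories, pct2, bottom=pct1)
ax[5].set_title("Stacked percentage bar chart")

ax[6].pie(group1, labels=categories, autopct="%1.0f%%")  # pie chart
ax[6].set_title("Pie chart")

ax[7].boxplot([values, values * 2])                # box plot of two samples
ax[7].set_title("Box plot")

ax[8].scatter(values[:-1], values[1:], s=8)        # scatter plot
ax[8].set_title("Scatter plot")

fig.tight_layout()
```

Call `fig.savefig("charts.png")` (hypothetical file name) to write the figure to disk, or `plt.show()` under an interactive backend.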
