This repository contains basic working examples for exploring structured data, unstructured data, and big data using the Python programming language.
If you load these files in mybinder.org, the following packages will already be included in the environment:
- pandas
- numpy
- matplotlib
- seaborn
- bs4
- opencv-python
As a result, the page may take a few minutes to load while these packages are installed.
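Once the environment has loaded, a quick check that the packages can be imported may be useful. The following is a minimal sketch using only the standard library; the `PACKAGES` mapping and the `check_packages` helper are hypothetical names and are not part of this repository. Note that opencv-python is imported under the name `cv2`.

```python
import importlib.util

# Map each package name listed above to the name used in an import statement.
# opencv-python is imported as "cv2"; the others use their own name.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "bs4": "bs4",
    "opencv-python": "cv2",
}


def check_packages(packages=PACKAGES):
    """Return a dict mapping each package name to True if its module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


# Report the status of each package without actually importing it.
for name, found in check_packages().items():
    print(f"{name}: {'available' if found else 'missing'}")
```

Running this in the first cell of a notebook shows whether anything is missing before you start.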
Click the Binder banner below to open the code online:
After you click the banner, you are taken to an online environment where you can run the code provided in the Jupyter notebooks. Within a notebook, start with the topmost cell and press Shift+Enter to run it and move down to the next cell.
For learning purposes, work through the files in the following order:
- OpenFiles.ipynb: shows how to access various data sources such as CSV, PDF, Word, image, JSON, and Stata files, as well as web pages
- Clean.ipynb: shows how to clean and pre-process raw data into a form that can be used for further analysis
- Disc_Visualize.ipynb: shows how to produce descriptive statistics and explores data visualization
- Predictive.ipynb: shows how to use logistic regression and explores basic machine learning techniques
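
The first three notebooks follow an open, clean, describe sequence. Here is a minimal sketch of that sequence, assuming pandas and a small made-up inline CSV. The data, column names, cleaning rules, and the `load_and_clean` helper are all hypothetical and are not taken from the notebooks, which may do this differently.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a raw data file.
RAW_CSV = """name,age,city
Alice,34,Paris
Bob,,London
Alice,34,Paris
Carol,29,
"""


def load_and_clean(csv_text):
    """Read CSV text, drop duplicate rows and rows without an age, fill missing cities."""
    df = pd.read_csv(io.StringIO(csv_text))
    df = df.drop_duplicates()          # remove exact duplicate rows
    df = df.dropna(subset=["age"])     # drop rows where age is missing
    df = df.assign(city=df["city"].fillna("unknown"))  # label missing cities
    return df.reset_index(drop=True)


clean = load_and_clean(RAW_CSV)

# Descriptive statistics (count, mean, std, min, quartiles, max) for the age column.
summary = clean["age"].describe()

print(clean)
print(summary)
```

To read a real file instead, pass a file path to `pd.read_csv` rather than an `io.StringIO` object.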