- NumPy
- SciPy
- pandas
- Matplotlib
- scikit-learn
- Jupyter
This is how I generally approach a data analysis task:
- Define
  - Set clear objectives
- Import
  - Support various data types
  - Explore raw data
  - Check for any inconsistency or corruption
- Clean
  - Preprocess
  - Filter
  - Offset
  - Exclude outliers
- Analyze
  - Postprocess
  - Derive custom metrics
  - Aggregate multiple values into one
  - Calculate common statistics that may provide insight
- Export
  - Save a clean dataset
- Report
  - Iterate on the process if needed
  - Generate deliverables
Workflow inspired by "Using Python for Data Analysis" by Ian Eyre from Real Python, retrieved in May 2024: https://realpython.com/python-for-data-analysis/
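As a rough illustration, the import, clean, analyze, and export steps of the workflow could be sketched with pandas as shown here. This is only a minimal sketch under stated assumptions: the `sensor` and `reading` columns, the inline data, and the `run_workflow` helper are all hypothetical, and the 1.5 × IQR rule is just one common way to exclude outliers, not necessarily the one used in this project.

```python
import io

import pandas as pd

# Hypothetical raw data: column names and values are invented for illustration.
RAW_CSV = """sensor,reading
a,10.0
a,11.0
a,
b,20.0
b,21.0
b,500.0
"""


def run_workflow(raw_text: str) -> pd.DataFrame:
    """Import, explore, clean, and analyze a small CSV (illustrative only)."""
    # Import: read the raw text into a DataFrame.
    df = pd.read_csv(io.StringIO(raw_text))

    # Explore: look for inconsistencies such as missing values.
    n_missing = int(df["reading"].isna().sum())
    print(f"rows: {len(df)}, missing readings: {n_missing}")

    # Clean: drop incomplete rows, then exclude outliers with the
    # 1.5 * IQR rule (an assumed choice of outlier criterion).
    df = df.dropna(subset=["reading"])
    q1, q3 = df["reading"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["reading"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Analyze: aggregate multiple values into one summary row per sensor.
    return df.groupby("sensor")["reading"].agg(["count", "mean"])


summary = run_workflow(RAW_CSV)

# Export: to_csv() without a path returns the CSV as a string;
# pass a file path instead to save the result to disk.
csv_text = summary.to_csv()
print(csv_text)
```

The define and report steps are left out on purpose, since they depend on the objectives and the audience rather than on code.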
Data sourced from the Washington State Department of Licensing, via Kaggle: https://www.kaggle.com/datasets/sahirmaharajj/electric-vehicle-population
- Descriptive analysis: describe the past
- Diagnostic analysis: investigate the why
- Predictive analysis: try to predict the future
- Prescriptive analysis: plan a strategy
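To make the difference between the first and third types more concrete, here is a small sketch using NumPy. The yearly counts are invented for illustration and are not taken from the dataset, and the straight-line fit is an assumed, deliberately simple model. Diagnostic and prescriptive analysis depend on domain context and are not covered by this sketch.

```python
import numpy as np

# Hypothetical yearly counts, invented for illustration only.
years = np.array([2019, 2020, 2021, 2022, 2023])
counts = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Descriptive analysis: summarize what already happened.
mean_count = counts.mean()
total_growth = counts[-1] - counts[0]

# Predictive analysis: fit a straight line and extrapolate one year ahead.
# Years are shifted to start at 0 to keep the fit numerically well behaved.
x = years - years[0]
slope, intercept = np.polyfit(x, counts, deg=1)
next_year = years[-1] + 1
forecast = slope * (next_year - years[0]) + intercept

print(f"mean count: {mean_count}, growth: {total_growth}")
print(f"forecast for {next_year}: {forecast:.1f}")
```

A linear extrapolation like this only holds if the past trend continues, so the forecast should be treated as a rough estimate.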