SamBelkacem/Machine-Learning-Basics

Machine Learning Process

The machine learning process involves the following steps:

  • 1- Data Preparation: Collect, clean, and preprocess data.
  • 2- Data Visualization and Analysis: Visualize and analyze data to identify patterns and relationships.
  • 3- Feature Engineering: Select and transform relevant variables in the data.
  • 4- Model Selection: Compare candidate models and choose one that suits the problem and the data.
  • 5- Model Training: Feed data into the model and adjust parameters to minimize error.
  • 6- Hyperparameter Tuning: Search for hyperparameter values that optimize model performance on validation data.
  • 7- Model Evaluation: Measure accuracy, precision, recall, and other performance metrics on held-out data.
  • 8- Model Deployment: Integrate the model into an application and set up a pipeline to feed new data.
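
As a rough illustration, steps 1 and 3 to 7 above can be sketched with Scikit-learn as follows. The synthetic dataset, the logistic regression model, and the grid of `C` values are assumptions chosen for brevity; they are not taken from the repository's notebooks. Visualization (step 2) and deployment (step 8) are left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: data preparation. Synthetic data stands in for a real, cleaned dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Steps 3-4: feature scaling and an (assumed) model choice, chained in a pipeline
# so that the scaler is fitted on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Steps 5-6: training and hyperparameter tuning with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 7: evaluation on data the model has never seen.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print(search.best_params_, metrics)
```

Keeping the test split out of the grid search is the main design point here: the reported metrics then reflect performance on data that played no part in tuning.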

Machine Learning Tutorial

This tutorial covers Machine Learning Basics using Python.

The repository includes Python notebooks, reference guides, and cheatsheets for the entire Machine Learning process:

  • 1- Data preprocessing and analysis: clean and transform data into a format suitable for analysis using NumPy and Pandas.
  • 2- Data visualization: understand and explore data visually using Matplotlib and Seaborn.
  • 3- Machine learning: explore various algorithms in Scikit-learn such as regression, classification, and clustering.
  • 4- Feature engineering: feature encoding, feature scaling, feature selection, etc.
  • 5- Model selection: comparison of ML algorithms, how to choose an ML algorithm, etc.
  • 6- Hyperparameter tuning: Grid Search, Random Search, and Bayesian Optimization.
  • 7- Model evaluation: validation methods, evaluation metrics, etc.
  • 8- Model explainability: feature importance, interpretable models, etc.

The repository also includes two Python notebooks that work through popular introductory Machine Learning examples:

  • Classification - Titanic Survival Prediction: Predict whether a passenger on the Titanic survived, based on features such as age, gender, ticket class, and cabin location (notebook).
  • Regression - Boston House Price Prediction: Predict the median value of houses in Boston neighborhoods, based on features such as crime rate, number of rooms, proximity to employment centers, and accessibility to highways (notebook).

The end of this README links to resources for practicing and going further with Machine Learning:

  • The most popular ML dataset platforms.
  • The most popular ML competition platforms.
  • A guide to tackling ML competitions (PDF).

Requirements

Tools:

  • Python 3
  • Jupyter Notebook: web-based interactive computing platform
  • Google Colab: cloud-based Jupyter Notebook environment

Concepts:

Python libraries:

  • NumPy: A library for efficient numerical operations and multidimensional arrays, widely used in scientific computing and data analysis.
  • Pandas: A data manipulation and analysis library, providing data structures and functions to easily handle and process structured data.
  • Matplotlib: A popular plotting library used for creating static, animated, and interactive visualizations.
  • Seaborn: A data visualization library based on Matplotlib, providing high-level functions for creating attractive statistical graphics.
  • Scikit-learn: A data analysis and modeling library, including ML algorithms for various tasks: classification, regression, clustering, etc.
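
To give a feel for the first two libraries in the list, here is a minimal sketch of a NumPy array operation and a Pandas missing-value fix. The values, column names, and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on a 2-D array, no explicit loops needed.
heights_cm = np.array([[170.0, 180.0], [160.0, 175.0]])
heights_m = heights_cm / 100.0          # element-wise division
column_means = heights_m.mean(axis=0)   # mean of each column

# Pandas: labelled tabular data with built-in missing-value handling.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31.0, np.nan, 45.0]})
df["age"] = df["age"].fillna(df["age"].mean())  # replace the gap with the mean age

print(column_means)
print(df)
```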

Structure of the tutorial

  • 1-   Machine learning basic concepts
  • 2-   Read input data in Python
  • 3-   Data preprocessing and analysis: NumPy and Pandas
  • 4-   Data visualization: Matplotlib and Seaborn
  • 5-   Machine learning: Scikit-learn
  • 6-   Feature engineering
  • 7-   Model selection and parameter tuning
  • 8-   Model evaluation and explainability
  • 9-   Practice: Machine learning datasets
  • 10- Practice: Machine learning competitions

Content of the tutorial

1- Machine learning basic concepts

  • Presentation on Machine learning basic concepts (PDF)

2- Read input data in Python

  • Tutorial to read various sources in a DataFrame (notebook)
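
As a small sketch of what reading data into a DataFrame can look like, the snippet below parses CSV and JSON text with Pandas. The in-memory strings are hypothetical stand-ins for real files; in practice a file path or URL would be passed to `pd.read_csv` or `pd.read_json` instead.

```python
import io

import pandas as pd

# CSV: the empty field in the second row is parsed as a missing value (NaN).
csv_text = "id,city,price\n1,Paris,10.5\n2,Oslo,\n3,Rome,7.0\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: a list of records becomes one row per record.
json_text = '[{"id": 1, "city": "Paris"}, {"id": 2, "city": "Oslo"}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```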

3- Data preprocessing and analysis: NumPy and Pandas

4- Data visualization: Matplotlib and Seaborn

  • Chart chooser (PDF)
  • Matplotlib cheatsheet (PDF)
  • Matplotlib tutorial (WEB)
  • Seaborn tutorial (WEB)
  • Data visualization tutorial (notebook)
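
For orientation, here is a minimal Matplotlib sketch of two common exploratory charts: a histogram for one variable and a scatter plot for the relationship between two. The random data and the variable names are assumptions for illustration, and Seaborn is not used here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of a single variable.
ax_hist.hist(x, bins=20)
ax_hist.set_title("Distribution of x")

# Scatter plot: relationship between two variables.
ax_scatter.scatter(x, y, s=10)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Relationship between x and y")

fig.tight_layout()
```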

5- Machine learning: Scikit-learn

  • Machine learning map (PDF)
  • Scikit-learn cheatsheet (PDF)
  • Scikit-learn tutorial (notebook)
  • Machine learning tutorial (notebook)
  • Classification: Titanic Survival Prediction (notebook)
  • Regression: Boston House Price Prediction (notebook)

6- Feature engineering

  • Data cleaning guide (PDF)
  • Data preparation cheatsheet (PDF)
  • Feature engineering (PDF)
  • Feature engineering tutorial (notebook)
  • Feature selection methods (IMG)
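
As one possible illustration of feature scaling and feature encoding, the sketch below standardizes numeric columns and one-hot encodes a categorical column with Scikit-learn's `ColumnTransformer`. The toy table and its column names are hypothetical and only loosely modelled on passenger data; feature selection is not shown.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made-up values): two numeric features and one categorical feature.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "embarked": ["S", "C", "S", "S"],
})

preprocess = ColumnTransformer(
    [
        # Scaling: zero mean and unit variance for each numeric column.
        ("scale", StandardScaler(), ["age", "fare"]),
        # Encoding: one 0/1 column per category; unseen categories become all zeros.
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

features = preprocess.fit_transform(df)
print(features.shape)  # 2 scaled columns + 2 one-hot columns
```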

7- Model selection and parameter tuning

  • Comparison of ML algorithms 1 (PDF)
  • Comparison of ML algorithms 2 (IMG)
  • Comparison of ML algorithms 3 (IMG)
  • How to choose an ML algorithm (IMG)
  • Hyperparameter tuning (WEB)
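
To make the difference between Grid Search and Random Search concrete, here is a minimal sketch using Scikit-learn. The random forest model, the Iris dataset, and the parameter values are assumptions chosen to keep the example fast. Bayesian Optimization needs a separate library and is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Candidate values for two hyperparameters: 2 x 3 = 6 combinations.
param_space = {"n_estimators": [10, 50], "max_depth": [2, 4, None]}

# Grid Search: evaluates every combination.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_space, cv=3)
grid.fit(X, y)

# Random Search: evaluates only n_iter randomly sampled combinations.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_space,
    n_iter=3,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print("grid:  ", grid.best_params_, grid.best_score_)
print("random:", random_search.best_params_, random_search.best_score_)
```

Grid Search is exhaustive but its cost grows multiplicatively with each added hyperparameter; Random Search caps the cost at `n_iter` fits per fold, at the risk of missing the best combination.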

8- Model evaluation and explainability

  • Evaluation metrics cheatsheet (PDF)
  • Evaluation metrics in Python (WEB)
  • Model explainability cheatsheet (PDF)
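
As a quick sketch of how common classification metrics relate to the confusion matrix, the example below computes them with Scikit-learn on a small set of hand-written labels. The labels are made up for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1)
```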

9- Practice: Machine learning datasets

10- Practice: Machine learning competitions