Sandy0002/Machine-Learning-Exercises

Machine Learning Portfolio

This repository consists of programs that demonstrate implementations of supervised and unsupervised learning. The problems range from linear regression to density-based clustering. The repository is divided into Classification, Regression, and Clustering folders according to the task carried out, and each folder uses a variety of algorithms.

About

Machine learning is a subfield of artificial intelligence (AI) that focuses on the development of algorithms and models that allow computers to learn from data and make predictions or decisions without being explicitly programmed. The primary goal of machine learning is to enable computers to learn from experience and improve their performance over time. Machine learning is generally used for problems where something must be predicted or estimated from data.

Applications

Machine learning has a wide range of applications. A few are listed below:

  • Email Spam Detection
  • Credit Card Fraud Detection
  • Medical Diagnosis
  • Recommender Systems
  • Sentiment Analysis

Terms

To get started with machine learning, it helps to know several commonly used terms. The terms and their meanings are described below:

| Term | Meaning |
| --- | --- |
| Feature | An individual input variable used in the training of a machine learning model. |
| Model | A mathematical representation that maps input features to output predictions. |
| Hyperparameters | Parameters that are set before training a machine learning model and control the learning process, such as the learning rate or the number of hidden layers in a neural network. |
| Feature Engineering | The process of selecting, transforming, and creating relevant features to improve the performance of a machine learning model. |
| Loss Function | A function that measures the difference between predicted and actual values, used to guide the training process. |
| Cross-Validation | A technique used to assess the performance of a model by splitting the data into subsets, repeatedly training on some of them and testing on the held-out subset. |
| Training Set | The part of the data used to train a machine learning model. |
| Test Set | The part of the data used to evaluate the performance of a trained machine learning model on unseen data. |
| Validation Set | An independent dataset used to tune hyperparameters and prevent overfitting during model training. |
| Overfitting | A situation where a machine learning model performs well on the training data but poorly on unseen data. |
| Underfitting | A situation where a machine learning model fails to capture the underlying patterns in the data. |
| Gradient Descent | An optimization algorithm used to update the model's parameters and minimize the loss function. |
| Ensemble Learning | A method that combines multiple models to improve prediction accuracy and generalization. |
| Bias | The systematic error of a model: the difference between the model's average prediction and the actual value. |
| Variance | The amount by which the prediction changes when the training set changes. |
| Precision | The ratio of true positives to all predicted positives. |
| Recall | The ratio of true positives to all actual positives. |
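
The precision and recall definitions above can be made concrete with a small calculation. The following is a minimal sketch in plain Python; the function name `precision_recall` and the spam labels are hypothetical and are not taken from the repository's programs.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    # Count the outcomes that involve the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Fall back to 0.0 when there are no predicted or no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.75, recall=0.75
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so both ratios are 3/4. In practice, `sklearn.metrics.precision_score` and `sklearn.metrics.recall_score` compute the same quantities.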

Tasks

| Task | Meaning |
| --- | --- |
| Clustering | A technique used to group similar data points together based on their similarity. |
| Classification | A task in supervised learning where the model predicts a categorical label or class. |
| Regression | A task in supervised learning where the model predicts a continuous numerical value. |
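
To illustrate how the three tasks differ in what they predict, here is a rough sketch using scikit-learn on synthetic data. The choice of estimators (`LogisticRegression`, `LinearRegression`, `DBSCAN`) and all parameter values are assumptions made for illustration; they are not necessarily the algorithms or settings used in the repository's folders.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict a categorical label (synthetic two-class data).
X_cls, y_cls = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)
classifier = LogisticRegression().fit(X_cls, y_cls)
predicted_labels = classifier.predict(X_cls[:5])   # integer class labels

# Regression: predict a continuous value (synthetic linear data with noise).
X_reg, y_reg = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
regressor = LinearRegression().fit(X_reg, y_reg)
predicted_values = regressor.predict(X_reg[:5])    # floating-point values

# Clustering: group points without using any labels (density-based clustering).
# DBSCAN assigns the id -1 to points it treats as noise.
cluster_ids = DBSCAN(eps=0.8, min_samples=5).fit_predict(X_cls)

print("classification:", predicted_labels)
print("regression:", np.round(predicted_values, 2))
print("clusters found:", len(set(cluster_ids.tolist()) - {-1}))
```

Note that the clustering call never sees `y_cls`: it works from the feature values alone, which is what separates it from the two supervised tasks.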

Learning Type

| Learning Type | Meaning |
| --- | --- |
| Supervised Learning | A type of machine learning where the model is trained on labeled data, i.e., input-output pairs, and learns to make predictions on new, unseen data. |
| Unsupervised Learning | A type of machine learning where the model is trained on unlabeled data and learns patterns and structures from the data without explicit output labels. |
| Semi-Supervised Learning | A combination of supervised and unsupervised learning, where the model is trained on a combination of labeled and unlabeled data. |
| Reinforcement Learning | A type of machine learning where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. |

Steps Involved

| Step | Task |
| --- | --- |
| Data Collection | The first step is to gather relevant data for the problem you want to solve. Data can come from various sources, such as databases, APIs, or manual data entry. |
| Data Preprocessing | Once you have collected the data, it needs to be cleaned and prepared for analysis. This step involves handling missing values, dealing with outliers, and converting data into a suitable format for machine learning algorithms. |
| Feature Engineering | Feature engineering is the process of selecting, transforming, or creating new features from the existing data to improve the performance of machine learning models. |
| Data Splitting | Before training a machine learning model, the dataset is divided into training and testing sets. The training set is used to train the model, while the testing set is used to evaluate its performance on unseen data. |
| Model Selection | Depending on the problem type (e.g., classification or regression) and data characteristics, you choose an appropriate machine learning algorithm to train the model. |
| Model Training | During this step, the selected machine learning algorithm is applied to the training data to learn patterns and relationships in the data. |
| Model Evaluation | After training the model, it is evaluated on the testing data to assess its performance. Various metrics, such as accuracy, precision, recall, and mean squared error, are used to evaluate the model's performance. |
| Hyperparameter Tuning | Many machine learning algorithms have hyperparameters that control the learning process. Hyperparameter tuning involves searching for the best combination of hyperparameters to optimize the model's performance. |
| Model Deployment | Once you have a trained and tuned model, it can be deployed to make predictions on new, unseen data. |
| Model Monitoring and Maintenance | Machine learning models may require periodic monitoring and maintenance to ensure they continue to perform well as new data becomes available. |
| Interpretation and Visualization | Understanding how the model makes predictions is essential for building trust and gaining insights from the model's results. Interpretation and visualization techniques help explain the model's behavior and decision-making process. |
| Iteration and Improvement | Machine learning is an iterative process. After deploying the model, you may receive new data and feedback, leading to further improvements and updates to the model. |
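
The core of the workflow above (splitting, preprocessing, training, tuning, and evaluation) can be sketched with scikit-learn as follows. This is only an illustrative outline under stated assumptions: the data is synthetic (generated with `make_classification` rather than loaded from one of the Kaggle datasets), and the estimator, the parameter grid, and the 80/20 split are hypothetical choices, not the repository's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection (stand-in): synthetic data instead of a real CSV file.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=42)

# Data splitting: hold out 20% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing and model selection bundled in one pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: 5-fold cross-validation on the training set only.
search = GridSearchCV(pipeline, param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # model training happens inside the search

# Model evaluation on data the model has never seen.
y_pred = search.predict(X_test)
print("best C:", search.best_params_["model__C"])
print("accuracy:", round(accuracy_score(y_test, y_pred), 3))
print("precision:", round(precision_score(y_test, y_pred), 3))
print("recall:", round(recall_score(y_test, y_pred), 3))
```

In this sketch the tuning step uses cross-validation inside the training data, so the test set is touched only once, for the final evaluation. Putting the scaler inside the pipeline also keeps the scaling statistics from being computed on held-out folds.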

Libraries Used

  • NumPy: Used for numerical computations in Python.
  • Pandas: Used for reading files and for other operations when working with large tabular data.
  • scikit-learn (sklearn): A machine learning library for Python.
  • Matplotlib: A visualization library.
  • Seaborn: A statistical visualization library built on top of Matplotlib.

Datasets Description

The datasets used for these programs were downloaded from Kaggle. Datasets can be found here.

Since there are many programs, the datasets used and the tasks carried out with them are summarized below.

| Dataset | Description |
| --- | --- |
| Coffee Dataset | Contains information about coffee and its types. The programs predict the category of a coffee and estimate its acidity. |
| Fuel Consumption | Contains information about vehicles, fuel, fuel consumption, etc. The programs predict the class of the vehicle and estimate fuel consumption. |
| Garments Worker Productivity | Contains columns such as dates, quarters, departments, and productivity details. The programs estimate the productivity of the workers and predict the department of the workers. |
| Insurance | Contains information about people and the insurance amounts they pay. The program estimates the premium to be paid by a person. |
| iPhone Prices | Contains data about iPhones such as the variant, launch year, description, and price of each variant. The program estimates the iPhone price. |
| Loan Sanction | Contains information about loans such as loan ID, age of the person, gender, loan amount, and loan status. Because loans are used for many purposes, the repository includes programs that predict whether a person will get a loan and, if so, estimate the loan amount. |
| Possum Dataset | Contains details about possums. Tasks such as predicting the gender of a possum have been carried out. |
| Telecom Dataset | Covers a very popular problem: whether a customer will churn or not. This dataset is used to predict churn status. |
| Titanic Dataset | Contains data from the famous 1912 tragedy, the sinking of the Titanic, such as passenger ID, ticket class, gender, and survival status. The program predicts survival status. |
| Tobacco Dataset | Contains information about the age at which children across various states of India start using tobacco, bidis, and cigarettes. The tasks estimate the average starting age for tobacco and bidis. |
| Wine Quality | Contains data about red wine such as acidity, sugar content, chlorides, density, pH, and quality. The programs estimate the acidity and quality of the wine. |

LICENSE

MIT LICENSE