chuphalashit19/Machine-Learning-Basics

Machine-Learning-Basics

Here I have worked on some of the basics of ML and its different techniques. I have also made a few predictions using historical datasets.

Linear-Regression-ML-

Here we make a prediction using a dataset of car emissions: we estimate the CO2 emission of a new car from the historical data. We use the machine learning technique called Simple Linear Regression to predict the value, and scikit-learn to carry out the underlying computations. Go through the code step by step to understand it.
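As a rough illustration of what the fit computes, here is a minimal from-scratch sketch of least-squares simple linear regression. It is not the notebook's scikit-learn code; the choice of engine size as the single feature and all sample values are hypothetical.

```python
# Hypothetical sample: engine size (litres) vs. CO2 emission (g/km).
engine_size = [1.0, 2.0, 3.0, 4.0, 5.0]
co2 = [130.0, 170.0, 210.0, 250.0, 290.0]


def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_linear(engine_size, co2)

# Predict the emission of a new (hypothetical) car with a 3.5 litre engine.
predicted = intercept + slope * 3.5
print(intercept, slope, predicted)
```

The sample data lie exactly on a line, so the fitted line reproduces them; real emission data will be noisy and the fit only approximate.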

Multiple-Regression-ML-

Now we make use of multiple attributes in a table to predict the CO2 emission of a new vehicle. As before, we use a dataset of car emissions and estimate the CO2 emission of a new car from the historical data. We use the machine learning technique called Multiple Linear Regression to predict the value, and scikit-learn to carry out the underlying computations. Go through the code step by step to understand it.
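The idea can be sketched without any library by solving the normal equations directly. This is an illustrative sketch only, not the notebook's scikit-learn code; the two features (engine size and cylinders), the helper names, and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear(rows, y):
    """Return [intercept, coef_1, ..., coef_k] minimising the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: (engine size, cylinders).
# The target is generated as 50 + 20 * size + 10 * cylinders.
features = [(1.0, 4.0), (2.0, 4.0), (2.0, 6.0), (3.0, 6.0), (4.0, 8.0)]
co2 = [50 + 20 * s + 10 * c for s, c in features]

coefs = fit_multiple_linear(features, co2)
print(coefs)
```

Because the target was generated from a known linear rule, the recovered coefficients should be close to that rule, up to floating-point error.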

Polynomial-Regression-ML-

This is similar to linear regression, but the fitted line is a curve, so we use a polynomial equation to model the data. Go through the code and graph to understand it better.

Non-Linear-Regression-ML-

Non-linear regression models a relationship between independent variables 𝑥 and a dependent variable 𝑦 that is described by a non-linear function. If the data shows a curvy trend, linear regression will not produce very accurate results compared to non-linear regression because, as the name implies, linear regression presumes that the data is linear. In this notebook, we fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014.

K-Nearest-Neighbors-Classifier

K-Nearest Neighbors is a supervised learning algorithm in which the data is 'trained' with data points corresponding to their classification. When a point is to be predicted, the algorithm takes into account the 'K' nearest points to it to determine its classification. Here, a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. The target field, called custcat, has four possible values that correspond to the four customer groups, as follows:

  1. Basic Service
  2. E-Service
  3. Plus Service
  4. Total Service

Our objective is to build a classifier to predict the class of unknown cases. We will use a specific type of classification called K-nearest neighbours.
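The voting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code; the two features (age, income) and all data points are hypothetical, and only the custcat codes come from the description.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical customers: (age, income in thousands) with their custcat value.
points = [(25, 30), (27, 32), (45, 80), (47, 85), (60, 150), (62, 155)]
custcat = [1, 1, 2, 2, 4, 4]

prediction = knn_predict(points, custcat, query=(46, 82), k=3)
print(prediction)
```

In practice the features are usually standardised first, since Euclidean distance is sensitive to the scale of each column.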

Decision-Tree-Classifier

Here we use one of the popular ML algorithms, known as the decision tree. We will use this classification algorithm to build a model from historical data of patients and their responses to different medications. Then we use the trained decision tree to predict the class of an unknown patient, or to find a proper drug for a new patient. We have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X and Drug Y. Our job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are the Age, Sex, Blood Pressure, and Cholesterol of the patients, and the target is the drug that each patient responded to. It is a sample of a multiclass classifier: you can use the training part of the dataset to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.
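A decision tree is grown by repeatedly choosing the feature that best separates the classes. The sketch below shows one common way to score a split, information gain based on entropy. That criterion is an assumption here (the description does not say which one the notebook uses), and the patient rows and labels are hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical patients: (Sex, Blood Pressure) and the drug they responded to.
patients = [("F", "HIGH"), ("M", "HIGH"), ("F", "LOW"), ("M", "LOW")]
drugs = ["drugA", "drugA", "drugX", "drugX"]

gain_sex = information_gain(patients, drugs, 0)
gain_bp = information_gain(patients, drugs, 1)
print(gain_sex, gain_bp)
```

In this toy data, blood pressure separates the two drugs perfectly while sex tells us nothing, so a tree built with this criterion would split on blood pressure first.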

Logistic-Regression-with-python

We create a model for a telecommunication company to predict when its customers will leave for a competitor, so that the company can take some action to retain them. In order to estimate the class of a data point, we need some sort of guidance on what would be the most probable class for that data point. For this, we use Logistic Regression. Logistic Regression is a variation of Linear Regression that is useful when the observed dependent variable, y, is categorical. It produces a formula that predicts the probability of the class label as a function of the independent variables. Logistic regression fits a special s-shaped curve by taking the linear regression and transforming the numeric estimate into a probability with the sigmoid function 𝜎: 𝜎(z) = 1 / (1 + e^(-z)), where z is the linear regression estimate.
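A minimal sketch of that transformation, assuming the weights and bias have already been learned. The function name churn_probability, the feature values, and the weights are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(features, weights, bias):
    """Apply the sigmoid to the linear estimate bias + sum(weight * feature)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked weights for two customer features.
weights = [0.8, -0.5]
bias = -1.0

p_leave = churn_probability([3.0, 1.0], weights, bias)
print(p_leave)
```

A customer would then be assigned to the "leaves" class when the probability exceeds a chosen threshold, commonly 0.5.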

Support-Vector-Machines(SVM)

We use SVM (Support Vector Machines) to build and train a model using human cell records, and to classify samples as benign or malignant. SVM works by mapping data to a high-dimensional feature space so that data points can be categorized, even when the data are not otherwise linearly separable. A separator between the categories is found, then the data is transformed in such a way that the separator can be drawn as a hyperplane. Following this, the characteristics of new data can be used to predict the group to which a new record should belong.
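The mapping idea can be illustrated with a toy example. This is not an SVM implementation: the separating threshold is chosen by hand rather than learned, and the one-dimensional measurements, labels, and function names are hypothetical.

```python
def feature_map(x):
    """Map a 1-D value into a 2-D feature space (x, x squared)."""
    return (x, x * x)


# Hypothetical 1-D cell measurements. No single cut on x separates the
# classes, because "malignant" values sit on both sides of "benign" ones.
samples = [-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0]
labels = ["malignant", "malignant", "benign", "benign", "benign",
          "malignant", "malignant"]


def classify(x, threshold=4.0):
    """In the mapped space, the horizontal line x squared = threshold separates the classes."""
    _, squared = feature_map(x)
    return "malignant" if squared > threshold else "benign"


predictions = [classify(x) for x in samples]
print(predictions == labels)
```

A real SVM finds the separator automatically, choosing the one with the widest margin, and kernel functions let it work in the mapped space without computing the mapping explicitly.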

k-means-Clustering

K-means is widely used for clustering in many data science applications, and is especially useful if you need to quickly discover insights from unlabeled data. We will learn how to use k-means for customer segmentation. Some real-world applications of k-means:

  - Customer segmentation
  - Understanding what the visitors of a website are trying to accomplish
  - Pattern recognition
  - Machine learning
  - Data compression

We use this method for:

  - k-means on a randomly generated dataset
  - k-means for customer segmentation

k-means will partition your customers into mutually exclusive groups, for example, into 3 clusters. The customers in each cluster are similar to each other demographically. Now we can create a profile for each group, considering the common characteristics of each cluster. For example, the 3 clusters can be:

  - AFFLUENT, EDUCATED AND OLD AGED
  - MIDDLE AGED AND MIDDLE INCOME
  - YOUNG AND LOW INCOME
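The partitioning step can be sketched from scratch as follows. This is a minimal illustration of the standard k-means loop (assign points, then move centroids), not the notebook's code; the customer features (age, income), the data, and the starting centroids are hypothetical.

```python
import math


def assign(points, centroids):
    """Group each point under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Each centroid moves to the mean of its cluster; an empty cluster keeps its centroid.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers: (age, income in thousands).
customers = [(22, 20), (25, 25), (24, 22),
             (40, 60), (42, 65), (45, 62),
             (65, 120), (68, 130), (70, 125)]
start = [(22, 20), (40, 60), (65, 120)]  # hand-picked initial centroids

centroids, clusters = k_means(customers, start)
print([len(c) for c in clusters])
```

Library implementations usually pick the starting centroids randomly and rerun several times, because the result depends on the initial choice.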

Hierarchical-Clustering

Here we use a clustering technique called Agglomerative Hierarchical Clustering. Agglomerative clustering is the bottom-up approach, and it is more popular than Divisive clustering. The Agglomerative Clustering class requires two inputs:

  1. n_clusters
  2. linkage

First we work with a random dataset. Then we work with the cars_cluss dataset. Go through the code bit by bit to understand it better.
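To show what the two inputs control, here is a small from-scratch sketch of the bottom-up merging process. It is not the library class used in the notebook; the function names and sample points are hypothetical, and only the parameter names n_clusters and linkage mirror the inputs listed above.

```python
import math


def cluster_distance(a, b, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    distances = [math.dist(p, q) for p in a for q in b]
    if linkage == "single":
        return min(distances)
    if linkage == "complete":
        return max(distances)
    if linkage == "average":
        return sum(distances) / len(distances)
    raise ValueError(f"unsupported linkage: {linkage}")


def agglomerative(points, n_clusters, linkage="complete"):
    """Start with one cluster per point and merge the closest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical 2-D points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, n_clusters=3, linkage="complete")
print(result)
```

The linkage rule only changes how the distance between two clusters is measured; the merging loop itself stays the same.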

Density-Based-Clustering-DBSCAN

Density-based clustering locates regions of high density that are separated from one another by regions of low density. Density is defined as the number of points within a specified radius. DBSCAN is particularly good for tasks like class identification in a spatial context. A notable strength of the DBSCAN algorithm is that it can find clusters of arbitrary shape without being affected by noise.
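The density definition above translates fairly directly into code. The following is a compact from-scratch sketch of DBSCAN, not the notebook's implementation; the parameter names eps and min_samples, the convention that a point counts as its own neighbour, and the sample points are assumptions made for illustration.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within radius eps of points[index] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, so keep expanding
    return labels


# Two dense hypothetical groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Unlike k-means, the number of clusters is not given in advance; it follows from the radius and the minimum point count.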

Recommendation-Systems :-

Content-based-Filtering

Recommendation systems are a collection of algorithms used to recommend items to users based on information taken from the user. These systems have become ubiquitous, and can commonly be seen in online stores, movie databases and job finders. We will explore content-based recommendation systems and implement a simple version of one using Python and the Pandas library. This technique attempts to figure out what a user's favourite aspects of an item are, and then recommends items that present those aspects. In our case, we're going to try to figure out the input user's favourite genres from the movies and ratings given. The output shows the top 20 movies the user would like to watch, based on their profile.
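The profile-building idea can be sketched in plain Python. The notebook itself uses Pandas, so this is only an illustration of the arithmetic; the movie titles, genre flags, ratings, and function names are hypothetical.

```python
# Hypothetical catalogue: one 0/1 flag per genre in the order
# [Action, Comedy, Drama].
movie_genres = {
    "Movie A": [1, 0, 0],
    "Movie B": [0, 1, 0],
    "Movie C": [1, 0, 1],
    "Movie D": [0, 1, 1],
    "Movie E": [1, 1, 0],
}

# Hypothetical ratings given by the input user.
user_ratings = {"Movie A": 5.0, "Movie B": 2.0}


def build_profile(ratings, catalogue):
    """Weight each genre by the ratings of the movies that carry it."""
    n_genres = len(next(iter(catalogue.values())))
    profile = [0.0] * n_genres
    for title, rating in ratings.items():
        for g, flag in enumerate(catalogue[title]):
            profile[g] += rating * flag
    return profile


def recommend(profile, catalogue, seen, top_n):
    """Rank unseen movies by the normalised dot product with the user profile."""
    total = sum(profile)
    scores = {
        title: sum(w * f for w, f in zip(profile, flags)) / total
        for title, flags in catalogue.items()
        if title not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


profile = build_profile(user_ratings, movie_genres)
top = recommend(profile, movie_genres, seen=user_ratings, top_n=2)
print(profile, top)
```

The same weighted-sum logic applies to the top 20 list described above; only the catalogue size and top_n change.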

Collaborative-Filtering

We will explore recommendation systems based on Collaborative Filtering and implement a simple version of one using Python and the Pandas library. The technique we're going to look at is called Collaborative Filtering, which is also known as User-User Filtering. This technique uses other users to recommend items to the input user. It attempts to find users that have similar preferences and opinions to the input user, and then recommends items that they have liked. There are several methods of finding similar users (some even make use of machine learning), and the one we will be using here is based on the Pearson Correlation Function.
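A minimal sketch of the similarity step, written in plain Python rather than Pandas. The user names, ratings, and the rule of comparing only movies both users have rated are hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant ratings; treat as no signal
    return cov / math.sqrt(var_x * var_y)


def user_similarity(ratings_a, ratings_b):
    """Correlate two users on the movies both of them have rated."""
    shared = sorted(set(ratings_a) & set(ratings_b))
    if len(shared) < 2:
        return 0.0
    return pearson([ratings_a[m] for m in shared], [ratings_b[m] for m in shared])


# Hypothetical ratings.
input_user = {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 1.0}
user_1 = {"Movie A": 4.5, "Movie B": 2.5, "Movie C": 0.5, "Movie D": 5.0}
user_2 = {"Movie A": 1.0, "Movie B": 3.0, "Movie C": 5.0}

sim_1 = user_similarity(input_user, user_1)
sim_2 = user_similarity(input_user, user_2)
print(sim_1, sim_2)
```

Here user_1 rates consistently half a point lower than the input user yet is still a perfect match, which shows why Pearson correlation suits users who use the rating scale differently. Movies that highly similar users liked, such as Movie D, would then be candidates for recommendation.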