Machine-Learning-Basics

Here I have worked through some of the basics of ML and several of its techniques. I have also made predictions for new data using historical data sets.

Linear-Regression-ML-

Here we use a data set of car CO2 emissions to estimate the CO2 emission of a new car from historical data. We use the machine learning technique called Simple Linear Regression to make the prediction, and scikit-learn to do the underlying calculations. Go through the code step by step to understand it.
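A minimal sketch of simple linear regression with scikit-learn, assuming a single input feature (engine size) and a handful of made-up points that lie exactly on a line; these are not values from FuelConsumption.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): engine size in litres -> CO2 emissions in g/km.
# The values are constructed to lie exactly on the line y = 40 * x + 100.
engine_size = np.array([[1.0], [2.0], [3.0], [4.0]])
co2 = np.array([140.0, 180.0, 220.0, 260.0])

# Fit y = intercept + slope * x by ordinary least squares
model = LinearRegression()
model.fit(engine_size, co2)
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)

# Predict the CO2 emission of a new car with a 2.5 L engine
prediction = model.predict(np.array([[2.5]]))[0]
print("predicted CO2:", prediction)
```

On real data the points will not lie on a line, so the fitted slope and intercept are the least-squares best fit rather than an exact match.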

Multiple-Regression-ML-

Now we use multiple attributes from the table to predict the CO2 emission of a new vehicle, again using the historical car emission data set. We use the machine learning technique called Multiple Linear Regression to make the prediction, and scikit-learn to do the underlying calculations. Go through the code step by step to understand it.
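A minimal sketch of multiple linear regression, assuming three features (engine size, number of cylinders, combined fuel consumption); these feature choices and all numbers are hypothetical, and the targets are generated from a known formula so the recovered coefficients can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature columns: engine size (L), cylinders, fuel consumption (L/100 km)
X = np.array([
    [2.0, 4, 8.5],
    [2.4, 4, 9.6],
    [1.5, 4, 5.9],
    [3.5, 6, 11.1],
    [3.5, 6, 10.6],
    [3.7, 6, 10.0],
])
# Synthetic targets: CO2 = 10*engine + 5*cylinders + 8*fuel + 50
y = X @ np.array([10.0, 5.0, 8.0]) + 50.0

# One coefficient is learned per feature, plus a shared intercept
model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on training data:", model.score(X, y))

# Predict CO2 for a new vehicle
prediction = model.predict(np.array([[3.0, 6, 9.0]]))[0]
print("predicted CO2:", prediction)
```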

Polynomial-Regression-ML-

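This is similar to linear regression, but the fitted line is a curve. We therefore model the data with a polynomial equation in the input (for example, a quadratic). The model is still linear in its coefficients, so it can be fitted with linear regression on the polynomial terms. Go through the code and the graph to understand it better.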
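A minimal sketch, assuming a degree-2 polynomial and made-up data generated from y = 2 + 3x + 0.5x²: scikit-learn's PolynomialFeatures expands x into the columns x and x², and an ordinary LinearRegression is then fitted on those columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data on the curve y = 2 + 3x + 0.5x^2
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2.0 + 3.0 * x.ravel() + 0.5 * x.ravel() ** 2

# Expand x into [x, x^2]; the intercept is handled by LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
print("coefficients (x, x^2):", model.coef_)
print("intercept:", model.intercept_)

# New inputs must go through the same polynomial transform
prediction = model.predict(poly.transform(np.array([[6.0]])))[0]
print("prediction at x = 6:", prediction)
```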

Non-Linear-Regression-ML-

Non-linear regression models the relationship between independent variables 𝑥 and a dependent variable 𝑦 with a non-linear function. If the data shows a curved trend, linear regression will not produce results as accurate as non-linear regression because, as the name implies, linear regression assumes that the relationship is linear. In this notebook, we fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014.
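The text does not say which non-linear function the notebook fits. As an assumption, this sketch uses a logistic (sigmoid) curve, a common choice for growth data such as GDP, and fits its two parameters with scipy.optimize.curve_fit. The data is synthetic and noise-free, x is assumed to be normalized to [0, 1], and the parameter names beta_1 and beta_2 are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, beta_1, beta_2):
    # beta_1 controls the steepness, beta_2 the midpoint of the curve
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))


# Synthetic, normalized data generated with beta_1 = 10, beta_2 = 0.5
x = np.linspace(0.0, 1.0, 20)
y = sigmoid(x, 10.0, 0.5)

# Non-linear least squares starting from an initial guess p0
popt, pcov = curve_fit(sigmoid, x, y, p0=[5.0, 0.4])
beta_1, beta_2 = popt
print(f"beta_1 = {beta_1:.4f}, beta_2 = {beta_2:.4f}")
```

Unlike linear regression, curve_fit searches iteratively, so a reasonable starting guess (p0) matters.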

K-Nearest-Neighbors-Classifier

K-Nearest Neighbors is a supervised learning algorithm: the model is 'trained' with data points labelled with their classification. To predict the class of a new point, the algorithm looks at the 'K' training points nearest to it and assigns the class that is most common among them. Here, a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. The target field, called custcat, has four possible values that correspond to the four customer groups, as follows:

1- Basic Service
2- E-Service
3- Plus Service
4- Total Service

Our objective is to build a classifier to predict the class of unknown cases. We will use a specific type of classification called k-nearest neighbors.
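A minimal sketch with scikit-learn, assuming two hypothetical numeric features and made-up customers whose labels use the custcat values 1 to 4. Features are standardized first because k-nearest neighbors compares raw distances; K = 3 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: two usage features each, grouped near four corners
X = np.array([
    [1, 1], [1, 2], [2, 1],   # custcat 1
    [8, 8], [8, 9], [9, 8],   # custcat 2
    [1, 8], [1, 9], [2, 8],   # custcat 3
    [8, 1], [9, 1], [8, 2],   # custcat 4
], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

# Scale the features, then vote among the 3 nearest training points
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)

pred = knn.predict(np.array([[1.5, 1.5], [8.5, 8.5]]))
print(pred)
```

In practice K is usually chosen by comparing accuracy on held-out data for several values.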

Decision-Tree-Classifier

Here we use one of the popular ML algorithms, known as a decision tree. We use this classification algorithm to build a model from historical data of patients and their response to different medications, and then use the trained decision tree to predict the class of an unknown patient, that is, to find a suitable drug for a new patient. The data was collected from a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X and Drug Y. Our job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this data set are the Age, Sex, Blood Pressure, and Cholesterol of the patients, and the target is the drug that each patient responded to. Since there are five possible drugs, this is an example of a multiclass classifier: you can use the training part of the data set to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.
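A minimal sketch with scikit-learn's DecisionTreeClassifier. It assumes the categorical columns have already been encoded as integers (Sex 0/1, Blood Pressure 0=LOW, 1=NORMAL, 2=HIGH, Cholesterol 0=NORMAL, 1=HIGH) because scikit-learn trees need numeric input; the six patients and the criterion="entropy", max_depth=4 settings are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoded patients: [Age, Sex, BP, Cholesterol]
X = np.array([
    [25, 0, 2, 1],
    [40, 1, 2, 0],
    [35, 1, 0, 1],
    [60, 0, 0, 0],
    [50, 0, 1, 1],
    [30, 1, 1, 0],
])
y = np.array(["drugA", "drugA", "drugC", "drugC", "drugX", "drugX"])

# Splits are chosen to maximize information gain (entropy criterion)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X, y)

# In this toy data, blood pressure alone determines the drug, so the tree splits on it
new_patient = np.array([[45, 1, 2, 1]])
print("suggested drug:", tree.predict(new_patient)[0])
print("training accuracy:", tree.score(X, y))
```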

Logistic-Regression-with-python

We create a model for a telecommunications company to predict when its customers will leave for a competitor, so that the company can take action to retain them. To estimate the class of a data point, we need some guidance on what the most probable class for that data point is. For this, we use Logistic Regression. Logistic Regression is a variation of Linear Regression that is useful when the observed dependent variable, y, is categorical. It produces a formula that predicts the probability of the class label as a function of the independent variables. Logistic regression fits a special S-shaped curve by taking the linear regression output and transforming the numeric estimate into a probability with the sigmoid function 𝜎: σ(θᵀx) = 1 / (1 + e^(−θᵀx)).
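A minimal sketch, assuming a single hypothetical feature (tenure in months) and made-up churn labels (1 = left, 0 = stayed). It fits scikit-learn's LogisticRegression and checks that the predicted churn probability equals the sigmoid of θᵀx, computed from the fitted coef_ and intercept_.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical customers: short-tenure customers churned, long-tenure ones stayed
tenure = np.array([[1], [2], [3], [4], [20], [22], [24], [26]], dtype=float)
churn = np.array([1, 1, 1, 1, 0, 0, 0, 0])

model = LogisticRegression()  # default L2 regularization, C=1.0
model.fit(tenure, churn)

new_customers = np.array([[2.0], [25.0]])

# theta^T x: the linear part of the model
z = new_customers @ model.coef_.T + model.intercept_
# The sigmoid turns the linear score into a probability of churn
sigma = (1.0 / (1.0 + np.exp(-z))).ravel()

# predict_proba column 1 is the probability of class 1 (churn)
proba = model.predict_proba(new_customers)[:, 1]
print("sigmoid by hand:", sigma)
print("predict_proba:  ", proba)
```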

Support-Vector-Machines(SVM)

We use SVM (Support Vector Machines) to build and train a model on human cell records, and to classify the cell samples as benign or malignant. SVM works by mapping data to a high-dimensional feature space so that data points can be categorized, even when the data are not otherwise linearly separable. The data is transformed in such a way that a separator between the categories can be drawn as a hyperplane, and that separator is then found. After this, the characteristics of new data can be used to predict the group to which a new record should belong.
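A minimal sketch using scikit-learn's SVC with an RBF kernel, which performs the high-dimensional mapping implicitly. The two features and all points are hypothetical: benign samples sit near the origin and malignant samples surround them, so no straight line separates the classes in the original space. The value gamma=1.0 is an arbitrary choice for this toy data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical cell measurements: benign cluster in the middle, malignant ring around it
benign = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]])
malignant = np.array([
    [3.0, 0.0], [0.0, 3.0], [-3.0, 0.0], [0.0, -3.0],
    [2.0, 2.0], [-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0],
])
X = np.vstack([benign, malignant])
y = np.array(["benign"] * len(benign) + ["malignant"] * len(malignant))

# The RBF kernel lets SVC find a separating hyperplane in an implicit feature space
clf = SVC(kernel="rbf", gamma=1.0)
clf.fit(X, y)

pred = clf.predict(np.array([[0.2, 0.2], [2.8, 0.5]]))
print(pred)
```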

k-means-Clustering

K-means is widely used for clustering in many data science applications, and is especially useful when you need to quickly discover insights from unlabeled data. Here we learn how to use k-means for customer segmentation.

Some real-world applications of k-means:
->Customer segmentation
->Understanding what the visitors of a website are trying to accomplish
->Pattern recognition
->Machine learning
->Data compression

We use this method for:
->k-means on a randomly generated data set
->k-means for customer segmentation

k-means partitions your customers into mutually exclusive groups, for example, into 3 clusters. The customers in each cluster are demographically similar to each other. We can then create a profile for each group, based on the common characteristics of its cluster. For example, the 3 clusters can be:

->AFFLUENT, EDUCATED AND OLD AGED
->MIDDLE AGED AND MIDDLE INCOME
->YOUNG AND LOW INCOME
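A minimal sketch of k-means customer segmentation with scikit-learn, assuming two hypothetical features (age and income in thousands) and nine made-up customers. The cluster centers are the per-cluster feature means, which is what a profile such as "young and low income" would be read from.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [age, income in k$]
X = np.array([
    [25, 20], [27, 22], [24, 19],   # younger, lower income
    [45, 50], [47, 52], [44, 49],   # middle aged, middle income
    [65, 90], [67, 95], [64, 88],   # older, higher income
], dtype=float)

# Run k-means 10 times from different centroid seeds and keep the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster label per customer:", labels)
print("cluster centers (mean age, mean income):")
print(kmeans.cluster_centers_)
```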

Hierarchical-Clustering

Here we use a clustering technique called Agglomerative Hierarchical Clustering. Agglomerative clustering is the bottom-up approach: each point starts in its own cluster, and the closest clusters are merged step by step. We use agglomerative clustering, which is more popular than divisive clustering. The AgglomerativeClustering class takes two key inputs:

n_clusters: the number of clusters to find
linkage: the criterion used to measure the distance between clusters when deciding which to merge

First we work with a random data set, and then with the cars_clus data set. Go through the code bit by bit to understand it better.
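A minimal sketch of the two inputs in use, assuming nine made-up 2-D points rather than the cars_clus data; linkage="average" is an arbitrary choice here (scikit-learn also accepts "ward", "complete" and "single").

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical points forming three well-separated groups
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],
    [9.0, 1.0], [9.2, 1.1], [8.9, 0.8],
])

# Bottom-up merging stops once 3 clusters remain;
# average linkage uses the mean pairwise distance between two clusters
agglom = AgglomerativeClustering(n_clusters=3, linkage="average")
labels = agglom.fit_predict(X)
print(labels)
```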

Density-Based-Clustering-DBSCAN

Density-based clustering locates regions of high density that are separated from one another by regions of low density. Density is defined as the number of points within a specified radius. DBSCAN is especially good for tasks like class identification in a spatial context. A useful property of the DBSCAN algorithm is that it can find clusters of arbitrary shape while remaining robust to noise: points in low-density regions are labelled as outliers instead of being forced into a cluster.
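A minimal sketch with scikit-learn's DBSCAN, assuming made-up points: an elongated line-shaped cluster, a compact square cluster, and one isolated point. Here eps is the radius used to measure density and min_samples is the number of points (including the point itself) needed within that radius; both values are arbitrary for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: a line-shaped cluster, a small square cluster, and one outlier
line = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0], [1.2, 0.0], [1.6, 0.0], [2.0, 0.0]])
square = np.array([[5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3]])
outlier = np.array([[10.0, 10.0]])
X = np.vstack([line, square, outlier])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_  # noise points get the label -1
print(labels)
```

The line is recovered as a single cluster even though it is not round, and the isolated point is reported as noise rather than assigned to a cluster.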

Recommendation-Systems :-

Content-based-Filtering

Recommendation systems are a collection of algorithms used to recommend items to users based on information taken from the user. These systems have become ubiquitous and can commonly be seen in online stores, movie databases and job finders. We explore content-based recommendation systems and implement a simple version of one using Python and the Pandas library. This technique attempts to figure out which aspects of an item a user likes most, and then recommends items that present those aspects. In our case, we try to figure out the input user's favorite genres from the movies and ratings given. The output shows the top 20 movies the user would most likely want to watch, based on their profile.
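A minimal pandas sketch of the idea, assuming hypothetical movie titles, a one-hot genre table and two user ratings. The user profile is the rating-weighted sum of the genre columns of the rated movies; each unrated movie is then scored by the profile weights of its genres, divided by the profile total, and the top 20 are kept (this toy table only has three candidates).

```python
import pandas as pd

# Hypothetical one-hot genre table (1 = movie belongs to the genre)
genres = pd.DataFrame(
    {
        "Animation": [1, 0, 0, 1, 0],
        "Adventure": [1, 1, 0, 1, 0],
        "Crime": [0, 0, 1, 0, 1],
    },
    index=["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
)
# Hypothetical ratings given by the input user
user_ratings = pd.Series({"Movie A": 5.0, "Movie C": 1.0})

# User profile: each rated movie's genres weighted by its rating, summed per genre
profile = genres.loc[user_ratings.index].mul(user_ratings, axis=0).sum()

# Score the movies the user has not rated yet
candidates = genres.drop(index=user_ratings.index)
scores = candidates.dot(profile) / profile.sum()

recommendations = scores.sort_values(ascending=False).head(20)
print(recommendations)
```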

Collaborative-Filtering

We explore recommendation systems based on Collaborative Filtering and implement a simple version of one using Python and the Pandas library. Collaborative Filtering, as used here, is also known as User-User Filtering. This technique uses other users to recommend items to the input user: it attempts to find users that have similar preferences and opinions to the input user, and then recommends items that those users have liked. There are several methods of finding similar users (some even use machine learning), and the one we use here is based on the Pearson correlation coefficient.
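A minimal sketch of user-user filtering in plain Python and NumPy, assuming hypothetical users and ratings. Similarity is the Pearson correlation over the movies both users rated; keeping only positively correlated neighbours and scoring each unseen movie with a similarity-weighted average of their ratings are assumptions of this sketch.

```python
import numpy as np

# Hypothetical ratings
input_user = {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 4.0}
other_users = {
    "user1": {"Movie A": 4.0, "Movie B": 2.0, "Movie C": 3.0, "Movie D": 5.0},
    "user2": {"Movie A": 1.0, "Movie B": 5.0, "Movie C": 2.0, "Movie D": 1.0},
    "user3": {"Movie A": 5.0, "Movie B": 4.0, "Movie C": 3.0, "Movie D": 2.0},
}


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists (0 if undefined)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0


# Similarity computed only on movies rated by both users
similarity = {}
for user, ratings in other_users.items():
    common = [m for m in input_user if m in ratings]
    if len(common) >= 2:
        similarity[user] = pearson([input_user[m] for m in common],
                                   [ratings[m] for m in common])

# Keep positively correlated users as neighbours
neighbours = {u: s for u, s in similarity.items() if s > 0}

# Similarity-weighted average rating for each movie the input user has not seen
unseen = {m for u in neighbours for m in other_users[u]} - set(input_user)
scores = {}
for movie in sorted(unseen):
    raters = [u for u in neighbours if movie in other_users[u]]
    weight_sum = sum(neighbours[u] for u in raters)
    scores[movie] = sum(neighbours[u] * other_users[u][movie] for u in raters) / weight_sum

print("similarity:", similarity)
print("recommendation scores:", scores)
```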

Repository files:

ChurnData.csv
Collaborative Filtering.ipynb
Content-Based Filtering.ipynb
Cust_Segmentation.csv
DBSCAN Clustering.ipynb
Decision Tree Classifier.ipynb
FuelConsumption.csv
Hierarchical Clustering (Agglomerative clustering).ipynb
K-Means Clustering.ipynb
K-Neighbors Classifier.ipynb
Linear Regression(CO2 emission).ipynb
Logistic Regression (Churn data).ipynb
Multiple Regression (CO2 Emission).ipynb
Non-Linear Regression (China GDP).ipynb
Polynomial Regression (CO2 Emission).ipynb
README.md
Support Vector Machines.ipynb
cars_clus.csv
cell_samples.csv
china_gdp.csv
drug200.csv
drugtree.png
movies.csv
teleCust1000t.csv
weather-stations20140101-20141231.csv

chuphalashit19/Machine-Learning-Basics
