madhurimarawat/ML-Model-Datasets-Using-Streamlits

ML-Model-Datasets-Using-Streamlits

This repository contains my implementations of machine learning models, built with Streamlit in the Python programming language.

Website Image

Website Image


Mode of Execution Used: PyCharm, Streamlit

PyCharm

--> Visit the official website of PyCharm: PyCharm

--> Download the installer for the platform that will be used: Linux, macOS or Windows.

--> Two versions of PyCharm are available:

  1. Community version

    --> The Community version is open source and can be used for free without any paid plan.

    --> It can be downloaded from the bottom of the PyCharm website.

    --> After downloading the Community version, follow the setup wizard to complete the installation.

  2. Professional version

    --> This is available at the top of the website and can be downloaded directly from there.

    --> After downloading the Professional version, follow the setup wizard and sign up for the free trial, or continue with the paid version.

Using PyCharm

--> PyCharm works with virtual environments. In a virtual environment we can install all the required libraries or frameworks.

--> Each project has its own virtual environment, so that we can install requirements like libraries or frameworks for that project only.

--> After this we can create a new file. Various file types are available in PyCharm, such as script files, text files and Jupyter Notebooks.

--> After selecting the required file type, we can run the file by saving it and pressing Shift+F10 (on Windows).

--> Output is shown in the Console, while installation happens in the Terminal in PyCharm.

Streamlit Server

--> Streamlit is a Python framework through which we can deploy machine learning models and other Python projects with ease, without worrying about the frontend.

--> Streamlit is very user-friendly.

--> Streamlit has predefined functions for all frontend components, and we can use them directly.

--> To install Streamlit on your system, run this command:

pip install streamlit

Running Project in Streamlit Server

Make sure all dependencies are installed before running the app.

  1. We can run the Streamlit app directly with the following command:
streamlit run app.py

where app.py is the name of the file containing the Streamlit code.

By default, Streamlit runs on port 8501.

We can also run multiple apps at the same time; each additional app is served on the next port, such as 8502, and so on.

  2. Navigate to the URL http://localhost:8501

You should be able to view the homepage of your app.

🌟 Project and Models will change but this process will remain the same for all Streamlit projects.

Deploying using Streamlit

  1. Visit the official website of Streamlit: Streamlit

  2. Create an account using GitHub.

  3. Add all the code to a GitHub repository.

  4. Go to Streamlit and choose the option for a new deployment.

  5. Enter your GitHub repository name and specify the file name. If the file is named streamlit_app.py, it is picked up directly; otherwise you have to specify its path.

  6. Make sure all required libraries are listed in a requirements.txt file.

  7. A library version can also be pinned in that file, like this: library_name==1.0.0.

  8. Streamlit installs all dependencies listed in the requirements file, using the pinned versions where they are given.

  9. If everything goes well, the app is deployed on the web, and you can share the link and open the app from any browser.
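
The requirements file from the steps above can be written by hand. As an alternative, here is a minimal sketch that prints pinned requirement lines for the libraries installed locally. The helper name `build_requirements` is hypothetical, and the package list is taken from the "Libraries Used" section of this README rather than from the repository's actual requirements file.

```python
from importlib.metadata import PackageNotFoundError, version


def build_requirements(packages):
    """Return requirement lines, pinning each package that is installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed here: leave it unpinned rather than guess a version.
            lines.append(name)
    return lines


if __name__ == "__main__":
    # Assumed package list, based on the libraries named in this README.
    libraries = ["streamlit", "numpy", "pandas", "matplotlib", "scikit-learn", "seaborn"]
    print("\n".join(build_requirements(libraries)))
```

The printed lines can be copied into the requirements file before pushing the code to GitHub.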

About Project:

Complete description of the project and the resources used.

--> In this project I made a Streamlit website in which you can apply multiple supervised learning algorithms to various datasets.

--> I also added data visualization to show how these algorithms work on the datasets.

--> I have deployed this website using Streamlit.

--> Visit the website: ML Algorithms on Inbuilt and Kaggle Datasets


Algorithms Used:

Supervised Learning

--> Supervised learning is when we teach or train the machine using data that is well-labelled.

--> This means each training example is already tagged with the correct answer.

--> After training, the machine is given a new set of examples (data), and the supervised learning algorithm uses what it learned from the labelled training data to predict the correct outcome for them.

i) K-Nearest Neighbors (KNN)


--> K-Nearest Neighbors is one of the most basic yet essential classification algorithms in Machine Learning.

--> It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining, and intrusion detection.

--> In this algorithm, we assign a category to a new data point based on the categories of its nearest neighbors.
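
The neighbor-based idea can be sketched in a few lines of plain Python. This is only an illustration on made-up points, not the scikit-learn implementation; the function name `knn_predict` and the toy data are assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (made up for illustration): two small clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0)))  # A
print(knn_predict(points, labels, (5.1, 5.0)))  # B
```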

ii) Support Vector Machines (SVM)


--> The main idea behind SVMs is to find a hyperplane that maximally separates the different classes in the training data.

--> This is done by finding the hyperplane that has the largest margin, which is defined as the distance between the hyperplane and the closest data points from each class.

--> Once the hyperplane is determined, new data can be classified by determining on which side of the hyperplane it falls.

--> SVMs are particularly useful when the data has many features, and/or when there is a clear margin of separation in the data.
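
To make the "side of the hyperplane" idea concrete, here is a small sketch that classifies points against a fixed hyperplane and measures their distance to it. The hyperplane is hand-picked for illustration and not learned from data, so this shows only the prediction step, not SVM training; all names are hypothetical.

```python
from math import sqrt


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / sqrt(sum(wi * wi for wi in w))


# Hypothetical, hand-picked hyperplane x1 + x2 - 4 = 0 (not learned from data).
w, b = (1.0, 1.0), -4.0

print(classify(w, b, (3.0, 3.0)))  # 1
print(classify(w, b, (1.0, 1.0)))  # -1
```

During training, an SVM chooses w and b so that the smallest such distance over the training points (the margin) is as large as possible.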

iii) Naive Bayes Classifiers


--> Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem.

--> It is not a single algorithm but a family of algorithms that share a common principle: every pair of features is assumed to be independent of each other, given the class.

--> The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.
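
The independence assumption means the score for a class is simply the class prior multiplied by one likelihood per feature. Below is a minimal count-based sketch on made-up categorical data. It uses no smoothing, so an unseen value gives a probability of zero, and the function names and toy rows are assumptions rather than anything taken from this repository.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(labels)
    # value_counts[(class, feature_index)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(label, index)][value] += 1
    return class_counts, value_counts


def predict(class_counts, value_counts, row):
    """Pick the class maximising P(class) * product of P(feature value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for index, value in enumerate(row):
            # Independence assumption: multiply the per-feature likelihoods.
            score *= value_counts[(label, index)][value] / count
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up rows: (glucose level, blood pressure level) -> has diabetes?
rows = [("high", "high"), ("high", "low"), ("low", "low"),
        ("low", "high"), ("high", "high"), ("low", "low")]
labels = ["yes", "yes", "no", "no", "yes", "no"]

model = train_naive_bayes(rows, labels)
print(predict(*model, ("high", "low")))  # yes
print(predict(*model, ("low", "low")))   # no
```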

iv) Decision Tree


--> It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

--> It is constructed by recursively splitting the training data into subsets based on the values of the attributes until a stopping criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split a node.

--> The goal is to find the attribute that maximizes the information gain or the reduction in impurity after the split.
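
The two split criteria named above, information gain and impurity reduction, can be computed directly from label counts. The sketch below is a plain-Python illustration on made-up labels, not the scikit-learn internals; the function names are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


# Made-up node with a 50/50 class mix, split perfectly into two pure children.
parent = ["yes", "yes", "no", "no"]
print(gini(parent))                                             # 0.5
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))   # 1.0
```

A tree builder evaluates candidate splits like this for every attribute and keeps the one with the highest gain.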

v) Random Forest


--> It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

--> Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions.

--> A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of overfitting.
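
The voting step can be sketched as follows. The per-tree predictions here are made up for illustration; in a real forest each one would come from a decision tree trained on a different random sample of the data.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one sample.
votes = ["setosa", "versicolor", "setosa", "setosa", "virginica"]
print(majority_vote(votes))  # setosa
```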

vi) Linear Regression


--> Regression: It predicts a continuous output variable based on independent input variables, for example predicting house prices from parameters such as house age, distance from the main road, location, area, etc.

--> It computes the linear relationship between a dependent variable and one or more independent features.

--> The goal of the algorithm is to find the best linear equation that can predict the value of the dependent variable based on the independent variables.
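
For a single feature, the best linear equation has a closed-form least-squares solution. The sketch below fits it on made-up data; the function name `fit_line` is an assumption, and the app itself may rely on scikit-learn rather than code like this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```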

vii) Logistic Regression


--> Logistic regression is a supervised machine learning algorithm mainly used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class.

--> It is a statistical algorithm that analyzes the relationship between a set of independent variables and a binary dependent variable.

--> It is a powerful tool for decision-making.

--> For example, deciding whether an email is spam or not.
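
The probability in logistic regression comes from passing a linear score through the sigmoid function. The sketch below shows only the prediction step; the weights and bias are hypothetical values chosen for illustration, not parameters learned from any dataset in this project.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(weights, bias, features):
    """Probability that the instance belongs to the positive class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 decision using a threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical parameters for a one-feature model.
weights, bias = (2.0,), -1.0
print(predict_class(weights, bias, (1.0,)))  # 1
print(predict_class(weights, bias, (0.0,)))  # 0
```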

Datasets Used:

Iris Dataset

--> The Iris dataset is part of the sklearn library.

--> Sklearn comes bundled with datasets for practising machine learning techniques, and Iris is one of them.

--> Iris has 4 numerical features and a three-class target variable.

--> This dataset can be used for classification as well as clustering.

--> The 4 features are sepal length, sepal width, petal length and petal width, and the target variable has 3 classes: ‘setosa’, ‘versicolor’, and ‘virginica’.

--> The objective for a multiclass classifier is to predict the target class given the values of the four features.

--> Dataset is already cleaned, no preprocessing required.

Breast Cancer Dataset

--> The breast cancer dataset is a classification dataset that contains 569 samples of malignant and benign tumor cells.

--> The samples are described by 30 features such as mean radius, texture, perimeter, area, smoothness, etc.

--> The target variable has 2 classes: ‘benign’ and ‘malignant’.

--> The objective for a binary classifier is to predict the target class given the values of the features.

--> Dataset is already cleaned, no preprocessing required.

Wine Dataset

--> The wine dataset is a classic and very easy multi-class classification dataset that is available in the sklearn library.

--> It contains 178 samples of wine with 13 features and 3 classes.

--> The goal is to predict the class of wine based on the features.

--> Dataset is already cleaned, no preprocessing required.

Digits Dataset

--> The digits dataset is a classic multi-class classification dataset that is available in the sklearn library.

--> It contains 1797 samples of digits with 10 classes.

--> The goal is to predict the class of digit based on the features.

--> Dataset is already cleaned, no preprocessing required.

Diabetes Dataset

--> The diabetes dataset is a regression dataset that is available in the sklearn library.

--> It contains 442 samples and 10 features; the target is a continuous value rather than a class.

--> Dataset is already cleaned, no preprocessing required.
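
The five datasets described so far ship with scikit-learn, so their sizes can be checked directly. A minimal sketch, assuming scikit-learn is installed; the dictionary `LOADERS` and the helper `dataset_shapes` are names invented for this example.

```python
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_digits,
    load_iris,
    load_wine,
)

# Display names (as used in this README) mapped to scikit-learn loader functions.
LOADERS = {
    "Iris": load_iris,
    "Breast Cancer": load_breast_cancer,
    "Wine": load_wine,
    "Digits": load_digits,
    "Diabetes": load_diabetes,
}


def dataset_shapes():
    """Return {name: (n_samples, n_features)} for each bundled dataset."""
    shapes = {}
    for name, loader in LOADERS.items():
        features, target = loader(return_X_y=True)
        shapes[name] = features.shape
    return shapes


for name, shape in dataset_shapes().items():
    print(name, shape)
```

The first number of each shape is the sample count quoted above, and the second is the number of features.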

Naive Bayes Classification Data

--> Dataset is taken from:

--> Contains diabetes data for classification.

--> The dataset has 3 columns (glucose, blood pressure and diabetes) and 995 entries.

--> The glucose and blood pressure columns are used to classify whether the patient has diabetes or not.

--> Dataset is already cleaned, no preprocessing required.

Cars Evaluation Dataset

--> Dataset is taken from: Cars Evaluation Dataset

--> Contains information about cars with respect to the following attributes and values:

  1. buying: v-high, high, med, low
  2. maint: v-high, high, med, low
  3. doors: 2, 3, 4, 5-more
  4. persons: 2, 4, more
  5. lug_boot: small, med, big
  6. safety: low, med, high

--> Target categories are:

  1. unacc: 1210 (70.023 %)
  2. acc: 384 (22.222 %)
  3. good: 69 (3.993 %)
  4. v-good: 65 (3.762 %)

--> Contains values in string format.

--> Dataset is not cleaned, preprocessing is required.
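
Because the values are strings, one possible preprocessing step is to map each value to an integer that respects its natural order. The sketch below is an assumption about how this could be done, not the preprocessing actually used in this repository. The spellings follow the attribute list above and may differ from those in the raw file.

```python
# Ordered categories (lowest to highest), spelled as in this README's attribute list.
ORDERED_VALUES = {
    "buying": ["low", "med", "high", "v-high"],
    "maint": ["low", "med", "high", "v-high"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}


def encode_row(row):
    """Map each string value to its position in the ordered category list."""
    return {column: ORDERED_VALUES[column].index(value) for column, value in row.items()}


# Hypothetical example row.
car = {"buying": "v-high", "maint": "low", "doors": "5-more",
       "persons": "4", "lug_boot": "big", "safety": "med"}
print(encode_row(car))
# {'buying': 3, 'maint': 0, 'doors': 3, 'persons': 1, 'lug_boot': 2, 'safety': 1}
```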

Salary Dataset

--> Dataset is taken from: Salary Dataset

--> Contains Salary data for Regression.

--> The dataset has 2 columns (Years of Experience and Salary) and 30 entries.

--> The Years of Experience column is used to predict Salary through regression.

--> Dataset is already cleaned, no preprocessing required.

Libraries Used 📚 💻

Short description of all libraries used.

To install a Python library, this command is used:

pip install library_name

  • NumPy (Numerical Python) - Provides a collection of mathematical functions to operate on arrays and matrices.
  • Pandas (Panel Data/ Python Data Analysis) - This library is mostly used for analyzing, cleaning, exploring, and manipulating data.
  • Matplotlib - It is a data visualization and graphical plotting library.
  • Scikit-learn - It is a machine learning library that provides tools for many machine learning tasks such as classification, prediction, etc.
  • Seaborn - It is a library built on top of Matplotlib, used to create more attractive and informative statistical graphics.

Thanks for Visiting 😄

Drop a 🌟 if you find this repository useful.

If you have any doubts or suggestions, feel free to reach out to me.

📫 How to reach me: LinkedIn, Mail