**Diabetes Prediction Based on Multiple Linear Regression Using Python**

Classification is an important data mining technique that is applied in many fields, including medical diagnosis. In this research work, the multiple linear regression algorithm is used to estimate the likelihood of diabetes occurring in a patient. The work is intended to help developers understand the characteristics and flow of the algorithm, and the implementation is intended to help the diabetologist make decisions quickly. The performance of the algorithm is reported for the chosen data.

Keywords: Multiple Linear Regression, Diabetologist, Classification

A. Problem Description: Predict the occurrence of diabetes in a patient from the glucose level and BMI value.

[Figures: Multiple Regression 1, Multiple Regression 2]

B. Dataset Description: The data was collected from the Kaggle website [8]. The dataset has seven (7) attributes.

  1. Glucose: Plasma glucose concentration.
  2. Blood Pressure: Diastolic blood pressure (mm Hg)
  3. Insulin: 2-Hour serum insulin (mu U/ml)
  4. BMI: Body mass index (weight in kg/(height in m)^2)
  5. Age: Age (years)
  6. Outcome: Class variable (0 or 1)

Outcome is the dependent variable; the others are independent variables. The dataset size is 240 KB.
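As a rough illustration of separating the independent variables from the dependent variable, the sketch below parses a few rows and splits them into predictors and outcomes. The column names (`Glucose`, `BloodPressure`, and so on), the helper `split_columns`, and every number in `SAMPLE_CSV` are assumptions made for this example; they are not taken from the actual dataset file.

```python
import csv
import io

# Hypothetical rows for illustration only; these are NOT values from the real dataset.
SAMPLE_CSV = """Glucose,BloodPressure,Insulin,BMI,Age,Outcome
150,70,100,33.5,50,1
90,66,80,24.0,25,0
170,80,150,36.0,45,1
100,72,60,27.5,30,0
"""

PREDICTORS = ["Glucose", "BMI"]  # independent variables named in the problem description
RESPONSE = "Outcome"             # dependent variable (0 or 1)


def split_columns(csv_text, predictors, response):
    """Return (X, y): X holds one list of predictor values per row, y the outcomes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in predictors])
        y.append(int(row[response]))
    return X, y


X, y = split_columns(SAMPLE_CSV, PREDICTORS, RESPONSE)
print(X)
print(y)
```

To read a real file instead, the same `csv.DictReader` call can wrap an open file handle, provided the header names match those assumed above.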

Multiple regression extends linear regression to relationships among more than two variables. In simple linear regression there is one predictor variable and one response variable; in multiple regression there is more than one predictor variable and one response variable.
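A minimal sketch of fitting a model with two predictors by ordinary least squares is shown here. It assumes NumPy is available (the source does not name a library), and the toy values for glucose and BMI are hypothetical, chosen only so the example runs.

```python
import numpy as np

# Hypothetical toy data: columns are [Glucose, BMI]; NOT taken from the real dataset.
X = np.array([
    [150.0, 33.5],
    [90.0, 24.0],
    [170.0, 36.0],
    [100.0, 27.5],
    [130.0, 30.0],
    [85.0, 22.0],
])
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])

# Prepend a column of ones so the first fitted value is the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose coef to minimize ||A @ coef - y||^2.
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

# Fitted responses for the training rows.
fitted = A @ coef
print("intercept:", intercept)
print("slopes:", slopes)
```

With one predictor column the same code reduces to simple linear regression; each additional column adds one more coefficient to `slopes`.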

The general equation is:

$$y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

where:

  1. $y$ is the response variable.
  2. $a, b_1, \dots, b_n$ are the coefficients.
  3. $x_1, x_2, \dots, x_n$ are the predictor variables.
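The general equation can be evaluated directly once the coefficients are known. The sketch below does that for a single patient and then maps the continuous response to a 0/1 label. The coefficient values, the helper names `predict` and `classify`, and the 0.5 cutoff are all assumptions for illustration; the source does not state how the regression output is converted to a class.

```python
def predict(a, b, x):
    """Evaluate y = a + b1*x1 + ... + bn*xn for one observation."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return a + sum(bi * xi for bi, xi in zip(b, x))


def classify(y_hat, threshold=0.5):
    """Map the continuous response to a 0/1 label (the 0.5 cutoff is an assumption)."""
    return 1 if y_hat >= threshold else 0


# Hypothetical coefficients for [Glucose, BMI]; NOT fitted to the real data.
a = -1.0
b = [0.005, 0.025]

# One hypothetical patient: glucose 160, BMI 36.
y_hat = predict(a, b, [160.0, 36.0])
print(y_hat, classify(y_hat))
```

Because a linear model is not bounded to the range 0 to 1, the raw response is best read as a score rather than a probability.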


This research work predicts the presence or absence of diabetes in a patient. A regression algorithm with diagnostic features for the occurrence of diabetes has been presented. The performance of the algorithm is also analyzed using only selected attributes from the records of the input dataset.

