chomiczdawid/data-preparation

Data preparation

This repository walks through the process of preparing data in the R programming language before building a statistical model.

The dataset describes parameters of the beer brewing process. Of its 29 variables, 11 were selected arbitrarily. They will be used to build a statistical model that examines how the selected variables affect alcohol by volume (ABV).

The process outlined includes:

  • descriptive analysis of selected variables, determination of the measurement scale and visualization
  • imputation of missing data
  • outlier identification
  • analysis of correlation between variables
  • data sampling
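
The steps above can be illustrated with a minimal base R sketch. It is not the code from this repository: the toy data frame, its column names (`abv`, `og`, `ibu`), the median imputation, the 1.5 × IQR outlier rule and the 70/30 split are all assumptions chosen for illustration. The repository itself loads `VIM`, which provides imputation functions such as `kNN()`, so the actual method may differ.

```r
# Toy data standing in for the brewing dataset (hypothetical columns and values).
set.seed(42)
n <- 200
beer <- data.frame(
  abv = rnorm(n, mean = 6, sd = 1.5),     # alcohol by volume
  og  = rnorm(n, mean = 1.06, sd = 0.01), # original gravity
  ibu = rnorm(n, mean = 40, sd = 15)      # bitterness
)
beer$ibu[sample(n, 10)] <- NA  # inject some missing values
beer$abv[1] <- 25              # inject one obvious outlier

# 1. Descriptive analysis: summary statistics per variable
print(summary(beer))

# 2. Imputation: replace missing values with the column median
#    (a simple stand-in, assumed here for illustration)
beer$ibu[is.na(beer$ibu)] <- median(beer$ibu, na.rm = TRUE)

# 3. Outlier identification with the 1.5 * IQR rule
q <- unname(quantile(beer$abv, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- beer$abv < q[1] - fence | beer$abv > q[2] + fence

# 4. Correlation between variables (Pearson by default)
cor_mat <- cor(beer)

# 5. Data sampling: random 70/30 split into training and test sets
train_idx <- sample(n, size = floor(0.7 * n))
train_set <- beer[train_idx, ]
test_set  <- beer[-train_idx, ]
```

Base R is used here only to keep the sketch free of dependencies. The resulting `cor_mat` could be passed to `corrplot::corrplot()` for visualization.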

Technologies used

Libraries used

library(dplyr)
library(ggplot2)
library(VIM)
library(gridExtra)
library(corrplot)
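
Before running the `library()` calls, it may help to check that the packages are installed. The snippet below is a small sketch of such a check; the variable names `pkgs` and `missing_pkgs` are hypothetical and not part of the repository.

```r
# Packages loaded by this repository
pkgs <- c("dplyr", "ggplot2", "VIM", "gridExtra", "corrplot")

# requireNamespace() returns FALSE (without an error) when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # To install them, run: install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```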