emmaarenas/data-quality-analysis

Data Quality Analysis with R

This repository hosts a collection of Jupyter Notebooks, in English and Spanish, that perform data quality analysis in R. The notebooks follow a detailed analysis structure for inspecting and understanding an example dataset. The dataset is for illustrative purposes only and has limited practical relevance; it is included solely to demonstrate data analysis methods and techniques. The repository also contains HTML and PDF versions of the notebooks and the images of the resulting plots.

Analysis Index

  1. Data Treatment
    • Importing Libraries
    • Loading Dataset
  2. Analysis
    • Data Overview
    • Detection of Duplicate Records
    • Detection of Missing Values
    • Detection of Atypical Values
  3. Some Statistical Calculations...
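
As a rough illustration of the Analysis steps listed above, the following base R sketch runs on a small made-up data frame. The data, the column names (`id`, `value`) and the 1.5 × IQR rule for atypical values are assumptions chosen for this example; the notebooks may use a different dataset, other packages and other criteria.

```r
# Hypothetical toy data; the repository's example dataset is not reproduced here.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4, 5),
  value = c(10, 12, 12, NA, 11, 95)
)

# Data overview: structure and per-column summary statistics
str(df)
summary(df)

# Duplicate records: rows that are identical to an earlier row
n_duplicates <- sum(duplicated(df))
df_unique <- df[!duplicated(df), ]

# Missing values: number of NA entries per column
na_counts <- colSums(is.na(df_unique))

# Atypical values: flag points outside 1.5 * IQR from the quartiles
# (an assumed rule for this sketch, not necessarily the notebooks' method)
q <- quantile(df_unique$value, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- IQR(df_unique$value, na.rm = TRUE)
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
is_outlier <- !is.na(df_unique$value) &
  (df_unique$value < lower | df_unique$value > upper)
outliers <- df_unique[is_outlier, ]

print(n_duplicates)
print(na_counts)
print(outliers)
```

Removing duplicates before counting missing values and outliers keeps a repeated record from being counted twice in the later checks.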

Viewing and Editing Notebooks

The Jupyter Notebooks in this repository can be viewed directly on GitHub, which allows for easy review of the analysis and outcomes without the need for local execution. For an interactive experience or to modify the analysis, it is recommended to clone the repository and work with the notebooks locally.

To run or edit the notebooks on your own machine, install R along with the packages loaded in the notebooks. To run R inside Jupyter, you also need to install the IRkernel package.
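
A minimal setup sketch, assuming Jupyter is already installed and available on your PATH. Run it once from an R console; the packages used by the notebooks themselves still have to be installed separately, as listed in each notebook.

```r
# Install the IRkernel package from CRAN
install.packages("IRkernel")

# Register the R kernel with Jupyter for the current user
IRkernel::installspec()
```

After this, R should appear as a kernel option when you open a notebook in Jupyter.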

Contributing

If you would like to contribute to the project, please fork