EDA with Data Visualization

This study focuses on improving visualization techniques throughout the EDA and feature engineering process that precedes model development. It uses the Google Play Store dataset, which contains information about apps across different categories.

In Kaggle notebooks, this dataset is generally used to predict an app's number of installs from the given features. This study, however, does not develop a prediction model; it deals with the techniques and details of preprocessing, which is one of the most important stages of model development. Visualization techniques are especially helpful at this stage. Extracting information from the data guides what we can expect from the model and which features are likely to matter most for predicting the target.

This study does not describe the dataset in detail, but it provides all the techniques and code for data transformation, descriptive analysis, and visualization, so you can apply these techniques and perspectives before any model development process. The dataset contains both categorical and numeric values, so you can see how to deal with both kinds of feature.
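As a rough illustration of handling categorical and numeric features separately, the sketch below splits a small toy table by dtype and summarizes each group. It assumes pandas is available; the column names and values are hypothetical and are not taken from the actual dataset.

```python
import pandas as pd

# Toy data with hypothetical column names (not the real dataset).
df = pd.DataFrame(
    {
        "category": ["GAME", "TOOLS", "GAME", "FAMILY"],
        "type": ["Free", "Paid", "Free", "Free"],
        "rating": [4.5, 3.9, 4.1, None],
        "reviews": [1200, 85, 40000, 310],
    }
)

# Separate the columns by dtype so each group gets a suitable summary.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Numeric features: count, mean, spread, and quartiles.
numeric_summary = df[numeric_cols].describe()

# Categorical features: frequency of each level.
category_counts = {col: df[col].value_counts() for col in categorical_cols}

print(numeric_summary)
for col, counts in category_counts.items():
    print(col, counts.to_dict())
```

Splitting by dtype first keeps the later plotting choices simple: histograms or box plots for the numeric group, bar charts of the counts for the categorical group.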

I hope this notebook will be a good resource for preprocessing and exploratory data analysis with visualization techniques.

Dataset

The dataset used in this study was obtained from Kaggle; you can reach it from this link. Only 'googleplaystro.csv' is used for this study. You can also find the dataset in the dataset folder. The dataset includes 13 features; the details are in the data transformation notebook.

The transformed dataset produced in the first phase has also been uploaded.

Environment

You can use Anaconda to install the dependencies needed to run the notebooks. Once you have installed Anaconda, run:

$ conda env create -f environment.yml

Notebooks

The data-transformation.ipynb notebook includes all data cleaning and transformation processes.
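A minimal sketch of one typical cleaning step is shown here. It assumes that install counts arrive as text such as '10,000+', which is an assumption about the raw file and not something this README states. The helper name parse_installs is hypothetical and is not taken from the notebook.

```python
import re
from typing import Optional


def parse_installs(text: str) -> Optional[int]:
    """Convert a string such as '10,000+' to 10000.

    Returns None when the value contains anything other than digits,
    commas, plus signs, or whitespace.
    """
    digits = re.sub(r"[,+\s]", "", text)
    return int(digits) if digits.isdigit() else None


# Hypothetical raw values, including one that cannot be parsed.
raw_values = ["10,000+", "500+", "1,000,000+", "Free"]
cleaned = [parse_installs(value) for value in raw_values]
print(cleaned)  # [10000, 500, 1000000, None]
```

Returning None for unparseable values, instead of raising, makes it easy to count and inspect the bad rows before deciding whether to drop or repair them.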

The eda-visualization.ipynb notebook includes all visualization techniques for univariate and bivariate analysis.
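To make the distinction concrete, the sketch below draws one univariate plot (a histogram of a single feature) next to one bivariate plot (a scatter of two features). It assumes matplotlib is installed. The values are made up for illustration, and the plots are not necessarily the ones used in the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Made-up values standing in for two numeric features.
ratings = [4.5, 3.9, 4.1, 4.7, 3.2, 4.4]
reviews = [1200, 85, 40000, 98000, 15, 5300]

fig, (ax_uni, ax_bi) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: the distribution of one feature.
ax_uni.hist(ratings, bins=5)
ax_uni.set_xlabel("Rating")
ax_uni.set_ylabel("Number of apps")
ax_uni.set_title("Univariate: rating distribution")

# Bivariate: the relationship between two features.
ax_bi.scatter(reviews, ratings)
ax_bi.set_xscale("log")  # review counts span several orders of magnitude
ax_bi.set_xlabel("Reviews (log scale)")
ax_bi.set_ylabel("Rating")
ax_bi.set_title("Bivariate: reviews vs. rating")

fig.tight_layout()
```

In a notebook the figure is displayed inline without the backend line; in a script, fig.savefig can write it to a file.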

Proposed Resources

Several resources helped throughout this study, especially the Exploratory Data Analysis with Python Cookbook by Ayodele Oluleye, which shaped how to approach the data when visualizing it. It is a strongly recommended resource. You can find the other resources:

Contribution

If you want to contribute, please send a pull request. All contributions are welcome!

Please check this repository for updates, to open issues, or to send pull requests.

ftmoztl/EDA-data-visualization
