madhurimarawat/Data-Visualization-using-python

Data-Visualization-using-python

This repository contains data visualization programs, written in Python, for various datasets.

Data Visualization


--> Data visualization is the representation of information and data in a pictorial or graphical format (for example: charts, graphs, and maps).

--> Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data.

--> Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions.

--> The concept of using pictures to understand data has been around for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

Various forms of Data Visualization

About Python Programming

--> Python is a high-level, general-purpose, and very popular programming language.

--> The Python programming language (currently Python 3) is used in web development, machine learning applications, and many other cutting-edge areas of the software industry.

--> Python is available across widely used platforms like Windows, Linux, and macOS.

--> One of the biggest strengths of Python is its large standard library.


Mode of Execution Used: Google Colab

--> Colaboratory, or “Colab” for short, is a product from Google Research that allows anybody to write and execute Python code in a Jupyter notebook through the browser.

--> Visit Colab at: Google Colab

--> Sign in using a Google account.

--> Once signed in, we can start coding in Colab right away.

--> It supports Python and R.

--> Files are saved directly to Google Drive.


Table Of Contents 📔 🔖 📑

  1. Download the House Pricing dataset from Kaggle and map the values to Aesthetics.

  2. Use different Color scales on the Rainfall Prediction dataset.

  3. Create different Bar plots for variables in any dataset.

  4. Show an example of Skewed data and removal of skewness.

  5. For a sales dataset do a Time Series Visualization.

  6. Build a Scatterplot and suggest dimension reduction.

  7. Use Geospatial Data-Projections on datasets.

  8. Create a trend line with a confidence band in any suitable dataset.

  9. Illustrate Partial Transparency and Jittering.

  10. Illustrate usage of different color codes.
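
As an illustration of item 4, here is a minimal sketch of measuring skewness and reducing it with a log transform. It is not taken from the repository's notebooks: the sample data is synthetic (drawn from a log-normal distribution), and the names `skewness` and `plot_before_after` are hypothetical helpers introduced only for this example. The plotting helper assumes Matplotlib is installed.

```python
import math
import random
import statistics


def skewness(values):
    """Return the population (Fisher-Pearson) skewness of a sequence."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


# Synthetic right-skewed sample standing in for a real column
# (an assumption for illustration, not data from the repository).
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# A log transform compresses the long right tail. The values here are
# strictly positive; math.log1p is a common choice when zeros can occur.
transformed = [math.log(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")


def plot_before_after(before, after):
    """Draw side-by-side histograms (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].hist(before, bins=50)
    axes[0].set_title("Original (right-skewed)")
    axes[1].hist(after, bins=50)
    axes[1].set_title("After log transform")
    plt.show()
```

Calling `plot_before_after(raw, transformed)` in a Colab cell should show the long right tail of the original histogram collapsing into a roughly symmetric shape. Other transforms, such as square root or Box-Cox, are also commonly used, depending on the data.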


Various Libraries in Python for Data Visualization

To install a Python library, use this command:

pip install library_name 
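
Before installing anything, it can help to check which libraries are already present (Colab, for example, ships with many preinstalled). The sketch below uses only the standard library; the list of names and the helper `missing_libraries` are assumptions made for this example, based on the libraries this README lists.

```python
from importlib.util import find_spec

# Import names of the libraries listed in this README.
# Note that scikit-learn is imported as "sklearn".
LIBRARIES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scipy", "sklearn", "geopandas",
]


def missing_libraries(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries(LIBRARIES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Any name reported as missing can then be installed with the pip command above. Keep in mind that the pip package name can differ from the import name (for example, `sklearn` is installed as `scikit-learn`).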

Dataset Used

Housing Dataset

--> Dataset is taken from: Housing Dataset

--> CSV file which contains house pricing data.

--> Price of each house with respect to its area and other basic amenities.

Rainfall Prediction Dataset

--> Dataset is taken from: Rainfall Prediction Dataset

--> CSV file which contains the rainfall data.

--> Subdivision-wise monthly data for 115 years, from 1901 to 2015.

Business Dataset

--> Dataset is taken from: Business Dataset

--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.

--> This collection uses a combination of survey, tax, and other administrative data.

Sales Dataset

--> Dataset is taken from: Sales Dataset

--> CSV file which contains the sales data.

Mineral Ores Around the World Dataset

--> Dataset is taken from: Minerals Dataset

--> Dataset of minerals found around the world.

Automobile Dataset

--> Dataset is taken from: 🔗Automobile Dataset

--> This contains data about various automobiles in Comma Separated Value (CSV) format.

--> The CSV file contains details of each automobile: mileage, length, and body style, among other attributes.

--> Its dimensions are 60 rows × 6 columns.

--> The CSV file is already preprocessed, so there is no need for data cleaning.
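
A quick way to confirm the stated dimensions of a CSV file is to count its data rows and header columns. The sketch below uses only the standard library; the helper `csv_shape` and the three-row sample (including its column names) are hypothetical and are not copied from the actual dataset.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV file object with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    # Skip completely blank lines so a trailing newline is not counted.
    rows = sum(1 for row in reader if row)
    return rows, len(header)


# Hypothetical sample; the column names are illustrative only.
sample = io.StringIO(
    "make,body_style,length,mileage\n"
    "a,sedan,168.8,21\n"
    "b,hatchback,171.2,19\n"
    "c,wagon,176.6,24\n"
)
print(csv_shape(sample))  # prints (3, 4)
```

For a real file, the same function can be called on an opened file object, for example `with open(path, newline="") as f: csv_shape(f)`, where `path` is wherever the CSV was saved. With pandas, `pd.read_csv(path).shape` gives the same pair of numbers.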

NBA Players Dataset

--> Dataset is taken from: 🔗NBA Dataset

--> This contains data about various NBA players in Comma Separated Value (CSV) format.

--> The CSV file contains details of each player: height, weight, team, and position, among other attributes.

--> Its dimensions are 457 rows × 9 columns.

--> The CSV file is already preprocessed, so there is no need for data cleaning.

Libraries Used

A short description of each library used:

  • NumPy (Numerical Python) - Provides a collection of mathematical functions that operate on arrays and matrices.
  • Pandas (Panel Data / Python Data Analysis) - Mostly used for analyzing, cleaning, exploring, and manipulating data.
  • Matplotlib - A data visualization and graphical plotting library.
  • Seaborn - A library built on top of Matplotlib, used to create more attractive and informative statistical graphics.
  • SciPy (Scientific Python) - Used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
  • Scikit-learn - A machine learning library that provides tools for tasks such as classification and prediction.
  • GeoPandas - As the name suggests, it extends the popular data science library pandas by adding support for geospatial data.

Thanks for Visiting 😄

Drop a 🌟 if you find this repository useful.

If you have any doubts or suggestions, feel free to reach out to me.

📫 How to reach me: LinkedIn, Mail