
This repository presents the processing and analysis performed on an online retail dataset.

amir-hojjati/Data-Analysis-Online-Retail-Transactions

Data Analysis: Online Retail Transnational Dataset


Dataset description and source


The dataset used in this demonstration is available from the UCI Machine Learning Repository and can be accessed via this link.

"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."

It contains 8 attributes which are fully described below:

  • InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
  • StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
  • Description: Product (item) name. Nominal.
  • Quantity: The quantities of each product (item) per transaction. Numeric.
  • InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
  • UnitPrice: Unit price. Numeric, Product price per unit in sterling.
  • CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
  • Country: Country name. Nominal, the name of the country where each customer resides.
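
As a rough illustration of the attribute list above, the following sketch parses rows with these eight columns into typed values and flags cancellations by the leading letter of InvoiceNo. It is not taken from the notebooks: the sample rows are made up, the date format is an assumption, and `parse_row` and `IsCancellation` are hypothetical names. It uses only the Python standard library so that it runs without extra dependencies.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows that only mimic the eight documented columns;
# they are NOT taken from the real dataset.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
100001,10001,SAMPLE MUG,6,2011-01-05 10:15,2.50,10001,United Kingdom
C100002,10001,SAMPLE MUG,-2,2011-01-06 09:00,2.50,10001,United Kingdom
100003,10002,SAMPLE CARD,12,2011-01-07 14:30,0.50,,France
"""


def parse_row(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    return {
        "InvoiceNo": row["InvoiceNo"],
        "StockCode": row["StockCode"],
        "Description": row["Description"].strip(),
        "Quantity": int(row["Quantity"]),
        # Assumed timestamp format; adjust it to match the actual file.
        "InvoiceDate": datetime.strptime(row["InvoiceDate"], "%Y-%m-%d %H:%M"),
        "UnitPrice": float(row["UnitPrice"]),
        # An empty CustomerID is treated as missing.
        "CustomerID": row["CustomerID"] or None,
        "Country": row["Country"],
        # A leading 'c' (either case) marks a cancellation.
        "IsCancellation": row["InvoiceNo"].upper().startswith("C"),
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
cancellations = [r for r in rows if r["IsCancellation"]]
missing_customer = [r for r in rows if r["CustomerID"] is None]
```

Keeping InvoiceNo, StockCode, and CustomerID as strings reflects that they are nominal codes rather than quantities.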

Flow of the project


In this demonstration, we process and analyze a dataset for a non-store online retailer. The complete details of each step are provided inside the notebooks. The project is summarized as follows:
  1. The first step is data cleaning and preprocessing, so that clean data is passed to the later steps.

  2. The second step is EDA and data visualization, in which we inspect the cleaned data for useful information. There is also an interactive dashboard and a choropleth map to give a better perspective.

  3. In the last step, we use clustering for customer segmentation, finding groups of customers with similar behavior for further analysis and business strategy planning. In this step we also look for the best association rules to see which sets of products were often bought together.
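
To make the association-rule idea in the last step concrete, here is a minimal sketch that computes support and confidence for product pairs. It is only an illustration, not the method used in the notebooks, which may rely on a dedicated library and longer itemsets. The function name `pair_rules`, the `min_support` threshold, and the toy baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets that contain both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate, fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields one rule in each direction.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties are broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Toy baskets, one per invoice (made-up product names).
baskets = [
    {"mug", "card"},
    {"mug", "card", "ribbon"},
    {"mug", "ribbon"},
    {"card"},
]
rules = pair_rules(baskets, min_support=0.5)
```

With these toy baskets, the first rule returned is ribbon → mug, because every basket that contains a ribbon also contains a mug.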


Implementation


This project was done in Python using Jupyter Notebook. The libraries required for the complete implementation are listed in the requirements file.

If the notebooks fail to open on GitHub, they can be rendered with nbviewer by copying and pasting the notebook's URL into the website.
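
As an alternative to pasting the link by hand, the small sketch below builds an nbviewer address from a notebook's GitHub URL. It assumes nbviewer's `https://nbviewer.org/github/<user>/<repo>/blob/<branch>/<path>` URL scheme. The helper name `to_nbviewer_url` and the example branch and file name are hypothetical, not taken from this repository.

```python
from urllib.parse import urlparse


def to_nbviewer_url(github_url):
    """Map a github.com notebook URL to the matching nbviewer URL."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # nbviewer mirrors the GitHub path under its /github prefix.
    return "https://nbviewer.org/github" + parts.path


# Hypothetical branch and notebook name, for illustration only.
example = to_nbviewer_url(
    "https://github.com/amir-hojjati/Data-Analysis-Online-Retail-Transactions"
    "/blob/main/example.ipynb"
)
```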
