Goals:
- Clean and process the data in the "ecommerce" database, which consists of five tables
- Understand the relationships between the five tables
- Analyze the data for patterns in total transaction revenue and product sales across visitors' countries and cities
Step 1: Import Data
Step 2: Understand the tables and the relationships between fields for each table
Step 3: Clean data
Step 4: Analyze the data with queries to find patterns in total product revenue and sales
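As a minimal sketch of the kind of query Step 4 calls for, the example below totals transaction revenue per country and city. The table name `sessions`, its column names, and the sample rows are hypothetical stand-ins rather than the project's actual schema or data, and an in-memory SQLite database is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical stand-in for one of the ecommerce tables; names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (country TEXT, city TEXT, total_transaction_revenue REAL)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("Country A", "City 1", 100.0),
        ("Country A", "City 1", 50.0),
        ("Country A", "City 2", 25.0),
        ("Country B", "City 3", 40.0),
        ("Country B", "City 3", None),  # a visit with no recorded revenue
    ],
)

# Sum revenue per (country, city), skipping visits without revenue,
# and list the highest-earning locations first.
REVENUE_BY_LOCATION = """
    SELECT country, city, SUM(total_transaction_revenue) AS revenue
    FROM sessions
    WHERE total_transaction_revenue IS NOT NULL
    GROUP BY country, city
    ORDER BY revenue DESC
"""

revenue_rows = conn.execute(REVENUE_BY_LOCATION).fetchall()
for country, city, revenue in revenue_rows:
    print(country, city, revenue)
```

Grouping by both country and city keeps cities with the same name in different countries apart, which matters when comparing locations.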
- The company is likely based in the United States, as the majority of products were purchased by unique visitors in the US
- The top products purchased in the United States included many YouTube-branded items, men's t-shirts and other apparel, and electronics. Mountain View in particular shows larger purchases of Nest products, possibly because Google's headquarters is located there. The product category field needs further cleaning, however, as there were some outliers for the same product in Mountain View.
- Although the United States accounts for the largest share of revenue, Israel and Australia also contributed significantly relative to the US.
- The sample size was too small to draw proper conclusions about a whole year of sales, due to missing and inaccurate values
- Unsure of what some fields mean
- Missing data (city, country), so assumptions had to be made
- Not enough time to clean the tables
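One way to gauge how much the missing city and country values affect the analysis is to count them before making assumptions. The sketch below is illustrative only: the `sessions` table, the `visitor_id` column, and the sample rows are hypothetical, and it treats only NULL or blank strings as missing, which may not match how gaps are encoded in the real tables.

```python
import sqlite3

# Hypothetical table and rows, used only to make the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (visitor_id INTEGER, country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        (1, "Country A", "City 1"),
        (2, "Country A", None),  # city missing
        (3, None, None),         # both missing
        (4, "Country B", ""),    # city stored as an empty string
    ],
)

# Count rows where each location field is NULL or blank.
MISSING_COUNTS = """
    SELECT
        COUNT(*) AS total_rows,
        SUM(CASE WHEN country IS NULL OR TRIM(country) = '' THEN 1 ELSE 0 END) AS missing_country,
        SUM(CASE WHEN city IS NULL OR TRIM(city) = '' THEN 1 ELSE 0 END) AS missing_city
    FROM sessions
"""

total_rows, missing_country, missing_city = conn.execute(MISSING_COUNTS).fetchone()
print(f"{missing_country} of {total_rows} rows lack a country")
print(f"{missing_city} of {total_rows} rows lack a city")
```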
If I had more time, I would:
- Clean and fix more fields of interest (city, country, productprice) so the raw data makes more sense
- Rename columns so the headers are more readable
- Fix spellings in text columns (e.g. product category) so that values can be categorized properly, making the analysis easier
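The renaming and text-cleanup items above could be sketched as a view that exposes readable column names and normalized category text without altering the raw table. The `products` table, the `productname` and `productcategory` columns, and the sample rows are assumptions for illustration; only `productprice` is named in this document. Note that `LOWER(TRIM(...))` only unifies case and stray whitespace, so genuine misspellings would still need an explicit mapping.

```python
import sqlite3

# Hypothetical raw table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (productname TEXT, productcategory TEXT, productprice REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Item 1", " Apparel ", 10.0),
        ("Item 2", "apparel", 12.0),
        ("Item 3", "APPAREL", 8.0),
        ("Item 4", "Electronics", 99.0),
    ],
)

# The view renames columns and normalizes category text; the raw data is untouched.
conn.execute("""
    CREATE VIEW products_clean AS
    SELECT
        productname AS product_name,
        LOWER(TRIM(productcategory)) AS product_category,
        productprice AS product_price
    FROM products
""")

categories = conn.execute("""
    SELECT product_category, COUNT(*) AS n_products
    FROM products_clean
    GROUP BY product_category
    ORDER BY product_category
""").fetchall()
print(categories)
```

Using a view rather than updating rows in place keeps the original values available in case a cleaning rule turns out to be wrong.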