
The goal of this project is to analyse a dataset (made up of CSV and JSON files) using a Data Lakehouse built on Snowflake. You will have to upload the data to cloud storage, ingest it into the Data Lakehouse, transform it, and finally analyse it.

buithehai1994/Data-Lakehouse-with-Snowflake


Data-Lakehouse-with-Snowflake

Project Objective

The main objective of this project is to analyze YouTube data to gain insights into a content strategy for launching a successful YouTube channel. The project is organized as a summary followed by four main parts:

  1. Summary of Project Content: Provides an overview of the project structure and objectives.
  2. Data Integration (Part 1): Introduces steps to upload data into Snowflake from Azure Storage, ensuring the dataset is prepared for analysis.
  3. Data Cleaning (Part 2): Performs data cleaning to prepare the final data table for analysis.
  4. Data Analysis (Part 3): Analyzes and visualizes the results obtained from data cleaning to explore YouTube trends across different countries and categories.
  5. Recommendations (Part 4): Uses these insights to determine an optimal content category for launching a new YouTube channel, excluding "Music" and "Entertainment," and evaluates its potential success across different countries.
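As a rough illustration of the Data Integration step, the sketch below builds the two Snowflake SQL statements that are typically involved in loading CSV files from Azure Storage: `CREATE STAGE` and `COPY INTO`. The stage name, table name, container URL, file pattern, and token placeholder are all hypothetical and not taken from this project. The report may well use a different approach (for example, external tables).

```python
# Minimal sketch: generate Snowflake SQL for loading CSV files from Azure Storage.
# All object names and the URL below are hypothetical placeholders.


def create_stage_sql(stage: str, container_url: str, sas_token: str) -> str:
    """Return a CREATE STAGE statement pointing at an Azure Blob Storage container."""
    return (
        f"CREATE OR REPLACE STAGE {stage}\n"
        f"  URL = '{container_url}'\n"
        f"  CREDENTIALS = (AZURE_SAS_TOKEN = '{sas_token}');"
    )


def copy_csv_sql(table: str, stage: str, pattern: str) -> str:
    """Return a COPY INTO statement that loads staged CSV files matching a regex."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  PATTERN = '{pattern}'\n"
        "  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"');"
    )


# Hypothetical values for illustration only.
stage_sql = create_stage_sql(
    stage="youtube_stage",
    container_url="azure://myaccount.blob.core.windows.net/youtube-data",
    sas_token="<your-sas-token>",
)
copy_sql = copy_csv_sql(
    table="youtube_trending_raw",
    stage="youtube_stage",
    pattern=".*_youtube_trending_data[.]csv",
)

print(stage_sql)
print(copy_sql)
```

The generated statements would be run in a Snowflake worksheet or through a client. The target table is assumed to exist already with columns matching the CSV layout.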

For more information, please refer to the report in the "Brief of requirements and Report" folder.

Conclusion

Overall, the project aims to provide actionable insights into YouTube trends, offering recommendations for content creation and promotion strategies in launching a new YouTube Channel. These insights can serve as a foundation for informed decision-making, helping creators maximize their channel's potential for success.

Acknowledgments

The dataset was extracted through the YouTube API and is available on Kaggle (https://www.kaggle.com/rsrishav/youtube-trending-video-dataset).

This dataset includes several months (from 2020-08-12 to today) of daily trending YouTube video data. Data is included for the IN, US, GB, DE, CA, FR, RU, BR, MX, KR, and JP regions (India, USA, Great Britain, Germany, Canada, France, Russia, Brazil, Mexico, South Korea, and Japan, respectively), with up to 200 listed trending videos per day.
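The region codes listed above can be turned into a small lookup table, which is handy when tagging each loaded file with its country. The sketch below assumes that each file name starts with the two-letter region code followed by an underscore (for example `US_youtube_trending_data.csv`). That naming convention and the helper name are assumptions for illustration, not something stated in this README.

```python
from pathlib import PurePosixPath
from typing import Optional

# Region codes and country names as listed in the dataset description.
REGIONS = {
    "IN": "India",
    "US": "USA",
    "GB": "Great Britain",
    "DE": "Germany",
    "CA": "Canada",
    "FR": "France",
    "RU": "Russia",
    "BR": "Brazil",
    "MX": "Mexico",
    "KR": "South Korea",
    "JP": "Japan",
}


def country_from_filename(path: str) -> Optional[str]:
    """Map a file path to a country name via its leading region code.

    Assumes (hypothetically) that the file name begins with '<CODE>_'.
    Returns None when the prefix is not a known region code.
    """
    code = PurePosixPath(path).name.split("_", 1)[0].upper()
    return REGIONS.get(code)


print(country_from_filename("data/US_youtube_trending_data.csv"))
print(country_from_filename("data/JP_category_id.json"))
```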

(This task is from the Master of Data Science and Innovation course at the University of Technology Sydney and is the property of TD School.)
