This project aims to migrate data from MongoDB to Google Cloud Storage (GCS) and BigQuery automatically. It enables businesses to easily transfer and analyze the data in the cloud, improving data and cost management.

quannguyen0103/data-migration-on-GCP

Source

0. Setup

  • Create a Google Cloud VM
  • Install MongoDB on the VM to store Tiki product data
  • Create a GCS bucket
  • Create the BigQuery dataset and tables

1. Migrate data from MongoDB to GCS

Script: migrate_data

Workflow

  • Export the product collection from the tiki database to a JSON file product.json
  • Upload the JSON file to the mongodb-data-1 bucket
  • Set gsutil's parallel_composite_upload_threshold option so that files larger than 150 MB are sent as parallel composite uploads
  • After the upload process is done, remove the JSON file
  • Use crontab to run the script at 22:00 every day
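
The workflow above can be sketched in Python as follows. This is a minimal sketch, not the contents of migrate_data: it assumes `mongoexport` and `gsutil` are installed and authenticated on the VM, and the function names (`export_command`, `upload_command`, `migrate`) are hypothetical. The database, collection, file, and bucket names come from the list above.

```python
import os
import subprocess

DATABASE = "tiki"
COLLECTION = "product"
EXPORT_FILE = "product.json"
BUCKET = "mongodb-data-1"
THRESHOLD = "150M"  # files above this size use parallel composite uploads


def export_command(out_path=EXPORT_FILE):
    """Build the mongoexport call that dumps the collection to a JSON file."""
    return [
        "mongoexport",
        f"--db={DATABASE}",
        f"--collection={COLLECTION}",
        f"--out={out_path}",
    ]


def upload_command(src_path=EXPORT_FILE):
    """Build the gsutil call that copies the file to the bucket."""
    return [
        "gsutil",
        "-o", f"GSUtil:parallel_composite_upload_threshold={THRESHOLD}",
        "cp", src_path, f"gs://{BUCKET}/",
    ]


def migrate(run=subprocess.run, remove=os.remove):
    """Export, upload, then delete the local file.

    check=True raises on a non-zero exit code, so the local file is only
    removed after a successful upload. `run` and `remove` are injectable
    so the flow can be exercised without touching MongoDB or GCS.
    """
    run(export_command(), check=True)
    run(upload_command(), check=True)
    remove(EXPORT_FILE)
```

If a script like this ends with a call to `migrate()`, a crontab entry of the form `0 22 * * * python3 /path/to/script.py` (path hypothetical) would run it at 22:00 every day.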

2. Load data from the GCS bucket to BigQuery

Script: load_data

Workflow

  • Create a Google Cloud Function that is triggered when product.json is uploaded to the mongodb-data-1 bucket and loads the data into the product table of the tiki dataset in BigQuery
  • Write records that fail to load into the BigQuery table to failed_records.json for later handling
  • Output: tiki_product_sample
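
A function of this kind might look like the sketch below. It is an illustration under stated assumptions, not the code in load_data: it assumes a background (1st gen) Cloud Storage trigger, newline-delimited JSON as produced by `mongoexport`, and streaming inserts via `insert_rows_json`. The helper names, the `tiki.product` table ID, and the `/tmp/failed_records.json` path are hypothetical. For large exports a BigQuery load job would likely be a better fit than streaming inserts.

```python
import json

BUCKET = "mongodb-data-1"
OBJECT_NAME = "product.json"
TABLE_ID = "tiki.product"                  # assumed dataset.table
FAILED_PATH = "/tmp/failed_records.json"   # assumed location; /tmp is writable


def is_target_object(event):
    """True only for the product.json object in the expected bucket."""
    return event.get("bucket") == BUCKET and event.get("name") == OBJECT_NAME


def parse_ndjson(text):
    """Split newline-delimited JSON into parsed rows and unparseable lines."""
    rows, failed = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            failed.append({"raw": line, "errors": "invalid JSON"})
    return rows, failed


def collect_failed(rows, insert_errors):
    """Pair each insert error (keyed by row index) with the row it refers to."""
    return [
        {"record": rows[err["index"]], "errors": err["errors"]}
        for err in insert_errors
    ]


def load_product(event, context):
    """Entry point: download the object, insert rows, save the failures."""
    if not is_target_object(event):
        return
    # Imported lazily so the helpers above work without the GCP libraries.
    from google.cloud import bigquery, storage

    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    rows, failed = parse_ndjson(blob.download_as_text())
    errors = bigquery.Client().insert_rows_json(TABLE_ID, rows) if rows else []
    failed += collect_failed(rows, errors)
    with open(FAILED_PATH, "w") as fh:
        json.dump(failed, fh)
```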

3. Create a data mart containing seller and product information

Script: creat_datamart

Workflow

4. Analyze data

Script: analyze_data

Workflow

  • Create 2 tables
    • product_information: stores product information (id, product_name, category, seller, price, quantity_sold, rating)
    • product_origin: stores product origin information (id, product_name, category, origin)
  • Load the tables into Looker Studio and visualize the data
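
As a rough illustration of the table-creation step, the sketch below builds one `CREATE OR REPLACE TABLE ... AS SELECT` statement per table. The column lists come from the bullets above. Everything else is an assumption: the source table `tiki.product`, the target dataset `tiki`, the helper name `build_ctas`, and the premise that the source already exposes columns with these exact names (the real product data may be nested and need field mapping first).

```python
SOURCE_TABLE = "tiki.product"  # assumed source table

# Target tables and their columns, as listed in the workflow.
TABLES = {
    "product_information": [
        "id", "product_name", "category", "seller",
        "price", "quantity_sold", "rating",
    ],
    "product_origin": ["id", "product_name", "category", "origin"],
}


def build_ctas(target, columns, source=SOURCE_TABLE, dataset="tiki"):
    """Return a BigQuery statement that creates `target` from `source`."""
    cols = ",\n  ".join(columns)
    return (
        f"CREATE OR REPLACE TABLE `{dataset}.{target}` AS\n"
        f"SELECT\n  {cols}\n"
        f"FROM `{source}`"
    )


# One statement per table, ready to run in the BigQuery console or a client.
STATEMENTS = {name: build_ctas(name, cols) for name, cols in TABLES.items()}
```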
