This is a Spark Project to get the Daily revenue of the products based on many parameters which are explained in detail in this project.
Technologies and Conecepts used in this project
- Hadoop
- HDFS
- HIVE
- SQOOP
- Spark
- Spark SQL
- Spark RDD
- Spark Dataframes
System Configurations
- Personal Laptop: MacOS 10.15.4 (Catalina)
- UNIX System hostname: gw02.itversity.com
- Connect using 'ssh gw02.itversity.com'
- Prompts with a password. (Give the password provided to you.)
Data Locations
- Data is present in the server 'gw02.itversity.com' in the HDFS Cluster in path: '/public/retail_db'
- Access the data in the path using the command
hadoop fs -ls /public/retail_db
- Access the data in the path using the command
Apache Sqoop allows easy import and export of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision the data from external system on to HDFS, and populate tables in Hive and HBase. Sqoop integrates with Oozie, allowing you to schedule and automate import and export tasks. Sqoop uses a connector based architecture which supports plugins that provide connectivity to new external systems.
Access mysql in the server
mysql -u retail_user -h ms.itversity.com -p
password: itversity
- Sqoop Import
The below two loads are full loads. To use incremental load, use the below syntax in the sqoop commands.
-> –incremental (Modes: append & lastModified) -> –check-column -> –last valueSqoop Import into HDFS
Link to Sqoop Documentation: https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_import_literal <<Check 7.2.2 Table 3>>
When prompted for password:
Enter Password: itversity
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table orders --target-dir=/user/pratikgaurav/sqoop_import/retail_db/orders -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table order_items --target-dir=/user/pratikgaurav/sqoop_import/retail_db/order_items -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table products --target-dir=/user/pratikgaurav/sqoop_import/retail_db/products -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table departments --target-dir=/user/pratikgaurav/sqoop_import/retail_db/departments -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table customers --target-dir=/user/pratikgaurav/sqoop_import/retail_db/customers -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table categories --target-dir=/user/pratikgaurav/sqoop_import/retail_db/categories -m 1
Sqoop Import into HIVE
Link to Sqoop Documentation: https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_import_literal <<Check 7.2.9 Table 8>>
When prompted or password:
Enter Password: itversity
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table orders --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.orders_pk -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table order_items --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.order_items_pk -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table products --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.products_pk -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table departments --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.departments_pk -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table customers --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.customers_pk -m 1
- sqoop import --connect jdbc:mysql://ms.itversity.com:3306/retail_db --username retail_user -P --table categories --mysql-delimiters --hive-import --hive-overwrite --create-hive-table --hive-table retail_db.categories_pk -m 1
-
Sqoop Export
-
Sqoop Merge
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.[2] Hive gives a SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.