
Integrated Practice Work with Kafka, Log4j2, Flume, Java, HDFS, Hive, and Web Dev


concealedtea/tooltest


Cumulative Data Project: Internship Weeks 1-5 (First Half)

Resources used: Apache Kafka, Apache Hadoop (HDFS), Apache Flume, Apache Hive, Apache Sqoop, Apache ZooKeeper, Java, Maven, Log4j2, MySQL, CentOS.

If you are just practicing with local files, Steps 1-5 can be reduced to copying the data onto HDFS with the command

"hadoop fs -copyFromLocal 'file/address/in/linux' 'hdfs/location/' "

Step 1:

Export the collected data from one Hive table (Hue / company HDFS) and store it on your personal HDFS:

insert overwrite directory '/user/hue/sample_test'
row format delimited fields terminated by '|'
select device_idfa, device_mac, device_manufacturer, device_screen_pixel_metric, device_model
from adcocoa_device
where device_idfa is not null and device_idfa != 'null';

The command above writes the data as a set of part files in /user/hue/sample_test. From there, the files can be downloaded to the local file system.
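This can be done either through the Hue file browser or, if you have shell access to that cluster, with a command along these lines (the HDFS path is the one from the export above; the local destination is a placeholder):

hadoop fs -copyToLocal /user/hue/sample_test ./sample_test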

Step 2:

Write a Java program to read the files from the local environment, parse them line by line, and send the data to Kafka using the log4j and kafka packages provided on Maven.

/src/main/java/parser.java
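parser.java is the repository's actual implementation; as a rough, simplified sketch of the idea (the topic name, broker address, and class name here are illustrative placeholders, and the real code also wires in log4j), a line-by-line Kafka producer looks something like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ParserSketch {
    public static void main(String[] args) throws Exception {
        // Broker address and serializers; "localhost:9092" is a placeholder for your Kafka broker.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // args[0] is the path to one of the downloaded data files.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            String line;
            // Parse the file linearly and send each record to the (placeholder) "test" topic.
            while ((line = reader.readLine()) != null) {
                producer.send(new ProducerRecord<>("test", line));
            }
        }
    }
}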

Step 3:

Wrap the package into a .jar file and export it to HDFS for Kafka/Flume processing.
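With Maven, assuming the build bundles its dependencies (e.g. via the shade or assembly plugin), this is roughly:

mvn clean package
hadoop fs -copyFromLocal target/tooltest-VERSION-SNAPSHOT.jar 'hdfs/location/'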

Step 4:

Start up all needed services on CentOS (the Linux distribution I'm using; yours may differ): the HDFS NameNode/DataNodes, ZooKeeper, and Kafka. Run jps to make sure all of them are online.
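The exact scripts depend on where Hadoop and Kafka are installed; assuming HADOOP_HOME and KAFKA_HOME are set and you use the ZooKeeper bundled with Kafka (a standalone ZooKeeper would use zkServer.sh start instead), startup looks roughly like:

$HADOOP_HOME/sbin/start-dfs.sh
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
jps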

Step 5:

Run Flume as the receiver after setting up a flume.conf file.
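A sketch of such a flume.conf, assuming Flume 1.7+ with the Kafka source (the topic name, broker address, and channel capacity are placeholders; the agent name matches the -n flume1 flag used below):

flume1.sources = kafka-source
flume1.channels = mem-channel
flume1.sinks = hdfs-sink

flume1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
flume1.sources.kafka-source.kafka.topics = test
flume1.sources.kafka-source.channels = mem-channel

flume1.channels.mem-channel.type = memory
flume1.channels.mem-channel.capacity = 10000

flume1.sinks.hdfs-sink.type = hdfs
flume1.sinks.hdfs-sink.channel = mem-channel
flume1.sinks.hdfs-sink.hdfs.path = /user/kafka/database/%{topic}/%y-%m-%d
flume1.sinks.hdfs-sink.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true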

flume-ng agent -n flume1 -c conf -f flume.conf -Dflume.root.logger=INFO,console

Run the jar to start the Kafka producer

java -jar tooltest-VERSION-SNAPSHOT.jar

With both the consumer (Flume) and the producer (the jar) running, the files from the folder will now be read into the Hadoop Distributed File System (HDFS) and stored under '/user/kafka/database/%topic/%y-%m-%d'.
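You can confirm that the HDFS sink is writing by listing the target directory, for example:

hadoop fs -ls /user/kafka/database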

Step 6:

Install and configure Hive. Start up Hive.

'$HIVE_HOME/bin/hive'

Create a table in Hive delimited by whatever delimiter your data uses; in this case it's the pipe character |.

create table tester2(a int, b string, c string, d string, e string)
row format delimited
fields terminated by '|';

Load the data from HDFS into the Hive table:

load data inpath 'filepath/path' into table tester2;

Step 7:

Create a table sorted by phone-brand frequency; we'll use this data to create a visual after sending it to MySQL.

'insert into table sortorder select phone,count(phone) as phoneCount from tester2 group by phone order by phoneCount desc;'

This is a sorted table with two columns: the phone brand and the number of times that users of that brand have accessed our app.
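The insert above assumes the sortorder table already exists and that the phone-brand column in tester2 is named phone; a minimal definition would look something like:

create table sortorder(phone string, phonecount bigint);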

Step 8:

Use Sqoop (v1.4.6, compatible with Hadoop 2.8.0) to export data from the Hive warehouse to MySQL for web visualization.

./sqoop export --connect jdbc:mysql://localhost/test --username root -P --table test --fields-terminated-by ',' --lines-terminated-by '\n' --export-dir /user/hive/warehouse/tester2
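Note that Sqoop export requires the MySQL target table to exist beforehand; a minimal sketch matching the example Hive schema above (the database name test comes from the JDBC URL; the column names and types are assumptions):

create database if not exists test;
use test;
create table test(a int, b varchar(255), c varchar(255), d varchar(255), e varchar(255));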

Step 9:

See the follow-up project for continued development, including serving the SQL data to a webpage using Java and Spring.
