These are examples of my ETL processes built with Spark on Scala.
This ETL process handles tax transaction data. We take the initial layer and perform aggregation operations on it, such as filtering, adding new columns, and joins. After that, the data is written to HDFS. The result is a tax report for the bank's internal services.
Inside you can find the following objects and classes:
1.0 src/main/scala
1.1 Application
The main app. It accepts Spark arguments and additional arguments such as startDate, endDate, and asnuCode. This information is needed to perform the ETL operations correctly. The Concurrentthought.cla library is used to parse these arguments. asnuCode is an internal technical code of the bank: when asnuCode is 6000 it is called an "EKP" number, and when it is 9000 it is called an "EKS" number.
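As an illustration, here is a minimal sketch of how the main app might dispatch on asnuCode. Argument parsing via Concurrentthought.cla is omitted; apart from the 6000/9000 codes, all names and values below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object Application {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TaxReportEtl").getOrCreate()

    // In the real app these values come from Concurrentthought.cla parsing;
    // here they are hard-coded for illustration only.
    val startDate = "2023-01-01"
    val endDate   = "2023-12-31"
    val asnuCode  = 6000

    // 6000 -> "EKP" number, 9000 -> "EKS" number
    val reportKind = asnuCode match {
      case 6000  => "EKP"
      case 9000  => "EKS"
      case other => sys.error(s"Unsupported asnuCode: $other")
    }
    println(s"Building $reportKind report for $startDate..$endDate")
    spark.stop()
  }
}
```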
1.2 ApplicationArguments
This class is needed by Concurrentthought.cla to define the arguments for the main app.
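For illustration, the parsed arguments can be thought of as a simple case class; the field names below are hypothetical, and the real class is tailored to what Concurrentthought.cla expects:

```scala
// Hypothetical shape of the parsed arguments.
case class ApplicationArguments(
  startDate: String, // start of the reporting period
  endDate: String,   // end of the reporting period
  asnuCode: Int      // internal bank code: 6000 = EKP, 9000 = EKS
)
```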
1.3 Const
Constants used in this ETL process.
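For example, the asnuCode values and the mask length mentioned in this README would naturally live here (the constant names are hypothetical):

```scala
object Const {
  // Internal bank codes from the description above.
  val EkpAsnuCode: Int = 6000
  val EksAsnuCode: Int = 9000
  // Length of the account-number prefix used as a mask (see 1.5).
  val MaskLength: Int = 5
}
```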
1.4 DataSources
Here you can find the methods that read the initial dataframes.
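A minimal sketch of such a reader, assuming Parquet input under res/in (both the path and the format are assumptions):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataSources {
  // Hypothetical reader: the real source format and location may differ.
  def readTaxTransactions(spark: SparkSession, path: String = "res/in"): DataFrame =
    spark.read.parquet(path)
}
```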
1.5 EKP and EKS masks
Each tax transaction has a dt_account and a kt_account number. During the ETL process we need to filter these transactions according to a mask. A mask is the first 5 digits of the dt_account or kt_account number.
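A sketch of such a mask filter (the column names follow the description above; the helper name and mask values are hypothetical):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, substring}

object MaskFilter {
  // Keep only transactions whose dt_account or kt_account starts with one
  // of the given masks (a mask is the first 5 digits of the account number).
  def filterByMasks(df: DataFrame, masks: Seq[String]): DataFrame =
    df.filter(
      substring(col("dt_account"), 1, 5).isin(masks: _*) ||
        substring(col("kt_account"), 1, 5).isin(masks: _*)
    )
}
```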
1.6 Sheet30AggregationTransform
Here you can find the transform method that performs all the aggregation functions. Depending on asnuCode, it chooses the right process.
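A sketch of how such a dispatch might look (the private method names and their identity bodies are placeholders, not the real pipelines):

```scala
import org.apache.spark.sql.DataFrame

object Sheet30AggregationTransform {
  // Dispatch to the right aggregation pipeline by asnuCode.
  def transform(df: DataFrame, asnuCode: Int): DataFrame = asnuCode match {
    case 6000  => transformEkp(df) // "EKP" process
    case 9000  => transformEks(df) // "EKS" process
    case other => sys.error(s"Unsupported asnuCode: $other")
  }

  // Placeholder bodies: the real methods apply the filters, new columns and joins.
  private def transformEkp(df: DataFrame): DataFrame = df
  private def transformEks(df: DataFrame): DataFrame = df
}
```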
2.0 src/main/test
2.1 AggregationEKP30Test
Here are the tests for the EKP process. The first test is called the main app test; it simply starts the main app with arguments. The following tests check that all the filters and aggregations work correctly. We know for sure that transactions with certain masks should pass all the filters and aggregations and remain in the final dataframe. We create a case class that represents one tax transaction and run the transform operation on it, checking all combinations of masks. If we get an empty dataframe after the ETL process, the test fails.
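As an illustration, a minimal ScalaTest sketch of one such check, reusing the hypothetical transform signature from the 1.6 sketch above (the DTO fields and account numbers are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical DTO for one tax transaction (cf. 2.2); the real fields may differ.
case class TaxTransaction(dt_account: String, kt_account: String, amount: Double)

class AggregationEKP30Test extends AnyFunSuite {
  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("AggregationEKP30Test")
    .getOrCreate()
  import spark.implicits._

  test("a transaction with a valid EKP mask survives the transform") {
    // Hypothetical account numbers; only the first 5 digits (the mask) matter.
    val input  = Seq(TaxTransaction("4742300000000001", "7060100000000002", 100.0)).toDF()
    val result = Sheet30AggregationTransform.transform(input, asnuCode = 6000)
    assert(!result.isEmpty, "a transaction with a valid mask must stay in the final dataframe")
  }
}
```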
2.2 DTO case class for tests
3.0 res
3.1 in
initial dataframes
3.2 out
the folder where the final dataframe is written