sparkwatchdogs

simple implementation for spark watchdogs for etl processing.

Motivation

We have production etl jobs that run all day long. The system should be as automatic as possible, without human intervention. We still trying to get this level, so we constantly building additional self-checks or as I call them "watchdogs". Many times it means that just killing some stage/job/application is better than just finding out that due to one stage the whole pipeline did nothing in last few hours and the whole pipeline is in huge delay.

Imprortant notice

See pom.xml, all 3'd party libs are "provided", so to use it you need to add them to your fat jar explicitly or to your deployment lib. This will permit you to control dependencies, without adding to your dependency hell yet another problem.

Example

see org.apache.spark.watchdogs.Watchdogs as usage example

What we have currently

Currently there are two watchodgs:

CoresWaitingWatchdog - Probably due to some bug in spark core, sometime there is a situation when the cores are free and available, but application can't get them. In this scenario it waiting for cores without terminating. When such application is killed, next retry to submit exactly same application gets all necessary cores and proceedes without a problem. You can deploy some cron based solution to retry the submission of same application. Pay attention that there are all kind of different problems why this situation is "normal", e.g. you may have firewall problem and driver can't connect to master and get resources(there are plenty of stackoverflow questions regarding this problem). Make sure to use this watchdog if you 100% sure that your cluster setup is ok.
SparkStageHangingWatchdog - Sometimes some stage hangs. Due to other reasons(see spark-gotchas) we can't use speculation mode, so we prefer just to cancel some stage(so probably the whole job will be canelled)

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

pom.xml

pom.xml

Repository files navigation

sparkwatchdogs

Motivation

Imprortant notice

Example

What we have currently

About

Releases

Packages

Languages

License

IgorBerman/sparkwatchdogs

Folders and files

Latest commit

History

Repository files navigation

sparkwatchdogs

Motivation

Imprortant notice

Example

What we have currently

About

Topics

Resources

License

Stars

Watchers

Forks

Languages