Impact Analysis of System.exit() Usage in Spark Jobs
Is System.exit() a showstopper in Spark jobs running through YARN in cluster mode? What is its impact?
Motivation
The first question to address: is it good to use System.exit() in Spark jobs? Ideally, not only in Spark jobs but in any JVM-based application, it's a big NaaaaaH!
It kills the Spark job prematurely by abruptly shutting down the whole JVM. Its usage is discouraged even for the lowest-priority business applications, so please do some extensive research before reaching for it. This post articulates only the impacts of System.exit() usage on Spark applications. A real production issue motivated me to write it, and what I share in this article is entirely my own experience.
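To make the danger concrete, here is a minimal plain-Java sketch (no Spark dependency; the class, method, and flag names are all hypothetical) contrasting the two failure-handling styles. The key point: System.exit() halts the JVM without unwinding the stack, so `finally` blocks are skipped and cleanup such as stopping the SparkSession never runs.

```java
// Hypothetical demo class; sparkStopped stands in for sparkSession.stop().
class GracefulExitDemo {
    static boolean sparkStopped = false;

    // Anti-pattern: System.exit() tears down the JVM immediately.
    // When validationFailed is true, the finally block below NEVER runs,
    // so cleanup is skipped and, in YARN cluster mode, the driver is
    // killed mid-flight inside its container.
    static void runWithExit(boolean validationFailed) {
        try {
            if (validationFailed) {
                System.exit(1); // abrupt: skips finally, no graceful shutdown
            }
        } finally {
            sparkStopped = true; // unreachable when System.exit(1) fires
        }
    }

    // Graceful alternative: signal failure with an exception instead.
    // The finally block always executes, so resources are released and
    // the driver can report a proper failure status.
    static void runWithException(boolean validationFailed) {
        try {
            if (validationFailed) {
                throw new RuntimeException("validation failed");
            }
        } finally {
            sparkStopped = true; // always runs: cleanup is guaranteed
        }
    }
}
```

With the exception-based style, the caller (or the Spark framework itself) decides how to surface the failure, instead of the JVM simply vanishing.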
Background
I recently started working on a project for a financial client who had already migrated their traditional MapReduce jobs to Spark. These jobs process files that land in an inbound directory on the edge node, arriving from legacy systems over the network, and generate validation email reports/alerts about the processed data for the higher levels of the business. The jobs were earlier executed in Spark client mode. I know what you're thinking right now. Yes, email reports are undoubtedly an ancient technique, but an architecture can't be transfigured as quick as a wink; it's a continuous effort.
Root Cause
When these jobs were moved to YARN cluster mode, the driver no longer ran on the edge node. The System.exit() calls that used to terminate the job after the email report was sent now abruptly killed the driver JVM inside its YARN container, and the email step itself broke because SMTP was installed only on the edge node, not on every node of the cluster where the driver could land.
Solution
So I tried to convince the team to make a few modifications to the current architecture. Here are those:
- Implemented an audit table that captures the details of each job run, along with all the processing metrics required for the email report.
- Removed all the System.exit() calls from the current Spark jobs. Instead of ending abruptly, each job now finishes gracefully and then triggers a post-job action.
- Added a simple post-job action that collects all the data from the audit table based on the run date and generates the email report. It executes only on the edge node, where SMTP is installed, bypassing the need for an SMTP setup on every node of the cluster.
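The post-job action described above can be sketched roughly as follows. This is a hypothetical illustration: the audit table is modeled as an in-memory list of rows, and all class, field, and method names (`PostJobAction`, `AuditRow`, `buildReport`) are mine, not the client's. In the real setup, the rows would come from a query against the audit table filtered by run date, and the resulting report would be handed to the edge node's SMTP service.

```java
import java.util.List;

// Hypothetical sketch of the post-job action that runs on the edge node
// after the Spark job finishes gracefully.
class PostJobAction {

    // One audit row per job run: the run date plus the processing
    // metrics the Spark job recorded for the email report.
    static class AuditRow {
        final String runDate;
        final String jobName;
        final long processed;
        final long rejected;

        AuditRow(String runDate, String jobName, long processed, long rejected) {
            this.runDate = runDate;
            this.jobName = jobName;
            this.processed = processed;
            this.rejected = rejected;
        }
    }

    // Collects the rows matching the given run date and renders a
    // plain-text validation report ready to be emailed.
    static String buildReport(List<AuditRow> auditTable, String runDate) {
        StringBuilder report = new StringBuilder("Validation report for " + runDate + "\n");
        for (AuditRow row : auditTable) {
            if (row.runDate.equals(runDate)) {
                report.append(String.format("%s: processed=%d, rejected=%d%n",
                        row.jobName, row.processed, row.rejected));
            }
        }
        return report.toString();
    }
}
```

Because the report is assembled from the audit table after the fact, the Spark job itself never needs to send email, and therefore never needs SMTP access or an early System.exit().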
This way I was able to resolve two major issues with one simple solution.