
Showing posts with the label Spark

Impact Analysis of System.exit() usage in Spark Jobs

Is System.exit() a show-stopper in Spark jobs running through YARN in cluster mode? What is its impact?

Motivation

The first question to be addressed is: is it good to use System.exit() in Spark jobs? Ideally, not only in Spark jobs but in any JVM-based application, it's a big NaaaaaH! It kills the Spark job prematurely by shutting the whole JVM down abruptly. Its usage is discouraged even for applications of the lowest business priority, so please do some extensive research before reaching for it. This post articulates only the impacts of System.exit() usage on Spark applications. I faced a real-world issue that motivated me, and what I put forward in this article is entirely my own experience.

Background

Precisely speaking, I recently started working on a project for one of my financial clients, where they have already migrated traditional...

PySpark read and write to Phoenix table

A simple question: is it possible to load a PySpark DataFrame into a Phoenix table? Well, the answer is YES. Let's see how we can do that.

To push data from a PySpark DataFrame to a Phoenix table, we need an existing Phoenix table created through Phoenix, not the HBase shell. Tables created from the HBase shell often don't appear in Phoenix until you create a new table or view in Phoenix and point it at the existing HBase table. To avoid that kind of discrepancy, I suggest loading data into HBase through Phoenix if you are willing to use Phoenix for querying. If the table does not exist, use the command below to create it in Phoenix.

CREATE TABLE IF NOT EXISTS <TABLE_NAME> (
    ROWKEY <DATA_TYPE> NOT NULL PRIMARY KEY,
    <COL_FAMILY>.<COL_NAME> <DATA_TYPE>,
    <COL_FAMILY>.<COL_NAME> <DATA_TYPE>,
    ...  -- up to n columns
);

NOTE: If you're creating a Phoenix table to point an exist...
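Once the table exists, a PySpark DataFrame can be pushed to it through the phoenix-spark connector. Below is a minimal sketch, assuming a Phoenix table named TEST_TABLE created as shown above with a ROWKEY primary key and a CF.NAME VARCHAR column, and a ZooKeeper quorum at zkhost:2181; both names are placeholders for your environment, and the phoenix-spark jar must already be on the Spark classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("phoenix-demo").getOrCreate()

# DataFrame column names must match the Phoenix column names (the column
# family prefix is not used here; Phoenix addresses CF.NAME simply as NAME).
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["ROWKEY", "NAME"])

# Write to Phoenix. The connector requires mode("overwrite"), which
# performs upserts on the row key rather than truncating the table.
df.write \
    .format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", "TEST_TABLE") \
    .option("zkUrl", "zkhost:2181") \
    .save()

# Read the same table back into a DataFrame.
df2 = spark.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "TEST_TABLE") \
    .option("zkUrl", "zkhost:2181") \
    .load()

df2.show()

The same connector handles both directions, so the round trip above is a quick way to verify that the table mapping and the ZooKeeper URL are correct.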