
Yarn Support On HDP 2.6.0.3-8 with Spark 2.xx #905

Open
GabeChurch opened this issue Aug 25, 2017 · 3 comments


GabeChurch commented Aug 25, 2017

It seems that using spark-notebook on newer versions of Hortonworks Hadoop and Spark causes an error to be thrown when you attempt to use the YARN manager in cluster mode. I was getting a "bad substitution" error (viewed from the YARN scheduler portal).


GabeChurch commented Aug 25, 2017

I was able to solve the problem by adding the hdp.version property through Ambari.

  • Go to 'Ambari -> YARN -> Configs' and open the 'Advanced' tab.
  • Scroll to the bottom of the page, where you will find an option to add a custom property for yarn-site.
  • Click 'Add Property', enter 'hdp.version' as the name, and set its value to your HDP version (2.6.0.3-8 for me).
  • Save the changes and restart the required services. Ambari will then deploy the hdp.version property into yarn-site.xml (which you could alternatively add manually; see the snippet below).

These steps solved the problem for me; hope they help you as well!
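In case you'd rather edit the file by hand, the resulting yarn-site.xml entry looks roughly like this (a sketch; substitute your own cluster's version value):

<property>
  <name>hdp.version</name>
  <value>2.6.0.3-8</value>
</property>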


GabeChurch commented Aug 27, 2017

One side note: if you are having problems deploying on YARN, you should check your environment variables with

env
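For example, to narrow the output to the relevant variables (the grep pattern here is just a suggestion):

env | grep -iE 'hadoop|spark|java'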

At the moment I add the environment variables manually before starting spark-notebook, with a bash script that I name env.sh and run using

source env.sh

You can find the right values for these variables in your configs via Ambari, or you can go into the

/usr/hdp/current

directory (this is an example; yours may be different) and search through the various subdirectories for your specific paths.
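On HDP, /usr/hdp/current is typically a set of symlinks into the versioned install directories, so a quick listing (illustrative) shows where each client actually lives:

ls -l /usr/hdp/current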

Here is the env.sh script I created (I added a few non-essential environment variables just to be safe)

# HDP/Spark2 paths; adjust to match your installation
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"}
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_112"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"}
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-"/usr/hdp/current/spark2-thriftserver/conf"}
export SPARK_HOME=${SPARK_HOME:-"/usr/hdp/current/spark2-thriftserver"}
export SPARK_LOG_DIR=/var/log/spark2
export SPARK_PID_DIR=/var/run/spark2

To recap: these environment variables must be loaded before you run spark-notebook, or you must add them to spark-notebook's initialization script so they load automatically on deployment. If you are running Spark on YARN in a Hadoop cluster you should be familiar with everything I am saying; if not, I suggest doing some research on your configs, their locations, and Spark-on-YARN deployment.
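For instance, a minimal launch sequence with the script above (assuming you start from the spark-notebook install directory):

source env.sh
bin/spark-notebook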


vidma commented Aug 29, 2017

Feel free to improve the docs by making a pull request, e.g. https://github.com/spark-notebook/spark-notebook/blob/master/docs/clusters_clouds.md

P.S. To run apps on a YARN cluster you also need spark.yarn.archive set,
e.g. https://github.com/spark-notebook/spark-notebook/blob/master/docs/clusters_clouds.md#secured-yarn-cluster

If you want to force this for ALL notebooks you can set it like this: bin/spark-notebook -Dmanager.notebooks.override.spark.yarn.archive=some_hdfs_file_path.zip ...
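For completeness, here's a sketch of building and uploading such an archive (the file names and HDFS path are just examples):

cd "$SPARK_HOME/jars"
zip -q /tmp/spark-jars.zip *.jar   # Spark expects the jars at the archive root
hdfs dfs -mkdir -p /apps/spark
hdfs dfs -put -f /tmp/spark-jars.zip /apps/spark/spark-jars.zip

then pass -Dmanager.notebooks.override.spark.yarn.archive=hdfs:///apps/spark/spark-jars.zip (or set it per notebook).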
