
Yarn Support On HDP 2.6.0.3-8 with Spark 2.xx #905

Open
GabeChurch opened this issue Aug 25, 2017 · 3 comments


GabeChurch commented Aug 25, 2017

It seems that using spark-notebook on newer versions of Hortonworks Hadoop and Spark causes an error to be thrown when you attempt to use the YARN manager in cluster mode. I was getting a "bad substitution" error (viewed from the YARN scheduler portal).


GabeChurch commented Aug 25, 2017

I was able to solve the problem by adding the hdp.version property through Ambari.

  • Go to 'Ambari -> YARN -> Configs' and open the 'Advanced' tab.
  • Scroll to the bottom of the page, where you will find an option to add a custom property for yarn-site.
  • Click 'Add Property', enter 'hdp.version' as the name, and set its value to your HDP version (2.6.0.3-8 for me).
  • Save the changes and restart the required services. Ambari will then deploy the hdp.version property into yarn-site.xml (which you could alternatively add manually; see the snippet below).

These steps solved the problem for me; hope they help you as well!
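In case you'd rather edit the file by hand, the resulting yarn-site.xml entry looks roughly like this (a sketch; substitute your own cluster's version value):

<property>
  <name>hdp.version</name>
  <value>2.6.0.3-8</value>
</property>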


GabeChurch commented Aug 27, 2017

One side note: if you are having problems deploying on YARN, you should check your environment variables with

env
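For example, to narrow the output to the relevant variables (the grep pattern here is just a suggestion):

env | grep -iE 'hadoop|spark|java'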

At the moment I add the environment variables manually before starting spark-notebook, with a bash script that I name env.sh and run using

source env.sh

You can find the right values for these variables in your configs via Ambari, or you can go into the

/usr/hdp/current

directory (this is an example; yours may be different) and search through the various subdirectories for your specific paths.
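On HDP, /usr/hdp/current is typically a set of symlinks into the versioned install directories, so a quick listing (illustrative) shows where each client actually lives:

ls -l /usr/hdp/current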

Here is the env.sh script I created (I added a few non-essential environment variables just to be safe)

# HDP/Spark2 paths; adjust to match your installation
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"}
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_112"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"}
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-"/usr/hdp/current/spark2-thriftserver/conf"}
export SPARK_HOME=${SPARK_HOME:-"/usr/hdp/current/spark2-thriftserver"}
export SPARK_LOG_DIR=/var/log/spark2
export SPARK_PID_DIR=/var/run/spark2

To recap: these environment variables must be loaded before you run spark-notebook, or you must add them to spark-notebook's initialization script so they load automatically on deployment. If you are running Spark on YARN in a Hadoop cluster you should be familiar with everything I am saying; if not, I suggest doing some research on your configs, their locations, and Spark-on-YARN deployment.
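For instance, a minimal launch sequence with the script above (assuming you start from the spark-notebook install directory):

source env.sh
bin/spark-notebook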


vidma commented Aug 29, 2017

Feel free to improve the docs by making a pull request, e.g. https://github.com/spark-notebook/spark-notebook/blob/master/docs/clusters_clouds.md

P.S. To run apps on a YARN cluster you also need spark.yarn.archive set,
e.g. https://github.com/spark-notebook/spark-notebook/blob/master/docs/clusters_clouds.md#secured-yarn-cluster

If you want to force this for ALL notebooks you can set it like this: bin/spark-notebook -Dmanager.notebooks.override.spark.yarn.archive=some_hdfs_file_path.zip ...
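For completeness, here's a sketch of building and uploading such an archive (the file names and HDFS path are just examples):

cd "$SPARK_HOME/jars"
zip -q /tmp/spark-jars.zip *.jar   # Spark expects the jars at the archive root
hdfs dfs -mkdir -p /apps/spark
hdfs dfs -put -f /tmp/spark-jars.zip /apps/spark/spark-jars.zip

then pass -Dmanager.notebooks.override.spark.yarn.archive=hdfs:///apps/spark/spark-jars.zip (or set it per notebook).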
