
Databricks

  • Deployment was mostly straightforward. Just make sure to choose the EC2 region closest to you; otherwise SSH connections later on may suffer high latency.
  • Make sure to include your public key in the SSH tab before clicking "create cluster" so that the nodes can be accessed via SSH later on.
  • There is also a way to inject an authorized SSH key via a Databricks notebook after cluster deployment, as documented at https://docs.databricks.com/clusters/configure.html#configure-an-existing-cluster-with-your-public-key (see the sketch after this list).
  • You must use port 2200 (not 22) to SSH into the Spark driver: `ssh ubuntu@<hostname> -p 2200 -i <private-key-file-path>`. The relevant security group therefore needs to be changed to allow inbound traffic from a given external IP on port 2200 (see the example after this list).
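
For the notebook-based key injection, one minimal sketch uses a `%sh` notebook cell, which executes on the driver node. This is an assumption-laden illustration, not the official recipe (the linked page above documents that): it assumes the standard Databricks Ubuntu image where SSH logins use the `ubuntu` user, and the key string is a placeholder.

```sh
%sh
# Runs on the driver node via a Databricks notebook cell.
# Append a public key (placeholder below) to the ubuntu user's
# authorized_keys so subsequent SSH logins are accepted.
mkdir -p /home/ubuntu/.ssh
echo "ssh-rsa AAAA... you@example.com" >> /home/ubuntu/.ssh/authorized_keys
```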
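
To open port 2200, the security group attached to the cluster's EC2 instances can be updated from the AWS console, or with the AWS CLI as sketched below. The security group ID and source IP are placeholders; look up the group actually attached to your cluster's driver.

```sh
# Allow inbound TCP on port 2200 from a single external IP
# (placeholders: sg-0123456789abcdef0 and 203.0.113.5)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 2200 \
    --cidr 203.0.113.5/32

# Then SSH into the Spark driver on port 2200
ssh ubuntu@<hostname> -p 2200 -i <private-key-file-path>
```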