
Explore an alternative approach to Spark UI Proxy #3

Open
krishnan-r opened this issue Sep 14, 2021 · 1 comment
Explore using https://github.com/jupyterhub/jupyter-server-proxy or another generic approach to provide the Spark UI through a proxy.

The current approach is brittle: it works only on localhost and is hardcoded. (It is currently removed in the refactor #1 and will be added back.)

In our current deployment, we rely on https://github.com/swan-cern/jupyter-extensions/tree/master/SparkConnector as an external link (this requires being on the same network).

berglh commented Apr 15, 2024

I just want to chime in on this one. After seeing this issue, I used the jupyter-server-proxy extension along with the jupyter-app-launcher extension to create a launcher button that opens the Spark UI in a JupyterLab workspace tab.

JupyterLab 3.6.2
Spark 3.4.0

This worked pretty well and we could see most of the UI. The only thing not working was the Executors page: even when I configured the Spark UI with the appropriate base path, the setting didn't seem to propagate correctly, and the executor details failed to load.

```
spark.ui.proxyBase: /proxy/4040
```
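For anyone reproducing this, the setting above can also be supplied at session startup rather than edited into the running UI. This is only a sketch of what we used, assuming the default jupyter-server-proxy path prefix of `/proxy/<port>`; adjust the port and prefix to your deployment:

```
# spark-defaults.conf (or pass with --conf on spark-submit / pyspark):
# tell the Spark UI it is served behind the proxy prefix, not at the web root
spark.ui.proxyBase    /proxy/4040
```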

If I instead forward the port from the Kubernetes pod and connect directly to the Spark UI, everything works as expected, so there is some minor unexpected behaviour between the jupyter-server-proxy plugin and the Spark UI. I suspect the behaviour comes from the Spark UI itself and may be fixed in newer versions; I'm in the process of testing Spark 3.5.0.

The other thing to note, as with the same issues in the sparkmonitor UI connector, is that it is possible to start multiple Spark sessions from a single JupyterLab notebook. In that case, the UI port increments monotonically from 4040 to 4041 and onward, and the app launcher icon fails to connect to any additional instances. Our master PySpark instances run in the same container/pod as JupyterLab, rather than spawning the master in a new Kubernetes pod, and this resulted in a half-working solution (better than no Spark UI).
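One workaround we considered for the incrementing ports (not something jupyter-server-proxy provides out of the box; the helper name here is hypothetical) is to probe the port range Spark tries in order and surface a proxy link per live UI. A minimal stdlib-only sketch, assuming the UIs run in the same container as JupyterLab so they are reachable on localhost:

```python
import socket

def find_spark_ui_ports(host="127.0.0.1", start=4040, limit=10, timeout=0.2):
    """Probe the ports Spark tries in order (4040, 4041, ...) and return
    those that accept a TCP connection, i.e. likely live Spark UIs."""
    open_ports = []
    for port in range(start, start + limit):
        try:
            # create_connection raises OSError if nothing is listening
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            continue
    return open_ports
```

Each returned port can then be mapped to a `/proxy/<port>/` link, e.g. in a custom launcher entry. Note a plain TCP probe can't distinguish a Spark UI from any other listener on those ports.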

There does appear to be an attribute on the SparkContext that returns the URL of the UI: `spark.sparkContext.uiWebUrl`. The problem in the context of Docker is that it returns the container ID as the host address, which is not routable in a development environment. My guess is there will be cases where it's not possible to proxy the UI reliably via jupyter-server-proxy, depending on the network configuration and environment of the Spark cluster.
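Even with the unroutable host, the port in `uiWebUrl` is still usable: you can discard the container-ID hostname and keep only the port to build the proxied path. A small sketch (the helper name and the Jupyter base path are my own, not part of either library):

```python
from urllib.parse import urlparse

def proxy_url_for(ui_web_url, jupyter_base="/"):
    """Map spark.sparkContext.uiWebUrl (e.g. 'http://0a1b2c3d4e5f:4041')
    to a jupyter-server-proxy path, ignoring the unroutable container host."""
    port = urlparse(ui_web_url).port or 4040  # Spark's default UI port
    return f"{jupyter_base.rstrip('/')}/proxy/{port}/"
```

In a notebook, `proxy_url_for(spark.sparkContext.uiWebUrl)` would then give a link that works through the Jupyter server, assuming the Spark driver runs in the same container as JupyterLab.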
