Fix for code execution on the Jupyter Server #307

fcollonval · 2024-05-07T15:33:20Z

Working on https://github.com/datalayer/jupyter-server-nbmodel, we are facing issues with the current code.

Needs jupyter/jupyter_client#1023

Code changes

Store the document_id as state of the shared document model.
Restore the kernel degraded escape condition when executing a cell
Restore usage of execution callback onCellExecuted and onCellExecutionScheduled

github-actions · 2024-05-07T15:33:33Z

👈 Launch a Binder on branch jupyterlab/jupyter-collaboration/fix/server-execution-jp-server

krassowski · 2024-05-09T14:19:52Z

packages/docprovider/src/notebookCellExecutor.ts

+          // jupyverse case - it is undefined in jupyter-server
+          const fileId = notebook.sharedModel.getState('file_id') ?? '';
+          let documentId = `json:notebook:${fileId}`;
+          if (!fileId) {
+            if (
+              // FIXME sessionContext.path seems to be local - should we by-pass this test?
+              ['', this._drive.name].includes(
+                this._contents.driveName(
+                  sessionContext.session?.path ?? sessionContext.path
+                )
+              )
+            ) {
+              const localPath = this._contents.localPath(
+                sessionContext.session?.path ?? sessionContext.path
+              );
+              documentId =
+                this._drive.getRoomName({
+                  localPath: localPath,
+                  format: 'json',
+                  type: 'notebook'
+                }) ?? '';
+            }


It looks like elsewhere in the codebase the fileId is retrieved by requesting a session using requestDocSession:

jupyter-collaboration/packages/docprovider/src/yprovider.ts

Lines 95 to 103 in adb51b0

const session = await requestDocSession(

this._format,

this._contentType,

this._path

);

this._yWebsocketProvider = new YWebsocketProvider(

this._serverUrl,

`${session.format}:${session.type}:${session.fileId}`,

which makes a request to api/collaboration/session handled here:

jupyter-collaboration/projects/jupyter-server-ydoc/jupyter_server_ydoc/handlers.py

Lines 418 to 461 in adb51b0

@web.authenticated

@authorized

async def put(self, path):

"""

Creates a new session for a given document or returns an existing one.

"""

body = json.loads(self.request.body)

format = body["format"]

content_type = body["type"]

file_id_manager = self.settings["file_id_manager"]

idx = file_id_manager.get_id(path)

if idx is not None:

# index already exists

self.log.info("Request for Y document '%s' with room ID: %s", path, idx)

data = json.dumps(

{

"format": format,

"type": content_type,

"fileId": idx,

"sessionId": SERVER_SESSION,

}

)

self.set_status(200)

return self.finish(data)

# try indexing

idx = file_id_manager.index(path)

if idx is None:

# file does not exists

raise web.HTTPError(404, f"File {path!r} does not exist")

# index successfully created

self.log.info("Request for Y document '%s' with room ID: %s", path, idx)

data = json.dumps(

{

"format": format,

"type": content_type,

"fileId": idx,

"sessionId": SERVER_SESSION,

}

)

self.set_status(201)

return self.finish(data)

It seems to me that going this way would solve the problem with getRoomName which you documented in a fixme:

// FIXME we have issue with that key in case of file rename

What do you think?

Thanks @krassowski for the review.

I thought about re-using the endpoint. But the issue is the overhead. It will be an overkill to call it at each execution request.

Alternatively, I could set up a cache based on a WeakMap<NotebookModel, roomID> and discard the entry if the execution is refused due to improper input parameters. What do you think?

davidbrochart · 2024-05-13T07:40:38Z

packages/docprovider/src/notebookCellExecutor.ts

+          );
+          const cellId = cell.model.sharedModel.getId();
+
+          // jupyverse case - it is undefined in jupyter-server


Maybe we should make it part of the document state? I opened:

Add optional file ID to document state jupyter-server/jupyter_ydoc#198

Set file ID on document state #217

This could be a more efficient way than using a cache as proposed in #307 (comment)

fcollonval · 2024-05-15T09:33:27Z

@davidbrochart @krassowski for the roomID, finally I went to store it directly in the shared document state. Sharing it there allows to retrieve it quickly and avoid the need to reconstruct it from its part (easier maintenance).

fcollonval · 2024-05-15T09:34:20Z

@davidbrochart could you confirm that the latest version will work with jupyverse?

davidbrochart · 2024-05-15T13:09:16Z

I will.
BTW I see that you created a branch directly on https://github.com/jupyterlab/jupyter-collaboration. Would you mind creating a PR from your fork in the future?

davidbrochart · 2024-05-15T13:26:02Z

@davidbrochart @krassowski for the roomID, finally I went to store it directly in the shared document state. Sharing it there allows to retrieve it quickly and avoid the need to reconstruct it from its part (easier maintenance).

Then I think we should acknowledge this is now part of the document state, and merge jupyter-server/jupyter_ydoc#198.
I could rebase #217 and merge it too.

davidbrochart · 2024-05-15T13:53:21Z

I'm realizing that what you set in the shared document's state is the room_id, which makes more sense than the file_id.
What about renaming it to something more generic, e.g. document_id?

davidbrochart · 2024-05-15T14:35:51Z

packages/docprovider/src/notebookCellExecutor.ts

+  }
+}
+
+async function requestServer(


I'm not sure about this logic. I understand it's a way to have input requests and widgets to work, but the whole system is designed so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension.
There are other ways to support inputs and widgets, discussed in jupyter-server/jupyter_ydoc#227 (comment). They involve using CRDTs so that they are fully integrated into jupyter-collaboration.

so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension

@echarles @fcollonval what are your plans with respect to jupyter-server-nbmodel?

There are other ways to support inputs and widgets

I think that stdin box is so critical to the basic interface (as compared to ipywidgets) and different from other widgets that a special-case implementation make sense, assuming that it can work with page refresh. The implementation proposed in this PR does not work with page refresh, and the selling point of server-side execution is that the state is preserved even when the browser gets closed/disconnected.

I understand it's a way to have input requests and widgets to work

Only to have the input requests - widgets live their life with their renderer and the creation of comms as before as the frontend still connect to the kernel through websocket.

but the whole system is designed so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension.

That part is definitely opinionated. It only highlights the need to define a standard way to deal with input. I have no problem changing that part. A possibility to allow testing of various approach could be to remove the modification in the cell executor that are specific to jupyter-server-nbmodel. That extension will then provide its own executor independently of collaboration.

what are your plans with respect to jupyter-server-nbmodel?

our plan is to deprecate it as soon as there is an positive way to integrate that logic here or in jupyter-server (depending on the consensus).

this PR does not work with page refresh

Good point, I'll try to cover that case.

That part is definitely opinionated. It only highlights the need to define a standard way to deal with input. I have no problem changing that part. A possibility to allow testing of various approach could be to remove the modification in the cell executor that are specific to jupyter-server-nbmodel. That extension will then provide its own executor independently of collaboration.

I think it would make sense indeed to define an API for handling inputs, as you did for cell execution. Correct me if I'm wrong, but the way it is handled in jupyter-server-nbmodel doesn't support state recovery? For instance if I run a cell with a = input(), close my browser, and reopen it, I don't have the input widget anymore and my notebook is stuck (I can't run a cell anymore). For me this highlights the need for a better solution based on CRDTs.
Having an API for input handling would allow to provide a CRDT-based input widget.

krassowski · 2024-05-16T10:34:49Z

document_id sounds slightly better to me.

Edit: but I am fine with going forward with file_id. I think the difference is negligible.

davidbrochart · 2024-05-16T10:53:13Z

The issue with file_id is that:

it makes a reference to jupyter-server-fileid, which jupyter-ydoc is independent of.
the file ID is not enough to identify a shared document, because there could be multiple shared document types for a file (we also need the document type, for instance).

I'd like to not use room_id because it is a reference to pycrdt-websocket, which jupyter-ydoc is independent of.
So I'd prefer document_id or just id.

krassowski

This might be a jupyter_server_nbmodel issue but when executing a cell which takes longer I see that server gets stuck (execution counter does not get updated) and even static files (kernel icons) do not load:

Reloading the page is also stuck until the computation completes.

Calling input() also blocks everything for me, including execution in other notebooks. Is there maybe a requirement on specific version of jupyter-server or Python or something that is not defined in dependencies of jupyter_server_nbmodel?

Should we update documentation on server-side execution? I think it would be outdated once this PR is merged:

jupyter-collaboration/docs/source/configuration.md

Lines 33 to 49 in adb51b0

    
           There is an experimental feature that is currently only supported by the 
        
           [Jupyverse](https://github.com/jupyter-server/jupyverse) server 
        
           (not yet with [jupyter-server](https://github.com/jupyter-server/jupyter_server), 
        
           see the [issue #900](https://github.com/jupyter-server/jupyter_server/issues/900)): 
        
           server-side execution. With this, running notebook code cells is not done in the frontend through 
        
           the low-level kernel protocol over WebSocket API, but through a high-level REST API. Communication 
        
           with the kernel is then delegated to the server, and cell outputs are populated in the notebook 
        
           shared document. The frontend gets these outputs changes and shows them live. What this means is 
        
           that the notebook state can be recovered even if the frontend disconnects, because cell outputs are 
        
           not populated frontend-side but server-side. 
        
           This feature is disabled by default, and can be enabled like so: 
        
           ```bash 
        
           pip install "jupyterlab>=4.2.0b0" 
        
           pip install "jupyverse[jupyterlab, auth]>=0.4.2" 
        
           jupyverse --set kernels.require_yjs=true --set jupyterlab.server_side_execution=true 
        
           ```

If --YDocExtension.server_side_execution True is required, I think it would need to be documented. JupyterLab 4.2 and Notebook 7.2 could be mentioned as shared requirements for both jupyverse and the jupyter_server_nbmodel extension.

krassowski · 2024-05-16T10:47:14Z

packages/docprovider/src/notebookCellExecutor.ts

+        if (response.status === 300) {
+          let replyUrl = response.headers.get('Location') || '';
+
+          if (!replyUrl.startsWith(settings.baseUrl)) {
+            replyUrl = URLExt.join(settings.baseUrl, replyUrl);
+          }
+          const { parent_header, input_request } = await response.json();
+          // TODO only the client sending the snippet will be prompted for the input
+          // we can have a deadlock if its connection is lost.
+          const panel = new Panel();
+          panel.addClass('jp-OutputArea-child');
+          panel.addClass('jp-OutputArea-stdin-item');


There is some implicit assumptions on the jupyter-server-nbmodel implementation. I think this is acceptable but I would suggest that the code is re-organised:

can you move the stdin logic into a separate function and replace the if-else sequence with high-level switch-case to make the overall logic easy to understand at a first glance?

the Stdin creation largely duplicates code from core JupyterLab. Ideally we would be able to reduce the duplication as much as possible; the difference I think is that the core is built around the kernel future objects. I wonder if we can implement the future here and just pass it to the logic in JupyterLab core. This could be done in a follow-up PR if we need to make changes in core to enable this, but lets discuss/investigate a bit before merging.

krassowski · 2024-05-16T10:52:37Z

packages/docprovider/src/notebookCellExecutor.ts

+  }
+}
+
+async function requestServer(


so that it works with https://github.com/datalayer/jupyter-server-nbmodel, which is a third-party extension

@echarles @fcollonval what are your plans with respect to jupyter-server-nbmodel?

There are other ways to support inputs and widgets

I think that stdin box is so critical to the basic interface (as compared to ipywidgets) and different from other widgets that a special-case implementation make sense, assuming that it can work with page refresh. The implementation proposed in this PR does not work with page refresh, and the selling point of server-side execution is that the state is preserved even when the browser gets closed/disconnected.

packages/docprovider/src/notebookCellExecutor.ts

fcollonval · 2024-05-16T13:18:36Z

Thank you both @davidbrochart and @krassowski for testing and reviewing.

When thinking at this with @echarles , we wanted to address primary jupyter-server/jupyter_server#900 while keeping the kernel and the document model as decouple as possible.

The concern we have with adding the input or the pending requests within the document model is that it creates an issue with synchronizing the kernel state and the document model. In particular for the input, it seems to us that this will be very complex to resolve if the kernel is connected to a notebook and a console (or to multiple notebooks). For example, if the input snippet is sent from the console. Should the pending input kernel state be reflected in the notebook?

I'm wondering if a better path preserving as much as possible the decoupling by requesting the kernel state could be possible.

davidbrochart · 2024-05-16T13:34:32Z

I agree that adding requests to the shared document is the beginning of the kernel protocol leaking into CRDTs, which is exactly what we want to avoid.
For me an input request should just be added as a new entry in the cell outputs. Its output_type should (ironically) be input, and a frontend observing outputs should show an input box if it sees an output of type input. Note that we will probably need to rework the outputs structure, like I started in jupyter-server/jupyter_ydoc#201.

EDIT: let's call the output_type stdin instead of input, which would be confusing.

krassowski · 2024-05-16T15:34:37Z

A few times I encountered an issue when I executed a cell but the previous version of code was executed. This is because jupyter-server-nbmodel uses snippet = str(ycell["source"]) which can have a different state than the frontend that triggered the request (e.g. delayed by half a second and missing one character, e.g. leading to syntax error).

krassowski · 2024-05-16T15:38:07Z

Debugging this a little bit I see that the following lines in jupyter-server-nbmodel do not propagate to the frontend for me:

        with ycell.doc.transaction():
            del ycell["outputs"][:]
            ycell["execution_count"] = None

I see that ycell is defined, I am not sure why this does not get pushed to the frontend.

krassowski · 2024-05-16T15:48:44Z

Reloading the page is also stuck until the computation completes.

Calling input() also blocks everything for me, including execution in other notebooks. Is there maybe a requirement on specific version of jupyter-server or Python or something that is not defined in dependencies of jupyter_server_nbmodel?

Ah, I missed jupyter/jupyter_client#1023 - sorry!

echarles · 2024-05-16T15:55:36Z

Calling input() also blocks everything for me, including execution in other notebooks. Is there maybe a requirement on specific version of jupyter-server or Python or something that is not defined in dependencies of jupyter_server_nbmodel?

@krassowski Thx for trying. Do you still block with jupyter/jupyter_client#1023?

davidbrochart · 2024-05-16T15:57:59Z

I see that ycell is defined, I am not sure why this does not get pushed to the frontend.

I think you need jupyter-server/jupyter_ydoc#201.

BTW @fcollonval @krassowski I opened jupyter-server/jupyter_ydoc#233 to give an idea of what a truly collaborative and recoverable input widget would use.

krassowski · 2024-05-16T16:02:55Z

Do you still block with jupyter/jupyter_client#1023?

No, it is all working great after picking up jupyter/jupyter_client#1023. The bugs I now see are:

when using "restart and run all" the cells get executed out of order - IMO blocker before we ship this, already tracked in Cell execution order datalayer/jupyter-server-nbmodel#8
the cell code is not embedded in the request message which means that it can be out of sync (snippet = str(ycell["source"]) can be out of date by an update or two) - issue in the REST API design IMO and a blocker
empty cells get executed and increase cell counter (change of behaviour, jupyter-server-nbmodel issue)
the input() box is not restored on refresh (problem in this PR/API)
the execution count shows empty instead of "*" when executing (change of behaviour, jupyter-server-nbmodel issue)

Improve notebook cell server-side executor Fix for testing drive Allow to request a document from its document_id/room_id Add documentation Rename state room_id to document_id Don't include custom logic for jupyter_server_nbmodel

fcollonval · 2024-05-21T16:03:33Z

the latest pushed commit a207d33 reduced the changes to a minimum as a dedicated cell executor is going to be part of datalayer/jupyter-server-nbmodel#14

fcollonval added the bug Something isn't working label May 7, 2024

fcollonval marked this pull request as ready for review May 7, 2024 15:41

fcollonval mentioned this pull request May 7, 2024

New kernels REST API jupyter-server/jupyter_server#900

Open

fcollonval force-pushed the fix/server-execution-jp-server branch from 30eec10 to 536a7b8 Compare May 7, 2024 16:15

krassowski reviewed May 9, 2024

View reviewed changes

krassowski mentioned this pull request May 9, 2024

2.1.1 shows jupyterlab compatibility warning with jupyterlab 4.2 #309

Open

davidbrochart reviewed May 13, 2024

View reviewed changes

fcollonval mentioned this pull request May 13, 2024

Should code be executed asynchronously in RTC mode? datalayer/jupyter-server-nbmodel#7

Closed

fcollonval force-pushed the fix/server-execution-jp-server branch from 7c256ae to 7f5ec20 Compare May 15, 2024 09:27

fcollonval requested review from krassowski and davidbrochart May 15, 2024 09:33

davidbrochart reviewed May 15, 2024

View reviewed changes

krassowski requested changes May 16, 2024

View reviewed changes

Fix file id not available for jupyter-server

a207d33

Improve notebook cell server-side executor Fix for testing drive Allow to request a document from its document_id/room_id Add documentation Rename state room_id to document_id Don't include custom logic for jupyter_server_nbmodel

fcollonval force-pushed the fix/server-execution-jp-server branch from d73e386 to a207d33 Compare May 21, 2024 16:01

Zsailer mentioned this pull request May 23, 2024

Meeting Notes 2024 jupyter-server/team-compass#57

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for code execution on the Jupyter Server #307

Fix for code execution on the Jupyter Server #307

fcollonval commented May 7, 2024 •

edited

github-actions bot commented May 7, 2024

krassowski May 9, 2024

fcollonval May 13, 2024 •

edited

davidbrochart May 13, 2024

fcollonval May 13, 2024

fcollonval commented May 15, 2024

fcollonval commented May 15, 2024

davidbrochart commented May 15, 2024

davidbrochart commented May 15, 2024

davidbrochart commented May 15, 2024

davidbrochart May 15, 2024

krassowski May 16, 2024

fcollonval May 16, 2024

davidbrochart May 16, 2024

krassowski commented May 16, 2024 •

edited

davidbrochart commented May 16, 2024

krassowski left a comment

krassowski May 16, 2024

krassowski May 16, 2024

fcollonval commented May 16, 2024

davidbrochart commented May 16, 2024 •

edited

krassowski commented May 16, 2024

krassowski commented May 16, 2024

krassowski commented May 16, 2024

echarles commented May 16, 2024

davidbrochart commented May 16, 2024

krassowski commented May 16, 2024

fcollonval commented May 21, 2024

	const session = await requestDocSession(
	this._format,
	this._contentType,
	this._path
	);

	this._yWebsocketProvider = new YWebsocketProvider(
	this._serverUrl,
	`${session.format}:${session.type}:${session.fileId}`,

	@web.authenticated
	@authorized
	async def put(self, path):
	"""
	Creates a new session for a given document or returns an existing one.
	"""
	body = json.loads(self.request.body)
	format = body["format"]
	content_type = body["type"]
	file_id_manager = self.settings["file_id_manager"]

	idx = file_id_manager.get_id(path)
	if idx is not None:
	# index already exists
	self.log.info("Request for Y document '%s' with room ID: %s", path, idx)
	data = json.dumps(
	{
	"format": format,
	"type": content_type,
	"fileId": idx,
	"sessionId": SERVER_SESSION,
	}
	)
	self.set_status(200)
	return self.finish(data)

	# try indexing
	idx = file_id_manager.index(path)
	if idx is None:
	# file does not exists
	raise web.HTTPError(404, f"File {path!r} does not exist")

	# index successfully created
	self.log.info("Request for Y document '%s' with room ID: %s", path, idx)
	data = json.dumps(
	{
	"format": format,
	"type": content_type,
	"fileId": idx,
	"sessionId": SERVER_SESSION,
	}
	)
	self.set_status(201)
	return self.finish(data)

	There is an experimental feature that is currently only supported by the
	[Jupyverse](https://github.com/jupyter-server/jupyverse) server
	(not yet with [jupyter-server](https://github.com/jupyter-server/jupyter_server),
	see the [issue #900](https://github.com/jupyter-server/jupyter_server/issues/900)):
	server-side execution. With this, running notebook code cells is not done in the frontend through
	the low-level kernel protocol over WebSocket API, but through a high-level REST API. Communication
	with the kernel is then delegated to the server, and cell outputs are populated in the notebook
	shared document. The frontend gets these outputs changes and shows them live. What this means is
	that the notebook state can be recovered even if the frontend disconnects, because cell outputs are
	not populated frontend-side but server-side.

	This feature is disabled by default, and can be enabled like so:
	```bash
	pip install "jupyterlab>=4.2.0b0"
	pip install "jupyverse[jupyterlab, auth]>=0.4.2"
	jupyverse --set kernels.require_yjs=true --set jupyterlab.server_side_execution=true
	```

Fix for code execution on the Jupyter Server #307

Are you sure you want to change the base?

Fix for code execution on the Jupyter Server #307

Conversation

fcollonval commented May 7, 2024 • edited

Code changes

github-actions bot commented May 7, 2024

Choose a reason for hiding this comment

fcollonval May 13, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fcollonval commented May 15, 2024

fcollonval commented May 15, 2024

davidbrochart commented May 15, 2024

davidbrochart commented May 15, 2024

davidbrochart commented May 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

krassowski commented May 16, 2024 • edited

davidbrochart commented May 16, 2024

krassowski left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fcollonval commented May 16, 2024

davidbrochart commented May 16, 2024 • edited

krassowski commented May 16, 2024

krassowski commented May 16, 2024

krassowski commented May 16, 2024

echarles commented May 16, 2024

davidbrochart commented May 16, 2024

krassowski commented May 16, 2024

fcollonval commented May 21, 2024

fcollonval commented May 7, 2024 •

edited

fcollonval May 13, 2024 •

edited

krassowski commented May 16, 2024 •

edited

davidbrochart commented May 16, 2024 •

edited