Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify YDocWebSocketHandler #247

Open
dlqqq opened this issue Mar 12, 2024 · 1 comment
Open

Simplify YDocWebSocketHandler #247

dlqqq opened this issue Mar 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@dlqqq
Copy link
Collaborator

dlqqq commented Mar 12, 2024

Brief description

The YDocWebSocketHandler can be dramatically simplified and reduced in complexity. We may also be able to remove the dependency on pycrdt-websocket.

Context: The Yjs protocol

Here I present a brief overview of the Yjs protocol for context.

Within the Yjs protocol, there are two protocol layers:

  • Sync protocol layer -- handles synchronization of real-time document contents (i.e. the Ydoc object)
  • Awareness protocol layer -- handles synchronization of real-time collaborator state (e.g. cursor position)

Since jupyter_collaboration isn't a document peer itself, it doesn't need to use the awareness protocol layer, we just need to broadcast awareness messages to all peers when we receive one.

The sync protocol, in turn, has 3 message types:

  • Sync step 1 (SS1): A message that both 1) retrieves missing updates from another peer, and 2) instructs the other peer to stream all of their updates to this peer.
    • Sending a SS1 message to a peer is analogous to subscribing to that peer.
  • Sync step 2 (SS2): A message containing missing updates for a peer, as a reply to a SS1 message.
  • Sync update (SU): A message containing a transaction committed by a peer, after a connection was established with a SS1 + SS2 handshake.

Suppose we have the simplest Yjs network with a single node named node A. Suppose node B connects to this network. Here is how the connection should be established according to the sync protocol layer:

  • Node B sends a SS1 message.
  • Node A receives the SS1 message, and replies with a SS2 message. Node A then sends a SS1 message as well to subscribe to their updates.
    • When Node B handles the SS2 message, this completes the subscription of Node B to Node A.
  • Node B receives the SS1 message, and replies with a SS2 message.
  • Node A receives the SS2 message.
    • When Node A handles the SS2 message, this completes the subscription of Node A to Node B.

After the Yjs connection is established through this process, Node A and Node B then stream SU messages to one another to synchronize document state.

Motivation and impact

The minimum requirements of YDocWebSocketHandler are fairly concise:

  • Load the Ydoc in memory based on some ID in the URL path (currently room ID)
  • On receiving SS1 messages:
    • Reply with a SS2 message, then send a SS1 message.
  • On receiving SS2 or SU messages:
    • Apply it to the Ydoc, then broadcast to all peers except the sender
  • On receiving awareness messages:
    • Inspect the awareness message to know which clients have joined / left the network (current implementation).
    • Broadcast to all peers except the sender.

However, the current handler implementation has some issues:

  • The handling of these Yjs messages is split across two repos and mostly defined in pycrdt-websocket.
  • The handling of these Yjs messages can mostly be done in the Tornado handler class directly.
  • jupyter_collaboration relies on a YRoom abstraction provided by pycrdt-websocket, which can be simplified as well.
  • The code relies on a lot of async abstractions (some of them provided by anyio), but the minimum requirements of the Yjs protocol do not seem to require them.

Unifying the source code in a single repo makes releasing this software much easier. This, along with reducing the complexity of the source, will allow for others to contribute to this project more easily. :hugging_face:

Proposal

I will open a draft PR that proposes a simplified implementation. I will focus on the following areas:

  • Remove the dependency on pycrdt-websocket, merging existing logic into the YDocWebSocketHandler directly
  • Simplify async code
  • Suggest a more intuitive interface for these classes
@dlqqq dlqqq added the enhancement New feature or request label Mar 12, 2024
@davidbrochart
Copy link
Collaborator

Since jupyter_collaboration isn't a document peer itself, it doesn't need to use the awareness protocol layer, we just need to broadcast awareness messages to all peers when we receive one.

I disagree, the backend is just another CRDT peer. But it's also an authority, and it should not blindly broadcast awareness messages. Imagine a client that changes their name in the awareness, for instance a student using the teacher's name, this should not be allowed. The backend has the authority to not accept these awareness messages if the awareness name doesn't match the authenticated user.
There is some logic that I plan to activate in the future.

On receiving SS2 or SU messages:

  • Apply it to the Ydoc, then broadcast to all peers except the sender

Wrong, updates must be echoed to the sender as well.

On receiving awareness messages:

  • Inspect the awareness message to know which clients have joined / left the network (current implementation).
  • Broadcast to all peers except the sender.

Same comment.

Unifying the source code in a single repo makes releasing this software much easier.

pycrdt-websocket is and should remain Jupyter-agnostic. It is the Python equivalent to y-websocket. I've not encountered any issue related to releasing.

I will open a draft PR that proposes a simplified implementation. I will focus on the following areas:

  • Remove the dependency on pycrdt-websocket, merging existing logic into the YDocWebSocketHandler directly

I suggest not spending time on this, as I will likely not accept this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants