Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manager for tracking and terminating running server proxies #395

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

mahendrapaipuri
Copy link

@mahendrapaipuri mahendrapaipuri commented Apr 12, 2023

This PR addresses part of enhancements proposed in #383

Backend
Here we are adding a manager on the backend to keep track of running server proxy apps with assumptions:

  • Only managed server proxy apps are added to the manager. Unmanaged apps using Accept an unset command to proxy to an already started process (unmanaged process) #339 will not be added to manager as they cannot be controlled via JupyterLab frontend.
  • Once the simpervisor manages to start the proxy server app successfully, the metadata along with simpervisor process object is added to singleton manager object.
  • The manager is instantiated during loading of extension in __init__.py and added to ServerApp as a traitlet. This is inspired from the jupyter_lsp project. This singleton object is passed to API handlers and SupervisedProxyHandler.
  • If the user wishes to terminate a running server proxy app, the API call will effectively send a SIGTERM signal to running server proxy app by calling terminate method on simpervisor process.
  • In ensure_process method, we are verifying the state of the process using proc.running. When the running server proxy app is either terminated via JupyterLab frontend or if user terminates the app by sending a signal to PID manually proc.running will be evaluated to False. In these cases, we would like to ideally restart the process again and hence, we remove proc object from internal state and restart the process. _restart_process_if_needed will make sure simpervisor will restart the process in other cases.

Frontend
Manager for the frontend follows a similar logic used to implement terminals/kernels sessions in JupyterLab.

  • Frontend manager makes polling to API at a configurable intervals to get the list of running server proxy apps and add them to running sessions in "Server Proxy Apps" section.
    jsp_settings
  • By hovering over an active server proxy app, metadata of the process (command, port, managed) can be viewed.
  • User can shutdown a given proxy app or all proxy apps using shutdown option in the running sessions. Proxy apps that are shutdown can be restarted afresh from the launcher.

Tests

  • API tests are included to check success and failure cases
  • Robot tests are included that shows running servers and shutting down a running server (I dont have a lot of experience robot framework and I would appreciate any help if they can be improved further).

Docs

Note about the Running Server Proxy Apps is included in GUI launchers in docs

Illustration
ezgif-3-e8dcd463c7
or all server proxy apps which will terminate by sending SIGTERM as explained above.

Other changes

  • servers-info endpoint is moved from server-proxy/servers-info to server-proxy/api/servers-info to be consistent
  • jupyter_server is using root / and jupyterlab_server is using /lab/ directory for API endpoints. So, all API end points are moved to /server-proxy/ sub directory to avoid future incompatibility issues
  • An OpenAPI spec file is included in the repo to keep track of APIs and generate static docs
  • StaticFileHandler is used for IconHandler to simplify code

This extension can be ported to Notebook 7 as well with minimum changes.

This PR partially addresses #343, #264, #213, #188 (EDIT: marked as fixed, motivated below)

@consideRatio @ryanlovett @yuvipanda @bollwyvl Feel free to review and let me know your feedback!!

Edits by Erik

Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mahendrapaipuri this looks great to me overall!!! ❤️ 🎉 🌻

I see a clear value in this feature, it also seems like a well thought out design and implementation of the design! I've not been able to review the JS parts thoroughly as I'm not experienced with that, but from what I can tell it looks great as well.

I appreciate a lot about this PR but I'd like to highlight that your PR description was amazing to help me review it, I also greatly appreciate that it included references to related issues.

Required work towards merge

Before doing anything to avoid merge conflicts etc, please rebase on the current main branch to acquire #393 and #396 that changes things in docs and testing infra.

  • Update docs to reflect that the labextension does more than just add launcher buttons, but that it also provides these new UI to manage servers
  • Add entry in the changelog under the unreleased title that the api endpoint has been renamed
  • Consider if we should print some log messages someplace.

Optional bonus work

I note that we don't have existing tests of for example the servers-info endpoint. Since its become a far bigger project to setup tests from scratch than to mimic existing tests, I don't think we should block merging of this PR as it is but instead open an issue to represent the need to develop tests of our backend API separately going onwards.

  • Add tests of introduced backend API handlers
    • GET api/server-proxy (to list)
    • GET api/server-proxy/<proxy name>, failure situation + success situation
    • DELETE api/server-proxy/<proxy name>, failure situation + success situation
  • Add browser / acceptance test of this feature

jupyter_server_proxy/api.py Outdated Show resolved Hide resolved
jupyter_server_proxy/manager.py Outdated Show resolved Hide resolved
jupyter_server_proxy/handlers.py Outdated Show resolved Hide resolved
jupyter_server_proxy/handlers.py Outdated Show resolved Hide resolved
Comment on lines 65 to 84
icon_handlers.append(
(
ujoin(base_url, f"server-proxy/icon/{sp.name}"),
IconHandler,
{"path": sp.launcher_entry.icon_path},
)
)

nbapp.web_app.add_handlers(
".*",
[
(
ujoin(base_url, "server-proxy/servers-info"),
ujoin(base_url, "api/server-proxy/servers-info"),
ServersInfoHandler,
{"server_processes": server_processes},
),
(ujoin(base_url, "server-proxy/icon/(.*)"), IconHandler, {"icons": icons}),
],
(ujoin(base_url, r"api/server-proxy"), ListServersAPIHandler),
(ujoin(base_url, r"api/server-proxy/(?P<name>.*)"), ServersAPIHandler),
]
+ icon_handlers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reflected a bit about the API paths we are locking into providing going onwards, and if we should adjust them or not in some way.

Specifically, reflection I reflected about:

  • Multiple handlers register to handle certain routes now, for example servers-info could be handled by ServersInfoHandler and ListServersAPIHandler, and icon/<existing-server-process> could be a ServersAPIHandler if its named quirky.
    I think this can be fine, but I'm not fully confident. Should we work to ensure there isn't overlapping here?

I've not gained much experience on REST API design, I've mostly understood there can be quite a bit of things to consider. @bollwyvl @manics do you think we should go for the proposed API endpoints, or do you have suggestions on alternative endpoints?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp, declaring (and testing against) an OpenAPI spec would be a good place to start, so that one doesn't have to read the code to figure out what's going on. Prior art:

This can generate nice static docs, as well as help cleaning up some of the design decisions: if you can't express a pattern as a URL param or in a schema, maybe it's not a great idea.

It also then becomes possible to re-use that schema declaration to generate lightweight python and TS types, even if one doesn't do fully generated clients.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with OpenAPI suggestion. That would make it more cleaner. Actually, servers-info endpoint is only handling "launcher" specific metadata whereas the new endpoints are handling server specific metadata. So, I think it is logical to keep them as separate. However, I think we need to change the name of servers-info endpoint as it is bit misleading? ServersAPIHandler and ListServersAPIHandler can be merged into one using query parameters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bollwyvl Cheers for the refs

Comment on lines 122 to 127
# Create a default manager to keep track of server proxy apps.
manager = ServerProxyAppManager()

# Create a Periodic call back function to check the status of processes
pc = PeriodicCallback(monitor_server_proxy_procs, 1e4)
pc.start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be reasonable, but I'm not confident on it. Is it acceptable that we start this up automatically as soon as this file is loaded?

I'm not confident on the alternate options, but I figure this is a singleton instance of the class defined above, created when this python file is loaded. I figure options involve delaying this initialization.

I'm also thinking a bit about if monitor_server_proxy_procs should or shouldn't be part of the class itself. It is a function tied to a specific object of the class anyhow via the manager variable that could be self as well.

If not someone has a concrete change idea here to persue or generally considers this to be blocking, I don't think this comment should be blocking a merge.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Importing this file shouldn't have side-effects like starting/changing processes or even things on the event loop, as there's no telling it will actually be there.

Ideally, as much business logic/exception handling as possible would move into the manager class, and out of the handlers and __init__.py. The server extension would then become a very small layer that just hands down the parent application into the manager.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'd recommend new-start async stuff to use stdlib asyncio or anyio primitives instead of tornado-specific ones. And indeed, if we could get out of the polling business, that would be ideal: the underlying processes probably do (or should) provide callbacks/events that can be used to directly update the manager state.

Assuming we do have to poll, for some reason, and as I don't actually see a drop-in for PeriodicCallback, this could be a ServerProxyManager.monitor() which fires off an infinite loop:

import traitlets
import asyncio 

class ServerProxyManager(ServerProxy):
    monitor_interval = traitlets.Int(10, help="Proxy polling interval in seconds").tag(config=True)
    monitoring = traitlets.Bool(False, help="Whether the manager should poll for proxy status")

    # this doesn't need to be a trait
    _monitor_task: asyncio.Task

    @traitlets.observe("monitoring")
    def on_monitoring_changed(self, change: traitlets.Bunch):
        if self._monitor_task:
            self._monitoring_task.cancel()
        if change.new:
            # keep a handle so it doesn't get GCd
            self._monitoring_task = asyncio.create_task(self._monitor_loop()) 

    async def _monitor_loop(self):
        while self.monitoring:
             # do the thing
             await asyncio.sleep(self.monitor_interval)

When this gets invoked from __init__.py:

def _load_jupyter_server_extension(nbapp: ServerApp):
    manager = ServerProxyManager(parent=nbapp)
    manager.monitor()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user gracefully terminates a server proxy manually outside of lab interface, that process will not be restarted as we are not using always_restart feature of simpervisor. So somehow we need to update manager to remove this server proxy app and that is why I included this polling. Simpervisor uses exit handlers internally but I think they are not configurable. So, I dont know how we can fire a callback when process exits gracefully.

labextension/src/index.ts Outdated Show resolved Hide resolved
labextension/src/index.ts Outdated Show resolved Hide resolved
labextension/src/index.ts Outdated Show resolved Hide resolved
labextension/package.json Outdated Show resolved Hide resolved
@bollwyvl
Copy link
Collaborator

This is a reasonable starting point, but would benefit from aligning with the configuration and async patterns used in the upstream jupyter_server, as well as siblings such as jupyterlab_server and jupyter_lsp.

I'd say a lack of any python tests for the new features is probably a hard blocker to this getting merged, but really, with the more complex lab stuff, this would need to be tested by robot tests as well to be seriously considered for merging.

As noted, in general when adding a new REST APIs, starting with the schema and generating the python and TS types is a really good way to keep the system manageable, over time, and whether built with OpenAPI or not, can make the documentation process also much smoother.

@mahendrapaipuri
Copy link
Author

@consideRatio @bollwyvl Thanks for the feedback. I will work on the changes. Yes, I agree that we need to add tests for these features and I am working on them already. Next week I will be at KubeCon so, I might not find lot of time to work on this.

Are you aiming to make this extension compatible with Notebook 7? Currently the frontend extension fails as it is loading JupyterLab specific components. Straightforward solution is to load UI components conditionally, i.e., making ILauncher optional. Adding an optional Notebook related component (that add servers to Notebook launcher) to the extension should make it work for both Lab and Notebook interfaces. @bollwyvl Is it the strategy being followed now? I could not find a lot of info about extensions in new Notebook 7?

mahendrapaipuri and others added 10 commits April 27, 2023 14:57
It keeps simpervisor proc objects in internal state to be able to teminate
a process by calling terminate method on proc object.

It also checks the status of active processes by running a periodic
callback. If a process is killed outside of jsp, it removes it from
running apps list.
Existing servers-info endpoint is moved with api sub directory to have
a consistent end point scheme.
We also check if proxy is "running" and if it is not, we remove proc
object to restart the proxy. This way proxy will be restarted only when user
does it via lancher.
This manager will be added to running sessions widget so that user can
view and terminate running server proxy apps.

Proxy app metadata will be shown when we hover over proxy app name.
@mahendrapaipuri
Copy link
Author

mahendrapaipuri commented Apr 28, 2023

@consideRatio @bollwyvl @manics I have update the PR with most of the comments addressed. Please check the updated PR description for more details.

Update: Cleaned up my mess and git history is clean now. Sorry about that!!

mahendrapaipuri and others added 12 commits April 28, 2023 11:56
Do not instantiate manager in the file anymore. This will be done
during extension loading

Rework monitoring for proxy apps natively using asyncio. This will be
added as callback to ServerApp IO loop during extension loading

Manager spits out logs in debug mode when there is proxy app is
added/removed from it.

Added uni_socket to server proxy app dataclass

Remove unnecessary async methods

Properly handle the get_server_proxy_{app,proc} methods when server
proxy is not found in the manager. We return a default ServerProxyApp
tuple in this case with empty values
An OpenAPI spec file is added to keep track of APIs for better
maintainability

All APIs are moved into /server-proxy/ subdirectory. Seems like root /
is used by jupyter_server and /lab/ is used by jupyterlab_server.
Moving all APIs under /server-proxy will future incompatibilities

Initialise ServersAPIHandler with manager instead of importing it

Use a function to setup API handlers which can be called directly during
extension loading
We add manager as a traitlet to ServerApp and add monitor as a callback
to ServerApp IO lopp

Simplify API handlers setup
Move RunningServerProxyApp to a separate file

Use lab transalation capabilities for complex strings

Use URLExt package to manipulate URLs
Use lumino polling class for polling server proxy apps

Add schema to be able to configure refreshInterval
@mahendrapaipuri
Copy link
Author

@consideRatio I have reworked this PR based on the comments and added tests. Please check the updated PR description when you got time.

@consideRatio
Copy link
Member

This is amazing work @mahendrapaipuri!!!

I'm truly sorry we've not been able to review your work further yet @mahendrapaipuri. I'll do a small push now but I can't review this myself to completion.

As this PR is very large, avoid force pushing to the PR branch to help reviewers keep track of changes since last review better. I'll go through a few comments and link to the resolving commit to help other reviewers. I'd like to get the open review comments down to a minimum to reduce complexity of further review.

@mahendrapaipuri
Copy link
Author

Thanks a lot @consideRatio for reviewing the changes. It is totally fine, we all are busy with a lot of stuff and I understand things can take time.

I guess there are still few small things to address and I will do it next week. Cheers!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants