Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make /suite requests async #1

Open
kamoltat opened this issue May 31, 2023 · 2 comments · May be fixed by #29
Open

Make /suite requests async #1

kamoltat opened this issue May 31, 2023 · 2 comments · May be fixed by #29
Labels
feature New feature or request

Comments

@kamoltat
Copy link
Member

Refer Zack's comment here: VallariAg#3 (review)

Current behavior: API waits for scheduling process to be completed and then sends a response, which can be too long of a response time and it might timeout.

Desired behavior: API immediately sends a response. Users can check their scheduling status in some queue.

@kamoltat kamoltat added the feature New feature or request label May 31, 2023
@VallariAg VallariAg self-assigned this Aug 27, 2023
@VallariAg VallariAg removed their assignment Oct 31, 2023
@dikwickley dikwickley linked a pull request Nov 2, 2023 that will close this issue
@zmc
Copy link
Member

zmc commented Nov 16, 2023

@dikwickley FastAPI's background tasks run after the request is completed, so I'm not sure if they would work well here.

Thinking about this more, I'm not sure we even need a queue (though if we do, multiprocessing may be of use). This rough outline may be enough:

  1. Receive request for long-running operation
    1. Generate request ID (uuid)
    2. Respond with endpoint that the client will use to check status of operation
    3. To perform operation, spawn child process OR teuthology library call
    4. When operation completes, store result "somewhere"
  2. Receive request to check status
    1. Operation complete? -> Return result
    2. Incomplete? -> Send some other response

So a client like pulpito-ng can poll the appropriate endpoint. If we want or need to be fancier in the future, we can look to websockets.

As for the "somewhere" - this could be something as big as a postgres database, or as small as shelve.

@VallariAg
Copy link
Member

@zmc that sounds perfect!

We are already using multiprocessing in t-api's scheduling (though only when send_logs=True in the request). It's used so we can collect logs in an isolated file for each run, for parallel requests. Code can be found here:

teuthology_process = Process(target=_execute_with_logs, args=(func, args, log_file))

I think that it's necessary that we send logs to pulpito-ng as results - especially when 1) if the schedule fails, 2) when user has --dry-run enabled.
While teuthology.suite.main runs, logs are collected in files (with help of teuthology.setup_log_file). So instead of adding an extra storage like a database or even python shelve (and then copying logs from files to this), we can just have those log files as our stored results. It would make the process of debugging and clearing of old stored-up results easier. Also we won't need to maintain any "status" of the scheduling operation either - logs would be self explanatory.

The flow would look like:

  1. Receive request for long-running operation
    i. Generate request ID (uuid)
    ii. Respond with endpoint that the client will use to check status of operation -> example suite/status/<run_name>/
    iii. To perform operation, spawn child process OR teuthology library call -> stores result logs in <run_name>.log as the operation runs
    iv. When operation completes, store result "somewhere" - already being stored in log files
  2. Receive request to check status
    i. Operation complete? -> read <run_name>.log and send it as response
    ii. Incomplete? -> read <run_name>.log and send it as response

In case we do need to check the actual status of scheduling - we can query paddles about the run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
Development

Successfully merging a pull request may close this issue.

3 participants