Make /suite requests async #1

kamoltat · 2023-05-31T20:06:13Z

Refer Zack's comment here: VallariAg#3 (review)

Current behavior: API waits for scheduling process to be completed and then sends a response, which can be too long of a response time and it might timeout.

Desired behavior: API immediately sends a response. Users can check their scheduling status in some queue.

zmc · 2023-11-16T23:43:50Z

@dikwickley FastAPI's background tasks run after the request is completed, so I'm not sure if they would work well here.

Thinking about this more, I'm not sure we even need a queue (though if we do, multiprocessing may be of use). This rough outline may be enough:

Receive request for long-running operation
1. Generate request ID (uuid)
2. Respond with endpoint that the client will use to check status of operation
3. To perform operation, spawn child process OR teuthology library call
4. When operation completes, store result "somewhere"
Receive request to check status
1. Operation complete? -> Return result
2. Incomplete? -> Send some other response

So a client like pulpito-ng can poll the appropriate endpoint. If we want or need to be fancier in the future, we can look to websockets.

As for the "somewhere" - this could be something as big as a postgres database, or as small as shelve.

VallariAg · 2023-11-20T15:24:31Z

@zmc that sounds perfect!

We are already using multiprocessing in t-api's scheduling (though only when send_logs=True in the request). It's used so we can collect logs in an isolated file for each run, for parallel requests. Code can be found here:

teuthology-api/src/teuthology_api/services/helpers.py

Line 27 in 573fa93

    
           teuthology_process = Process(target=_execute_with_logs, args=(func, args, log_file))

I think that it's necessary that we send logs to pulpito-ng as results - especially when 1) if the schedule fails, 2) when user has --dry-run enabled.
While teuthology.suite.main runs, logs are collected in files (with help of teuthology.setup_log_file). So instead of adding an extra storage like a database or even python shelve (and then copying logs from files to this), we can just have those log files as our stored results. It would make the process of debugging and clearing of old stored-up results easier. Also we won't need to maintain any "status" of the scheduling operation either - logs would be self explanatory.

The flow would look like:

Receive request for long-running operation
i. Generate request ID (uuid)
ii. Respond with endpoint that the client will use to check status of operation -> example suite/status/<run_name>/
iii. To perform operation, spawn child process OR teuthology library call -> stores result logs in <run_name>.log as the operation runs
iv. When operation completes, store result "somewhere" - already being stored in log files

Receive request to check status
i. Operation complete? -> read <run_name>.log and send it as response
ii. Incomplete? -> read <run_name>.log and send it as response

In case we do need to check the actual status of scheduling - we can query paddles about the run.

kamoltat added the feature New feature or request label May 31, 2023

VallariAg self-assigned this Aug 27, 2023

VallariAg removed their assignment Oct 31, 2023

dikwickley linked a pull request Nov 2, 2023 that will close this issue

Added flag to run task in background #29

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make /suite requests async #1

Make /suite requests async #1

kamoltat commented May 31, 2023

zmc commented Nov 16, 2023

VallariAg commented Nov 20, 2023

Make /suite requests async #1

Make /suite requests async #1

Comments

kamoltat commented May 31, 2023

zmc commented Nov 16, 2023

VallariAg commented Nov 20, 2023