
async release channel #1512

Draft · wants to merge 30 commits into main
Conversation

technillogue (Contributor)

this PR tracks the async release channel for the 0.10.0 release

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue and others added 8 commits February 21, 2024 16:16
* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server

(reverts commit 828eee9)

Signed-off-by: technillogue <technillogue@gmail.com>
this is the first step towards supporting continuous batching and concurrent predictions. eventually, you will be able to configure it so your predict function is called concurrently

* bare minimum to support async predict
* add async tests
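
For context, a bare-bones async predictor under this change might look something like the following sketch (illustrative; the sleep stands in for real model work):

```python
import asyncio

from cog import BasePredictor


class Predictor(BasePredictor):
    async def predict(self, prompt: str) -> str:
        # an async predict runs on the worker's event loop instead of
        # being called synchronously in a thread
        await asyncio.sleep(0.1)  # stand-in for real async model work
        return prompt.upper()
```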

Signed-off-by: technillogue <technillogue@gmail.com>
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup
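
The conditional might hinge on something like this (a sketch, not the PR's exact check):

```python
import inspect


def predictor_is_async(predictor: object) -> bool:
    # take the event-loop path only when predict is a coroutine function
    # or an async generator function; otherwise keep the sync predict loop
    predict = getattr(predictor, "predict", None)
    return inspect.iscoroutinefunction(predict) or inspect.isasyncgenfunction(predict)
```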

(reverts commit b533c6b)

Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
* async Worker._wait and its consequences
* AsyncPipe (see the sketch after this list) so that we can process the idempotent endpoint and cancellation rather than having _wait block the event loop
* test_prediction_cancel can be flaky on some machines
* separate _process_list to be less surprising than isasyncgen
* sleep wasn't needed
* suggestions from review
* suggestions from review
* even more suggestions from review
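
The AsyncPipe named above can be approximated by pushing the blocking Connection.recv onto a thread so the event loop stays responsive; a minimal sketch, assuming a multiprocessing pipe underneath:

```python
import asyncio
from multiprocessing.connection import Connection


class AsyncPipe:
    """Sketch: await events from a multiprocessing Connection without
    blocking the event loop."""

    def __init__(self, conn: Connection) -> None:
        self.conn = conn

    async def recv(self) -> object:
        # Connection.recv blocks, so hand it to the default thread pool
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.conn.recv)
```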

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Nick Stenning <nick@whiteink.com>
Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
* race utility for racing awaitables (sketched after this list)
* start mux, tag events with id, read pipe in a task, get events from mux
* use async pipe for async child loop
* _shutting_down vs _terminating
* race with shutdown event
* keep reading events during shutdown, but call terminate after the last Done
* emit heartbeats from mux.read
* don't use _wait. instead, setup reads event from the mux too
* worker semaphore and prediction ctx
* where _wait used to raise a fatal error, have _read_events set an error on Mux, and then Mux.read can raise the error in the right context. otherwise, the exception is stuck in a task and doesn't propagate correctly
* fix event loop errors for <3.9
* keep track of predictions in flight explicitly and use that to route logs
* don't wait for executor shutdown
* progress: check for cancelation in task done_handler
* let mux check if child is alive and set mux shutdown after leaving read event loop
* close pipe when exiting
* predict requires IDLE or PROCESSING
* try adding a BUSY state distinct from PROCESSING when we no longer have capacity
* move resetting events to setup() instead of _read_events()

previously this was in _read_events because it's a coroutine that will have the correct event loop. however, _read_events actually gets created in a task, which can run *after* the first mux.read call by setup. since setup is now the first async entrypoint in worker and in tests, we can safely move it there

* state_from_predictions_in_flight instead of checking the value of semaphore
* make prediction_ctx "private"
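
The race utility above can be built on asyncio.wait; a sketch of the idea, not necessarily the PR's implementation:

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def race(*awaitables: Awaitable[T]) -> T:
    # return the first result and cancel the losers; result() re-raises
    # if the winning task failed
    tasks = [asyncio.ensure_future(a) for a in awaitables]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return next(iter(done)).result()
```

Racing mux.read against a shutdown event, as in the bullets above, then lets reads stop promptly once termination starts.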

Signed-off-by: technillogue <technillogue@gmail.com>
* add concurrency to config
* this basically works!
* more descriptive names for predict functions
* maybe pass through prediction id and try to make cancelation do both?
* don't cancel from signal handler if a loop is running. expose worker busy state to runner
* move handle_event_stream to PredictionEventHandler
* make setup and canceling work
* drop some checks around cancelation
* try out eager_predict_state_change
* keep track of multiple runner prediction tasks to make the idempotent endpoint return the same result (see the sketch after this list) and fix tests somewhat
* fix idempotent tests
* fix remaining errors?
* worker predict_generator shouldn't be eager
* wip: make the stuff that handles events and sends webhooks etc async
* drop Runner._result
* drop comments
* inline client code
* get started
* inline webhooks
* move clients into runner, switch to httpx, move create_event_handler into runner
* add some comments
* more notes
* rip out webhooks and most of files and put them in a new ClientManager that handles most of everything. inline upload_files for that
* move create_event_handler into PredictionEventHandler.__init__
* fix one test
* break out Path.validate into value_to_path and inline get_filename and File.validate
* split out URLPath into BackwardsCompatibleDataURLTempFilePath and URLThatCanBeConvertedToPath with the download part of URLFile inlined
* let's make DataURLTempFilePath also use convert and move value_to_path back to Path.validate
* use httpx for downloading input urls and follow redirects
* take get_filename back out for tests
* don't upload in http and delete cog/files.py
* drop should_cancel
* prediction->request
* split up predict/inner/prediction_ctx into enter_predict/exit_predict/prediction_ctx/inner_async_predict/predict/good_predict as one way to do it. however, exposing all of those for runner predict enter/coro exit still sucks, but this is still an improvement
* bigish change: inline predict_and_handle_errors
* inline make_error_handler into setup
* move runner.setup into runner.Runner.setup
* add concurrency to config in go
* try explicitly using prediction_ctx __enter__ and __exit__
* make runner setup more correct and marginally better
* fix a few tests
* notes
* wip ClientManager.convert
* relax setup argument requirement to str
* glom worker into runner
* add logging message
* fix prediction retry and improve logging
* split out handle_event
* use CURL_CA_BUNDLE for file upload
* clean up comments
* dubious upload fix
* small fixes
* attempt to add context logging?
* tweak names
* fix error for predictionOutputType(multi=False)
* improve comments
* fix lints
* skip worker and webhook tests since those were erroring on removed imports. fix or xfail runner tests
* upload in http instead of PredictionEventHandler. this makes tests pass and fixes some problems with validation, but also prevents streaming files and causes new problems. also xfail all the responses tests that need to be replaced with respx
* format
* fix some new-style type signatures and drop 3.8 support
* drop 3.7 in code
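
The idempotent-endpoint task tracking mentioned above can be as small as a dict keyed by prediction id; an illustrative sketch:

```python
import asyncio
from typing import Any, Coroutine

prediction_tasks: dict[str, asyncio.Task] = {}


def start_or_join(prediction_id: str, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
    # a retried request with the same id joins the in-flight task instead
    # of starting a second prediction (a real implementation would avoid
    # creating the coroutine at all in the joining case)
    task = prediction_tasks.get(prediction_id)
    if task is None:
        task = asyncio.create_task(coro)
        prediction_tasks[prediction_id] = task
    return task
```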

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue and others added 3 commits February 21, 2024 18:51
This reverts commit 335f67b.

Signed-off-by: technillogue <technillogue@gmail.com>
* input downloads, output uploads, and webhooks are now handled by ClientManager (sketched below), which persists for the lifetime of the runner, allowing us to reuse connections; this may significantly help with large uploads.
* although I was originally going to drop output_file_prefix, it's not actually hard to maintain. the behavior is changed now: objects are uploaded as soon as they're output rather than after the prediction is completed.
* there's an ugly hack with uploading an empty body to get the redirect instead of making the api time out from trying to upload a 140GB file. that can be fixed by implementing an MPU endpoint and/or a "fetch upload url" endpoint.
* the behavior of the non-idempotent endpoint is changed; the id is now randomly generated if it's not provided in the body. this isn't strictly required for this change alone, but is hard to carve out.
* the behavior of Path is changed significantly. see https://www.notion.so/replicate/Cog-Setup-Path-Problem-2fc41d40bcaf47579ccd8b2f4c71ee24
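
The connection reuse comes down to holding a single httpx.AsyncClient for the runner's lifetime; roughly (an illustrative shape, not the exact class):

```python
import httpx


class ClientManager:
    """Sketch: one shared AsyncClient for input downloads, output
    uploads, and webhooks, so connection pools stay warm."""

    def __init__(self) -> None:
        self.client = httpx.AsyncClient(follow_redirects=True)

    async def download(self, url: str) -> bytes:
        resp = await self.client.get(url)
        resp.raise_for_status()
        return resp.content

    async def aclose(self) -> None:
        await self.client.aclose()
```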

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Mattt <mattt@replicate.com>
Signed-off-by: technillogue <technillogue@gmail.com>
@technillogue (Contributor, Author)

steps to carve up:

  • drop race and add connection (easy?)
  • runner refactor. I still want to reconsider how radical a change I need to make here; I don't know how to proceed
  • allow concurrency

* wip

* some tweaks

* ignore some type errors

* test connection roundtrip

* add notes from python source code

Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
* optimize webhook serialization and logging
* optimize logging by binding structlog proxies
* fix tests
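
"Binding structlog proxies" presumably means creating a bound logger once per prediction so later calls don't rebuild the same context; e.g.:

```python
import structlog

logger = structlog.get_logger("cog")

# bind request-scoped fields once; every call on `log` then carries
# prediction_id without re-binding (the id here is illustrative)
log = logger.bind(prediction_id="abc123")
log.info("starting prediction")
```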

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
technillogue and others added 14 commits May 16, 2024 17:08
* validate prediction response to raise errors, but return the unvalidated output to avoid converting urls to File/Path
* expose concurrency in healthcheck
* mediocre logging that works like print
* COG_DISABLE_CANCEL to ignore cancelations
* COG_CONCURRENCY_OVERRIDE
* add ready probe as an http route
* encode webhooks only after knowing they will be sent, and bail out of upload type checks early for strs
* don't validate outputs
* add AsyncConcatenateIterator (see the sketch below)
* should_exit is not actually used by http
* format
* codecov
* describe the remaining problems with this PR and add comments about cancelation and validation
* add a test
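
The AsyncConcatenateIterator above presumably mirrors the synchronous ConcatenateIterator for async generators; a usage sketch under that assumption:

```python
from cog import AsyncConcatenateIterator, BasePredictor


class Predictor(BasePredictor):
    async def predict(self, prompt: str) -> AsyncConcatenateIterator[str]:
        # yielding streams tokens as they're produced; the annotation
        # tells cog to concatenate them for non-streaming clients
        for token in prompt.split():
            yield token + " "
```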

---------
Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Mattt <mattt@replicate.com>
Signed-off-by: technillogue <technillogue@gmail.com>
predict_time_share tracks the portion of the worker's processing time that was dedicated to each individual prediction. the "cost" of each second is split across the predictions running during that second.
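
A toy illustration of that accounting (not the PR's implementation): a second during which two predictions run contributes half a second to each.

```python
def share_time(ticks: "list[set[str]]") -> "dict[str, float]":
    # ticks[i] holds the ids of predictions running during second i;
    # each second's cost is split evenly among them
    shares: dict[str, float] = {}
    for running in ticks:
        for pid in running:
            shares[pid] = shares.get(pid, 0.0) + 1.0 / len(running)
    return shares


# A runs seconds 0-1, B runs seconds 1-2, overlapping in second 1
print(share_time([{"A"}, {"A", "B"}, {"B"}]))  # {'A': 1.5, 'B': 1.5}
```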

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>
* function to emit metrics
* add metrics docs
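
Presumably used from inside predict, something like this (the function name is taken from the commit title; the real name and signature may differ):

```python
from cog import BasePredictor, emit_metric  # name assumed from the commit title


class Predictor(BasePredictor):
    def predict(self, prompt: str) -> str:
        output = prompt.upper()
        # record a custom per-prediction metric alongside the built-in ones
        emit_metric("output_chars", len(output))
        return output
```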

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
…ok (#1683)

Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: technillogue <technillogue@gmail.com>
Signed-off-by: Mattt Zmuda <mattt@replicate.com>
* Update code actions on save

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Update setting key to ruff.lint.args

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Remove python.linting settings

All settings starting with "python.linting." are deprecated and can be removed from settings.

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Use vscode.json-language-features to format JSON

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Remove prettier and black from recommended extensions

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

---------

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
* Define Secret type (#1546)

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
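
A minimal usage sketch of the Secret type, which keeps credentials masked in logs and webhooks (get_secret_value follows pydantic's convention; assumed here rather than taken from this diff):

```python
from cog import BasePredictor, Secret


class Predictor(BasePredictor):
    def predict(self, api_token: Secret) -> str:
        # repr(api_token) stays masked; unwrap explicitly to use it
        token = api_token.get_secret_value()
        return "token set" if token else "no token"
```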

* Fix linter errors (#1691)

* ruff --fix

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Update ruff pyproject settings

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Update ruff lint command in Makefile

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

---------

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

* Run ruff --fix python

Signed-off-by: Mattt Zmuda <mattt@replicate.com>

---------

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
Signed-off-by: Mattt Zmuda <mattt@replicate.com>
…nal (#1714)

Signed-off-by: technillogue <technillogue@gmail.com>