Add the possibility to push tasks to swarming #3996
base: master

No description provided.
Conversation
@@ -0,0 +1,1177 @@
// Copyright 2022 The LUCI Authors. All rights reserved.
Copied over from LUCI code; I don't know what the procedure is here regarding copyright and license?
@jonathanmetzman
How will we keep this file up to date? Can we have a third-party dependency on the luci repo instead, somehow?
Probably needs to go in protos/third_party/.
Also, how will this be kept up to date?
I talked to the Swarming folks; they don't expect this to change, and even if it does, the changes will be backward compatible, so we wouldn't need to update this file (at least not often).
I trimmed the file down to only the parts necessary for a NewTaskRequest message.
I see. Can we add a monitoring metric (maybe in a followup PR) to record the rate of successful and unsuccessful task pushes? It would be good to know in real time if we start to fail to schedule tasks due to a change in the swarming API, for example.
Yeah, I can add that in a follow-up PR!
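For reference, a minimal sketch of what such a counter could look like using ClusterFuzz's monitor API; the metric name and field spec below are assumptions, not something defined in this PR:

from clusterfuzz._internal.metrics import monitor

# Hypothetical metric; the name and fields are placeholders.
SWARMING_TASK_PUSH_COUNT = monitor.CounterMetric(
    'task/swarming/push_count',
    description='Attempted task pushes to swarming, split by outcome.',
    field_spec=[
        monitor.StringField('task'),
        monitor.BooleanField('success'),
    ])

# At the push site, record each attempt:
SWARMING_TASK_PUSH_COUNT.increment(labels={'task': command, 'success': True})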
Looking pretty good! Mostly code-level comments, with one bigger question around how to handle the third-party dependency on luci protos.
def _get_job(job_name: str):
  """Returns the Job entity named by |job_name|. This function was made to make
  mocking easier."""
  return data_types.Job.query(data_types.Job.name == job_name).get()


def _get_swarming_config():
  """Returns the swarming config. This function was made to make mocking easier."""
  return local_config.SwarmingConfig()
Mocking where?
For this, I recommend injecting mocks into a class instead of mocking free functions in a module. Consider the following design:
QueryJobFunc = Callable[[str], data_types.Job]

def _real_query_job(job_name: str) -> data_types.Job:
  # Query ndb for the Job entity.
  return data_types.Job.query(data_types.Job.name == job_name).get()

class SwarmingClient:
  def __init__(self,
               config: Optional[local_config.SwarmingConfig] = None,
               query_job: Optional[QueryJobFunc] = None):
    self._config = config or local_config.SwarmingConfig()
    self._query_job = query_job or _real_query_job

  def push_task(self, command, download_url, job_type):
    # ...
Then tests can inject mocks into the swarming client, and inject the swarming client wherever it is needed.
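For instance, a test could then look like this (the job name and argument values are hypothetical):

def test_push_task_uses_injected_query():
  # Hypothetical values; only the injection pattern matters here.
  fake_job = data_types.Job(name='libfuzzer_asan_job')
  client = SwarmingClient(
      config=local_config.SwarmingConfig(),
      query_job=lambda job_name: fake_job)
  client.push_task('analyze', 'gs://bucket/worker-input', 'libfuzzer_asan_job')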
Are we going to be using the docker image in swarming?
Yes, I probably should allow specifying the image from the config, similar to batch. I will make the necessary change.
# forcibly terminated and the task results in TIMED_OUT.
execution_timeout_secs = instance_spec['execution_timeout_secs']
if command == 'fuzz':
  execution_timeout_secs = swarming_config.get('fuzzing_session_duration')
If this is set to the exact duration of the fuzzing session used by CF, won't this mean the task will always time out? I assume CF will run for just a little longer than this duration.
Maybe I should rename this to fuzz_task_duration instead, because that's what I meant by this config entry.
Renamed to fuzz_task_duration. It is not exactly clear to me how long fuzz tasks currently run on batch.
@jonathanmetzman could you please advise on this?
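One way to address the timeout concern above, sketched under the assumption that the config key is fuzz_task_duration and a fixed buffer is acceptable:

# Sketch only: give the swarming execution timeout headroom beyond the
# fuzzing session so the task does not routinely end in TIMED_OUT.
TIMEOUT_BUFFER_SECS = 10 * 60  # Assumed setup/teardown headroom.

execution_timeout_secs = instance_spec['execution_timeout_secs']
if command == 'fuzz':
  execution_timeout_secs = (
      swarming_config.get('fuzz_task_duration') + TIMEOUT_BUFFER_SECS)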
src/clusterfuzz/_internal/tests/core/bot/tasks/utasks/utasks_test.py (outdated; thread resolved)
  tasks.add_utask_main(command, download_url, job_type)
else:
  assert swarming.is_swarming_task(command, job_type)
  swarming.push_swarming_task(command, download_url, job_type)
Won't this break fuzz_task, which hasn't been migrated yet?
I thought fuzz_task was already migrated. We can disable scheduling fuzz_task on GPU machines until it's migrated, right?
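An interim guard along those lines could look like this (the excluded command set and config shape are assumptions):

# Hypothetical: keep fuzz off swarming until fuzz_task is migrated.
NON_SWARMING_COMMANDS = frozenset(['fuzz'])

def is_swarming_task(command, job_type):
  """Returns whether |command| for |job_type| should run on swarming."""
  if command in NON_SWARMING_COMMANDS:
    return False
  # Assumed config shape: jobs enabled for swarming are listed by name.
  return job_type in _get_swarming_config().get('jobs', [])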
@@ -55,6 +59,14 @@ def _timestamp_now() -> Timestamp:
  return ts


def _get_execution_mode(utask_module, job_type):
Does anything use QUEUE? Why does it exist then?
Tasks that run locally use QUEUE. Local tasks call tworker_preprocess_no_io, uworker_main_no_io, and tworker_postprocess_no_io.
For batch/swarming, they use the same path, so we need a way to determine which is used.
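Based on that explanation, a rough illustration of what _get_execution_mode might dispatch on; the enum and the is_local_execution helper below are assumptions, not code from this PR:

import enum

class ExecutionMode(enum.Enum):
  QUEUE = 'queue'        # Local tasks using the *_no_io entry points.
  BATCH = 'batch'        # Remote tasks scheduled on batch.
  SWARMING = 'swarming'  # Remote tasks pushed to swarming.

def _get_execution_mode(utask_module, job_type):
  task_name = utask_module.__name__.split('.')[-1]
  if environment.is_local_execution():  # Assumed helper.
    return ExecutionMode.QUEUE
  if swarming.is_swarming_task(task_name, job_type):
    return ExecutionMode.SWARMING
  return ExecutionMode.BATCH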
    execution_timeout_secs=execution_timeout_secs,
    env=[
        swarming_pb2.StringPair(key='UWORKER', value='True'),
        swarming_pb2.StringPair(key='SWARMING_BOT', value='True'),
What code will be starting the task on the batch side?
You mean on the swarming side, right?
This is to be figured out once we receive the machines and start setting things up. It will be similar to the existing startup scripts for Linux/Mac.
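For context, the env pairs above would sit inside the task properties of a full request. A hedged sketch based on swarming's NewTaskRequest proto; the command, name, priority, and expiration values are assumptions:

properties = swarming_pb2.TaskProperties(
    command=['./start_uworker.sh'],  # Assumed bot entry point.
    execution_timeout_secs=execution_timeout_secs,
    env=[
        swarming_pb2.StringPair(key='UWORKER', value='True'),
        swarming_pb2.StringPair(key='SWARMING_BOT', value='True'),
    ])
request = swarming_pb2.NewTaskRequest(
    name=f'{command}-{job_type}',  # Assumed naming scheme.
    priority=100,
    task_slices=[
        swarming_pb2.TaskSlice(
            expiration_secs=24 * 60 * 60, properties=properties),
    ])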
Force-pushed from e7e91a0 to 22e1108.