
Add the possibility to push tasks to swarming #3996

Open
wants to merge 5 commits into
base: master

Conversation

alhijazi (Collaborator)

No description provided.

@@ -0,0 +1,1177 @@
// Copyright 2022 The LUCI Authors. All rights reserved.
alhijazi (Collaborator Author)

Copied over from the LUCI code; I don't know what the procedure is here regarding copyright and license?
@jonathanmetzman

Collaborator

How will we keep this file up to date? Can we have a third-party dependency on the luci repo instead, somehow?

Collaborator

Probably needs to go in protos/third_party/.
Also, how will this be kept up to date?

alhijazi (Collaborator Author)

I talked to the swarming folks; they don't expect this to change, and even if it does, the changes will be backward compatible, meaning we wouldn't need to update this file (at least not often).
I trimmed the file down to only the parts necessary for a NewTaskRequest message.

Collaborator

I see. Can we add a monitoring metric (maybe in a follow-up PR) to record the rate of successful and unsuccessful task pushes? It would be good to know in real time if we start failing to schedule tasks due to a change in the swarming API, for example.

alhijazi (Collaborator Author)

Yeah, I can add that in a follow-up PR!
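
For reference, a rough sketch of what such a metric could look like, assuming ClusterFuzz's monitor module and its usual CounterMetric pattern; the metric name and field are illustrative and not part of this PR:

from clusterfuzz._internal.metrics import monitor

# Illustrative counter in the style of monitoring_metrics.py; a boolean
# 'success' field lets successful and failed pushes be charted separately.
SWARMING_TASK_PUSH_COUNT = monitor.CounterMetric(
    'swarming/task_push_count',
    description='Number of attempted swarming task pushes.',
    field_spec=[monitor.BooleanField('success')])

# At the push site, something like:
#   SWARMING_TASK_PUSH_COUNT.increment({'success': push_succeeded})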

letitz (Collaborator) left a comment

Looking pretty good! Mostly code-level comments, with one bigger question around how to handle the third-party dependency on luci protos.

src/clusterfuzz/_internal/swarming_utils/swarming.py (outdated; resolved)
src/clusterfuzz/_internal/bot/tasks/task_types.py (outdated; resolved)
src/clusterfuzz/_internal/bot/tasks/utasks/__init__.py (outdated; resolved)
src/clusterfuzz/_internal/swarming_utils/swarming.py (outdated; resolved)
Comment on lines 43 to 51
def _get_job(job_name: str):
  """Returns the Job entity named by |job_name|. This function was made to make
  mocking easier."""
  return data_types.Job.query(data_types.Job.name == job_name).get()


def _get_swarming_config():
  """Returns the swarming config. This function was made to make mocking easier."""
  return local_config.SwarmingConfig()
Collaborator

Mocking where?

For this, I recommend injecting mocks into a class instead of mocking free functions in a module. Consider the following design:

from typing import Callable, Optional

QueryJobFunc = Callable[[str], data_types.Job]

def _real_query_job(job_name: str) -> data_types.Job:
  # query ndb
  ...

class SwarmingClient:
  def __init__(
      self,
      config: Optional[local_config.SwarmingConfig] = None,
      query_job: Optional[QueryJobFunc] = None):
    self._config = config or local_config.SwarmingConfig()
    self._query_job = query_job or _real_query_job

  def push_task(self, command, download_url, job_type):
    # ...
    ...
Then tests can inject mocks into the swarming client, and inject the swarming client wherever it is needed.
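
A hedged usage sketch of that design; FakeSwarmingConfig and the lambda are test doubles invented for illustration, not code from this PR:

def test_push_task_uses_injected_dependencies(self):
  # Build the client with fakes so the test never touches ndb or real config.
  fake_job = data_types.Job(name='libfuzzer_asan_test', platform='LINUX')
  client = SwarmingClient(
      config=FakeSwarmingConfig(),          # hypothetical test double
      query_job=lambda job_name: fake_job)  # avoids an ndb query
  client.push_task('analyze', 'gs://bucket/input-url', 'libfuzzer_asan_test')
  # ...assert on the NewTaskRequest the client constructed...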

src/clusterfuzz/_internal/swarming_utils/swarming.py (outdated; resolved)
src/clusterfuzz/_internal/swarming_utils/swarming.py (outdated; resolved)
src/clusterfuzz/_internal/swarming_utils/swarming.py (outdated; resolved)
@jonathanmetzman (Collaborator)

Are we going to be using the docker image in swarming?

@alhijazi (Collaborator Author)

Are we going to be using the docker image in swarming?

Yes, I should probably allow specifying the image from the config, similar to batch. I will make the necessary change.
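
A minimal sketch of what that might look like, assuming a per-platform instance spec in the swarming config with a docker_image key; the key name mirrors batch and the env-pair plumbing is an assumption, not this PR's design:

# Hypothetical: instance_spec comes from the swarming config for this
# platform, analogous to how the batch specs carry a docker_image.
docker_image = instance_spec.get('docker_image')
if docker_image:
  # How the image reaches the bot is still open; one option is an env pair
  # on the task request, like the UWORKER/SWARMING_BOT pairs used elsewhere.
  env.append(swarming_pb2.StringPair(key='DOCKER_IMAGE', value=docker_image))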

@alhijazi alhijazi requested a review from letitz May 21, 2024 21:09
@alhijazi alhijazi marked this pull request as ready for review May 23, 2024 09:38
src/clusterfuzz/_internal/base/utils.py (outdated; resolved)
# forcibly terminated and the task results in TIMED_OUT.
execution_timeout_secs = instance_spec['execution_timeout_secs']
if command == 'fuzz':
  execution_timeout_secs = swarming_config.get('fuzzing_session_duration')
Collaborator

If this is set to the exact duration of the fuzzing session used by CF, won't this mean the task will always time out? I assume CF will run for just a little longer than this duration.

alhijazi (Collaborator Author) on May 27, 2024

Maybe I should rename this to fuzz_task_duration instead, because that's what I meant by this config entry.

alhijazi (Collaborator Author)

Renamed to fuzz_task_duration. It is not exactly clear to me how long fuzz tasks run on batch currently.
@jonathanmetzman could you please advise on this?
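
One possible way to address the timeout concern raised above, sketched with an assumed buffer constant (neither the constant nor its value comes from this PR):

# Hypothetical: give the swarming execution timeout headroom beyond the
# configured fuzz task duration so normal wrap-up does not trip TIMED_OUT.
FUZZ_TASK_TIMEOUT_BUFFER_SECS = 10 * 60

execution_timeout_secs = instance_spec['execution_timeout_secs']
if command == 'fuzz':
  execution_timeout_secs = (
      swarming_config.get('fuzz_task_duration') + FUZZ_TASK_TIMEOUT_BUFFER_SECS)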

src/clusterfuzz/_internal/swarming/__init__.py (outdated; resolved)
src/clusterfuzz/_internal/swarming/__init__.py (outdated; resolved)
src/clusterfuzz/_internal/cron/project_setup.py (outdated; resolved)
@alhijazi alhijazi requested a review from letitz May 27, 2024 21:00
src/clusterfuzz/_internal/protos/swarming_pb2_grpc.py (outdated; resolved)
src/clusterfuzz/_internal/base/utils.py (outdated; resolved)
src/clusterfuzz/_internal/bot/tasks/utasks/__init__.py (outdated; resolved)
  tasks.add_utask_main(command, download_url, job_type)
else:
  assert swarming.is_swarming_task(command, job_type)
  swarming.push_swarming_task(command, download_url, job_type)
Collaborator

Won't this break fuzz_task, which hasn't been migrated yet?

alhijazi (Collaborator Author)

I thought fuzz_task was already migrated. We can disable scheduling fuzz_task on GPU machines until it's migrated, right?

@@ -55,6 +59,14 @@ def _timestamp_now() -> Timestamp:
  return ts


def _get_execution_mode(utask_module, job_type):
Collaborator

Does anything use QUEUE? Why does it exist then?

alhijazi (Collaborator Author)

Tasks that run locally use QUEUE. Local tasks call tworker_preprocess_no_io, uworker_main_no_io, and tworker_postprocess_no_io.
Batch and swarming tasks use the same path, so we need a way to determine which one is being used.
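
A rough sketch of the dispatch being described, with a local enum and helper names assumed for illustration (this is not the implementation in the PR):

import enum


class ExecutionMode(enum.Enum):
  QUEUE = 'queue'        # Local: the *_no_io preprocess/main/postprocess path.
  BATCH = 'batch'        # Remote uworker_main on GCP Batch.
  SWARMING = 'swarming'  # Remote uworker_main on swarming.


def _get_execution_mode(utask_module, job_type):
  # is_remote_execution_enabled() and the use of the module name as the
  # command are assumptions for this sketch, not the PR's actual checks.
  if not is_remote_execution_enabled():
    return ExecutionMode.QUEUE
  if swarming.is_swarming_task(utask_module.__name__, job_type):
    return ExecutionMode.SWARMING
  return ExecutionMode.BATCH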

execution_timeout_secs=execution_timeout_secs,
env=[
    swarming_pb2.StringPair(key='UWORKER', value='True'),
    swarming_pb2.StringPair(key='SWARMING_BOT', value='True'),
Collaborator

What code will be starting the task on the batch side?

alhijazi (Collaborator Author)

You mean on the swarming side, right?
This is to be figured out once we receive the machines and start setting things up. It will be similar to the existing startup scripts for Linux/Mac.
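
For reference, a tiny sketch of how a swarming-side entry point could pick up the env pairs set on the task request above; this is illustrative, not the planned startup script:

import os


def _is_swarming_uworker() -> bool:
  # Assumes swarming exports the task's env pairs to the command it runs, so
  # the UWORKER / SWARMING_BOT values set on the NewTaskRequest are visible.
  return (os.getenv('SWARMING_BOT') == 'True' and
          os.getenv('UWORKER') == 'True')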
