
[analyze][rearchitecture] Reduce delays on untrusted parts of analyze #3700

Open · wants to merge 3 commits into master

Conversation

@jonathanmetzman (Collaborator) commented Jan 30, 2024

Instead of preprocessing a task and then putting it on a queue for later
scheduling, schedule it on batch right away. This eliminates up to one minute
spent pulling tasks from the queue and can improve execution speed on batch
by:

  1. Allowing us to set a higher priority for the job.
  2. Causing batch to schedule the job earlier, because the job requires
    fewer CPUs than jobs with many tasks. Those CPUs may not be available
    during periods of high usage.
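
A minimal sketch of the low-latency path (illustrative only; the helper name
below is hypothetical and the real code in this PR may differ):

class LowLatencyUTask(UTask):
  """UTask variant that skips the utask_main queue (hypothetical sketch)."""

  def request_remote_execution(self, command, download_url, job_type):
    # Instead of tasks.add_utask_main(command, download_url, job_type), which
    # leaves the task waiting up to a minute to be pulled off the queue,
    # create the batch job immediately. create_utask_main_batch_job is a
    # hypothetical stand-in for whatever batch-scheduling call the PR uses.
    create_utask_main_batch_job(command, job_type, download_url)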

@letitz (Collaborator) left a comment


Classic question: how hard is it to test this?

LGTM otherwise, with some nits below.

"""Represents an untrusted task. Executes preprocess on this machine, then
queues up the task for scheduling on batch by putting it in the utask_main
queue. Then executes utask_main on an untrusted machine, and postprocess on
another trusted machine if opted-in. Otherwise executes locally."""

What executes locally vs. remotely has gotten a bit muddled by the comment change. It would be good to clarify that either:

  • utask_main is scheduled on cloud batch, then postprocess executes on another trusted bot
  • or everything executes locally immediately
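
For example, the docstring could spell this out (wording illustrative only):

  """Represents an untrusted task. Executes preprocess on this machine. If
  remote execution is opted in, utask_main is then scheduled on Cloud Batch
  and postprocess executes on another trusted bot; otherwise everything
  executes locally, immediately."""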

logs.log('Requesting remote execution for utask.')
self.request_remote_execution(command, download_url, job_type)

def request_remote_execution(self, command, download_url, job_type):

This function seems redundant?

Edit: ah no, I see now.

tasks.add_utask_main(command, download_url, job_type)

def preprocess(self, task_argument, job_type, uworker_env):
logs.log('Preprocessing utask.')

Likely useful to include task_argument in the log, at least.
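
For instance (sketch only; the exact message format is up to the author):

  def preprocess(self, task_argument, job_type, uworker_env):
    logs.log(f'Preprocessing utask (task_argument={task_argument}, '
             f'job_type={job_type}).')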

@@ -136,6 +140,20 @@ def preprocess(self, task_argument, job_type, uworker_env):
return download_url


class LowLatencyUTask(UTask):

Rather than inheritance, I suggest adding a boolean member to UTask that controls low-latency behavior, and behaving differently based on its value in UTask.request_remote_execution().

Edit: I see, COMMAND_TYPES requires a mapping from task name to executor class, which kind of forces us to use inheritance over composition. Then consider:

class AbstractUTask(abc.ABC):
  ... stuff ...

  @abc.abstractmethod
  def request_remote_execution(...):
    ...

class UTask(AbstractUTask):
  def request_remote_execution(...):
    ... schedule on the queue ...

class LowLatencyUTask(AbstractUTask):
  def request_remote_execution(...):
    ... schedule on batch ...

At least, it makes the extension point obvious.

@@ -57,6 +57,8 @@
'machine_type',
])

HIGH_PRIORITY = 50

This could use a comment explaining the choice of the value.
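
For instance, something along these lines (illustrative only; the real
rationale is the author's to state). Cloud Batch job priorities are integers
from 0 to 99, and higher-priority jobs are scheduled first:

# Cloud Batch priorities range from 0 (lowest) to 99 (highest), with
# higher-priority jobs scheduled first. 50 puts low-latency utasks ahead of
# default-priority jobs while leaving headroom for anything more urgent.
HIGH_PRIORITY = 50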

src/clusterfuzz/_internal/google_cloud_utils/batch.py (outdated comment, resolved)
Co-authored-by: Titouan Rigoudy <titouan@chromium.org>