
Slurmkit scripts slowing down the I/O #111

Open
edyoshikun opened this issue Dec 5, 2023 · 0 comments

Problem

Running scripts such as the stabilization and cropping ones was causing an I/O slowdown for other users @talonchandler @ieivanov.

After some debugging, I believe I've nailed down the issue. It can be reproduced in two ways with datasets that have many positions and timepoints and/or channels.

  1. Most SLURM scripts parallelize over T and C and submit an individual job for each position. Keep in mind that, because we parallelize over these two dimensions, each multiprocessing pool spawns n subprocesses. So if all positions are allocated, each job spawns its own set of child processes, generating many I/O calls.
  2. Create a SLURM script with more CPUs, more memory, and more simultaneous subprocesses. This runs into the same issue: multiple jobs run in parallel, and the resulting I/O calls reduce our overall throughput.
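To make the first point concrete, the number of processes hitting the filesystem grows multiplicatively with positions and pool workers. A rough back-of-the-envelope check (all names and values below are illustrative assumptions, not from the slurmkit codebase):

```python
# Rough estimate of concurrent processes hitting shared storage.
# The dataset sizes and pool cap here are hypothetical examples.
n_positions = 96       # one SLURM job submitted per position
n_timepoints = 50      # each job parallelizes over T...
n_channels = 4         # ...and C via a multiprocessing pool
pool_workers = min(n_timepoints * n_channels, 16)  # workers per job, capped by CPUs

concurrent_processes = n_positions * pool_workers
print(concurrent_processes)  # 96 jobs x 16 workers = 1536 simultaneous I/O streams
```

Even with a modest per-job pool, scheduling all positions at once multiplies into thousands of simultaneous readers/writers, which matches the slowdown other users observed.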

Proposed solutions:

  • Ensure the total number of jobs per dataset stays below a TBD threshold. The product of the number of simultaneous T and C processes and the number of positions should stay under this threshold; I still need to find the sweet spot.
  • Modify the SLURM scripts so that positions run in batches. This can be done by creating dependencies between the jobs as follows:
```python
batch_size = 3
register_jobs = []

for i in range(0, len(input_position_dirpaths), batch_size):
    chunk_input_paths = input_position_dirpaths[i : i + batch_size]
    if i == 0:
        # First batch: no dependencies, these jobs start immediately.
        register_jobs = [
            submit_function(
                register_func,
                slurm_params=params,
                input_data_path=in_path,
                output_path=out_path,
            )
            for in_path in chunk_input_paths
        ]
    else:
        # Later batches wait on the previous batch's jobs, so at most
        # `batch_size` position jobs run against storage at a time.
        register_jobs = [
            submit_function(
                register_func,
                slurm_params=params,
                input_data_path=in_path,
                output_path=out_path,
                dependencies=register_jobs,
            )
            for in_path in chunk_input_paths
        ]
```
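The first proposed solution (a per-dataset job cap) could be combined with the batching above by deriving `batch_size` from the threshold instead of hard-coding it. A minimal sketch, where `MAX_CONCURRENT_PROCESSES` and `choose_batch_size` are hypothetical names, not part of slurmkit, and the threshold value is a placeholder since the sweet spot is still TBD:

```python
# Sketch: pick a batch size so that (jobs in flight) x (subprocesses per job)
# stays under a cluster-wide threshold. Names and values are assumptions.
MAX_CONCURRENT_PROCESSES = 512  # placeholder; the actual sweet spot is TBD


def choose_batch_size(n_positions: int, procs_per_job: int) -> int:
    """Largest batch of position jobs whose combined subprocess count
    stays under the threshold (at least 1, so work always proceeds)."""
    return max(1, min(n_positions, MAX_CONCURRENT_PROCESSES // procs_per_job))


# e.g. a T*C pool of 16 workers per job -> at most 32 jobs in flight
batch_size = choose_batch_size(n_positions=96, procs_per_job=16)
print(batch_size)  # 32
```

This keeps the dependency-chaining logic unchanged while making the throughput cap explicit and tunable once a good threshold is measured.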