
parallel running: added Dask support #72

Open
wants to merge 1 commit into develop

Conversation

ZvikaZ
Contributor

@ZvikaZ ZvikaZ commented Jun 25, 2023

This solves #63

A few notes:

  1. It doesn't speed up the two EcKity examples I checked, because their fitness calculation is so fast that Dask's overhead outweighs the gain from running in parallel.
  2. However, I did see a significant speed gain with my own code. To reproduce this, you may want to add a time.sleep(1) to make the example's fitness calculation slower.
  3. You might consider adding Dask as a dependency. This is arguable, though, since not every EcKitY user necessarily wants Dask support. If desired, we can wrap the Dask import in a try block and behave accordingly.
  4. This PR can be accepted as-is (apart from the dependency question), and afterwards we might research further uses of Dask to improve the speed even more.
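The try-block route from note 3 could look something like the sketch below. The `DASK_AVAILABLE` flag and `get_executor` helper are hypothetical names for illustration, not existing EcKitY API:

```python
# Sketch of an optional Dask dependency (hypothetical helper, not EcKitY API):
# import Dask if it is installed, otherwise fall back to the standard library.
from concurrent.futures import ProcessPoolExecutor

try:
    from dask.distributed import Client
    DASK_AVAILABLE = True
except ImportError:
    DASK_AVAILABLE = False

def get_executor(executor='process', max_workers=None):
    """Return a Dask client when requested and installed, else a ProcessPoolExecutor."""
    if executor == 'dask':
        if not DASK_AVAILABLE:
            raise ImportError("executor='dask' requested, but Dask is not installed")
        return Client()  # connects to the active cluster, or starts a local one
    return ProcessPoolExecutor(max_workers=max_workers)
```

With this in place, users without Dask installed get a clear error only if they actually ask for the Dask backend.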

Usage:

  • Legacy behaviour is kept - old code behaves exactly the same, without Dask.
  • If the user wishes to use Dask, they need to pass executor='dask' to SimpleEvolution.
  • Dask has two flavours:
    -- Local - run in parallel on the local machine's CPUs. This mode is used automatically if nothing further is done; to configure the exact behaviour, see the example below.
    -- Cluster - run in parallel using Slurm (or a similar facility such as LSF); the Slurm cluster needs to be configured with a few lines of code.
  • max_workers is ignored when Dask is used.

Example:

    from enum import Enum

    from dask.distributed import Client, LocalCluster
    from dask_jobqueue import SLURMCluster

    DaskUsage = Enum('DaskUsage', 'Cluster Local NoDask')
    dask_usage = DaskUsage.Local    # in practice, probably controlled via command-line arguments
    if dask_usage == DaskUsage.Cluster:
        cluster = SLURMCluster(cores=4,
                               memory='16GiB',
                               # by default the job is sent with a 30-minute time limit - disable it
                               job_directives_skip=['-t 00:30:00'],
                               # the default port is usually taken, so I choose a fixed number
                               # instead of letting Dask pick one for me
                               scheduler_options={'dashboard_address': ':44344'},
                               )
        cluster.scale(jobs=32)    # automatically start Slurm jobs ("workers")
        # to see the generated SBATCH script, run:
        print(cluster.job_script())
    elif dask_usage == DaskUsage.Local:
        # run 4 parallel workers on the local machine;
        # if omitted, Dask will just use the local machine with default parameters
        cluster = LocalCluster(n_workers=4, dashboard_address=44344)

    if dask_usage in [DaskUsage.Cluster, DaskUsage.Local]:
        # this tells Dask (and therefore EcKity) to use our cluster instead of the local computer;
        # there is no need to pass 'client' to EcKity
        client = Client(cluster)
        print(client)
        print(client.dashboard_link)

...

    algo = SimpleEvolution(
        ...
        executor='dask' if dask_usage in [DaskUsage.Cluster, DaskUsage.Local] else 'process',
        max_workers=1,    # ignored in Dask mode
        ...
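What makes a single `executor` argument sufficient is that `concurrent.futures` executors and `dask.distributed.Client` expose a compatible `submit()`/`result()` interface. A minimal sketch of that idea; `evaluate` and `evaluate_population` are hypothetical stand-ins, not EcKity code:

```python
# Dispatch fitness evaluations through any submit()-style executor.
from concurrent.futures import ProcessPoolExecutor

def evaluate(individual):
    # stand-in for a (possibly slow) fitness calculation
    return sum(individual)

def evaluate_population(executor, population):
    # works unchanged whether 'executor' is a ProcessPoolExecutor,
    # a ThreadPoolExecutor, or a dask.distributed.Client
    futures = [executor.submit(evaluate, ind) for ind in population]
    return [f.result() for f in futures]
```

With Dask configured as in the example above, the same call would simply receive `Client(cluster)` in place of the pool.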

@ZvikaZ ZvikaZ changed the base branch from main to develop June 27, 2023 10:03
@itaitzruia4
Collaborator

itaitzruia4 commented May 19, 2024

I think your comparison was against ThreadPoolExecutor (or simply serial fitness evaluations). Dask should be compared with ProcessPoolExecutor, or with joblib (which is already included as a dependency, since sklearn uses it). I also don't like the added-dependency part.
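A minimal timing harness along the lines the reviewer suggests, comparing serial evaluation against ProcessPoolExecutor on the same artificially slowed fitness function; `slow_fitness` is a hypothetical stand-in, not EcKity code:

```python
# Compare serial vs. process-based evaluation of a slow fitness function.
import time
from concurrent.futures import ProcessPoolExecutor

def slow_fitness(x):
    time.sleep(0.05)  # simulate an expensive fitness calculation
    return x * x

def run_serial(xs):
    return [slow_fitness(x) for x in xs]

def run_processes(xs, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(slow_fitness, xs))

# the joblib equivalent would be:
#   from joblib import Parallel, delayed
#   Parallel(n_jobs=4)(delayed(slow_fitness)(x) for x in xs)
```

Timing `run_serial` against `run_processes` (e.g. with `time.perf_counter()`) on the same input gives the process-pool baseline the reviewer is asking for; swapping in a Dask client gives the Dask number.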
