Make experiment evaluation a separate process #221

prabhuteja12 · 2023-05-08T11:41:03Z

The current experimentation code in benchmarking runs evaluation in the same process as the subsequent training jobs. This is a problem when using DDP: the first evaluation (Line 295) creates several processes (one per GPU), and each of them tries to spawn the subsequent training processes, which causes the DDP ports to clash.

Describe the solution you'd like
The evaluation/testing should run in a separate process, à la run_training_job, so that this issue does not occur.
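
A minimal sketch of the idea, assuming evaluation can be started as its own Python entry point that prints its metrics as one JSON object on the last line of stdout. The function name `run_evaluation_in_subprocess`, the JSON-on-stdout convention and the default timeout are illustrative assumptions, not the benchmark's actual API, and this is not a description of how `run_training_job` is implemented. Because the evaluation runs in a fresh interpreter, any DDP ranks it launches re-run the child's command rather than the whole benchmarking script, and their ports are released when the child exits.

```python
import json
import subprocess
import sys
from typing import Dict, List


def run_evaluation_in_subprocess(eval_args: List[str], timeout: float = 3600.0) -> Dict[str, float]:
    """Run evaluation in a separate Python process and return its metrics.

    Hypothetical helper: eval_args are passed to a new interpreter, so every
    DDP worker and port the evaluation creates belongs to that child process
    and disappears when it exits, leaving the parent clean for the next
    training job.
    """
    result = subprocess.run(
        [sys.executable, *eval_args],
        capture_output=True,  # collect the child's stdout/stderr
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if evaluation exits non-zero
    )
    # Assumed convention: the last stdout line is a JSON object of metrics.
    return json.loads(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Stand-in child that just prints fixed metrics instead of running trainer.test().
    metrics = run_evaluation_in_subprocess(
        ["-c", "import json; print(json.dumps({'accuracy': 0.5}))"]
    )
    print(metrics)
```

In the benchmark, `eval_args` would instead point at a (hypothetical) standalone evaluation module, e.g. `["-m", "my_benchmark.evaluate", "--checkpoint", path]`. Using `subprocess` rather than an in-process call trades a little start-up time for a guarantee that no DDP state leaks into the parent between experiments.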

**Additional info:**
See the discussion on Lightning forum: Lightning-AI/pytorch-lightning#2537

prabhuteja12 added the triage label May 8, 2023
