Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib: Add multiprocessing simulation class #942

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

Harshil2107
Copy link
Contributor

@Harshil2107 Harshil2107 commented Mar 18, 2024

This PR is my first stab at adding a mutli processing module to stdlib.

  • There are some extra files that are used for testing but will be removed when I undraft the PR

Change-Id: I98b03c52eb769731a04d3d2aab8a79eb8ac91d15

Change-Id: I98b03c52eb769731a04d3d2aab8a79eb8ac91d15
@ivanaamit ivanaamit added the stdlib The gem5 standard library. Code typically found under "src/pythongem5" label Mar 18, 2024
Change-Id: Ia05bd6422b52f9d319448ccefe1639613c33120f
@Harshil2107 Harshil2107 marked this pull request as ready for review March 19, 2024 17:31
@Harshil2107
Copy link
Contributor Author

I am not to sure about what name the class should be. I have tentatively gone with MultiSim. I will also push a test that runs the example script.

Change-Id: Ie223805cb560d34f498092f359fabb300b5bb30d
Change-Id: Ic08824fc2eca66dc3d8f30e90b740c96a2dad040
Change-Id: I5c902c924025b82c37138437df0cf43f202ee0d8
Copy link
Member

@BobbyRBruce BobbyRBruce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the big gap in your knowledge if you haven't properly dived into the Multiprocessing module beyond the Process class. I'd suggest reaching through a tutorial on some of the more "advanced" features. multiprocessing.pool is very powerful and solves a lot of the issues you tried to fix here (see my in-line comments).

tests/gem5/multiprocessing_tests/README.md Show resolved Hide resolved
tests/gem5/multiprocessing_tests/README.md Show resolved Hide resolved
tests/gem5/multiprocessing_tests/README.md Show resolved Hide resolved
src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
Comment on lines 63 to 92
processes = []

for sim_callable in self.sim_callables:
if len(processes) > num_cpus:
for process in processes:
if not process.is_alive():
processes.remove(process)
os.kill(process.pid, signal.SIGTERM)
sleep(1)
process_name = (
f"{sim_callable.func.__name__}_{sim_callable.args[0].get_id()}"
)
process = Process(
target=run_simulator,
args=(sim_callable,),
name=process_name,
)
print(f"Starting process {process_name}")

process.start()
processes.append(process)

while processes:
for process in processes:
if not process.is_alive():
processes.remove(process)
sleep(1)

print("All simulations have finished")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really should've read up on multiprocess pools before this. This functionality you provide already exists to handle the scheduling of new processes. The following should be roughly what you need to do:

# Specify the number of processes the pool is allowed to use.
# Note: If `multiprocessing.Pool()` then all available processes are used.
pool = multiprocessing.Pool(processes=num_proc)

# Create a process for each element in `self._sim_callables` passed to function `_run_simulator`. I.e.:
# Process 1 : _run_simulator(self.sim_callables[0])
# Process 2 : _run_simulator(self.sim_callables[1])
# ... and so on.
pool.map(_run_simulator, self.sim_callables) 

# The `pool.map` function will only return when all the processes have finished executing.

def _run_simulator(sim_callable: Callable[[], Simulator]) -> None:
    sim_callable().run()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had a good reason not to use multiprocessing.Pool. It does work, but it's going to be limiting our features.

The reason is that I am looking forward to a new feature that wouldn't be possible with Pool. I want to be able to track the processes ourselves. Specifically, I want to send a special signal periodically and ask the running simulation to tell me what it's current status is. This is related to the better support for exit events.

I can provide more detail later, but I have a plan!

src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
Comment on lines +41 to +43
all_simulations_completed_verifier = verifier.MatchRegex(
re.compile(r"All simulations have finished")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments on this test and the wider change:

  1. I'm happy with "exit code == 0" for this.
  2. I don't think the MultiSim object should be printing this message. I'd assume if the run (or run_all) function returns without complaint then all the simulations have finished.

Comment on lines +36 to +39
if config.bin_path:
resource_path = config.bin_path
else:
resource_path = joinpath(absdirpath(__file__), "..", "resources")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've copy and pasted this from other test scripts but the resource_path is never used anywhere here.

"multisim",
"multi-sim-example.py",
),
config_args=[],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: Why don't you have the single argument of this config to be the number of threads? It doesn't make much sense for it to be hard-coded in the script.

@BobbyRBruce
Copy link
Member

I created a little toy example for an approach I think is better. Despite being a simple bare-bones proof-of-concept I hope it's clear how it'd all fit together if built upon: BobbyRBruce@599e76f.

You can run the example with ./build/ALL/gem5.opt multisim/config.py.

Copy link
Contributor

@powerjg powerjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good. A few comments below.

src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
args=(sim_callable,),
name=process_name,
)
print(f"Starting process {process_name}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have any print statements in this code. There's a good argument to need logging but for that, we should use something better. E.g., info() if we can or logging in python. That said, we need to tread lightly here with changes. My only suggestion is to remove the print statement or change it to an info

src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
Comment on lines 63 to 92
processes = []

for sim_callable in self.sim_callables:
if len(processes) > num_cpus:
for process in processes:
if not process.is_alive():
processes.remove(process)
os.kill(process.pid, signal.SIGTERM)
sleep(1)
process_name = (
f"{sim_callable.func.__name__}_{sim_callable.args[0].get_id()}"
)
process = Process(
target=run_simulator,
args=(sim_callable,),
name=process_name,
)
print(f"Starting process {process_name}")

process.start()
processes.append(process)

while processes:
for process in processes:
if not process.is_alive():
processes.remove(process)
sleep(1)

print("All simulations have finished")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had a good reason not to use multiprocessing.Pool. It does work, but it's going to be limiting our features.

The reason is that I am looking forward to a new feature that wouldn't be possible with Pool. I want to be able to track the processes ourselves. Specifically, I want to send a special signal periodically and ask the running simulation to tell me what it's current status is. This is related to the better support for exit events.

I can provide more detail later, but I have a plan!

src/python/gem5/simulate/multi_sim.py Outdated Show resolved Hide resolved
configs/example/gem5_library/multisim/example_callable.py Outdated Show resolved Hide resolved
Comment on lines 28 to 59
from gem5.isas import ISA
from gem5.prebuilt.riscvmatched.riscvmatched_board import RISCVMatchedBoard
from gem5.simulate.simulator import Simulator
from gem5.utils.requires import requires


def run_riscvmathed_workload(resource) -> Simulator:
requires(isa_required=ISA.RISCV)

# instantiate the riscv matched board with default parameters
board = RISCVMatchedBoard()

board.set_workload(resource)

# run the simulation with the RISCV Matched board
simulator = Simulator(board=board, full_system=False)
return simulator


def run_riscvmatched_worklaod_diff_clocks(resource) -> Simulator:
requires(isa_required=ISA.RISCV)

# instantiate the riscv matched board with default parameters
board = RISCVMatchedBoard(
clk_freq="2.2GHz",
)

board.set_workload(resource)

# run the simulation with the RISCV Matched board
simulator = Simulator(board=board, full_system=False)
return simulator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this works +1.

to_return.append(lambda: Simulator(board)) may work as well if you don't want to use partial. I can't decide which I like more.

Comment on lines +42 to +43
workload_1 = obtain_resource("riscv-gapbs-tc-run")
workload_2 = obtain_resource("riscv-npb-is-size-s-run")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the example suite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot currently as suites dont work with multiprocessing. After the refactor obtain_resource PR is merged, we can change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Bobby is right here. Can you test if you can put this in a single file?

I think the only thing that needs to be in another file is what's in multi_sim.py

@powerjg powerjg added this to the v24.0 milestone Apr 11, 2024
Change-Id: I727fd8518b2adfc99e66f208c465a9b587bb6032
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib The gem5 standard library. Code typically found under "src/pythongem5"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants