
drmemtrace scheduler should synthesize headers for cores that start idle? #6703

Open
derekbruening opened this issue Mar 11, 2024 · 0 comments


The drmemtrace framework stores key information in headers at the start of each trace file, where it is meant to be accessible to the parallel worker threads of analysis tools and simulators. However, when there are more output streams than inputs, some outputs start idle and so have no headers at all. (These outputs might later host inputs, so they will not necessarily remain idle.) This causes problems for analyzers that need to know the version, filetype, cache line size, chunk size, or other header values in every shard. One solution could be for the scheduler to always read ahead to the first timestamp, store global values of the common header records, and synthesize headers in outputs that start idle.
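
A minimal sketch of that idea, to make the proposal concrete. Every name here (`marker_kind_t`, `header_record_t`, `global_headers_t`, `output_stream_t`, `synthesize_headers`, `prime_idle_outputs`) is hypothetical and is not the drmemtrace scheduler API; the sketch assumes the scheduler has already captured the common header values from some input while reading ahead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical header kinds; these are NOT the real drmemtrace marker types.
enum class marker_kind_t { VERSION, FILETYPE, CACHE_LINE_SIZE, CHUNK_SIZE };

struct header_record_t {
    marker_kind_t kind;
    uint64_t value;
};

// Values assumed to be identical across inputs, captured once by the
// scheduler while reading ahead to the first timestamp.
struct global_headers_t {
    uint64_t version;
    uint64_t filetype;
    uint64_t cache_line_size;
    uint64_t chunk_size;
};

// Hypothetical stand-in for one scheduler output (one core shard).
struct output_stream_t {
    bool starts_idle;
    std::vector<header_record_t> records;
};

// Builds the header records an idle output would otherwise never see.
std::vector<header_record_t>
synthesize_headers(const global_headers_t &g)
{
    return {
        { marker_kind_t::VERSION, g.version },
        { marker_kind_t::FILETYPE, g.filetype },
        { marker_kind_t::CACHE_LINE_SIZE, g.cache_line_size },
        { marker_kind_t::CHUNK_SIZE, g.chunk_size },
    };
}

// Gives every output that starts idle, and so has no records yet, a
// synthesized header. Outputs that start with a real input are untouched.
void
prime_idle_outputs(std::vector<output_stream_t> &outputs, const global_headers_t &g)
{
    for (auto &out : outputs) {
        if (out.starts_idle && out.records.empty())
            out.records = synthesize_headers(g);
    }
}
```

With this shape, an analyzer shard could read the version or cache line size from its first records whether or not the core had an input at the start.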

Even this is not enough for record_filter to operate core-sharded in #6635, as it also needs the input filename extension to know how to compress its outputs; that particular detail seems reasonable to leave as a burden for record_filter.

derekbruening added a commit that referenced this issue Mar 12, 2024
Multiple changes to allow the record filter to operate in core-sharded
fashion:

Makes the pc2encoding table per-input, as one input can migrate across
multiple core shards and thus one core can see a later instruction
without ever having seen its encoding.  To handle synchronization,
there is no C++11 std:: rwlock, so we use mutexes -- but we limit
their use to per-context-switch for the added global lock, and we
assume there is no contention for the per-input lock as only one shard
operates on one input at any one time.
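
The locking scheme described above could look roughly like the following sketch. The names (`per_input_t`, `encoding_registry_t`, `get_input`, `record_encoding`, `lookup_encoding`) are hypothetical and are not taken from the record_filter sources. The global mutex is taken only when a shard looks up the state for an input, which is assumed to happen once per context switch; the per-input mutex is expected to be uncontended.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// State that travels with an input as it migrates across core shards.
struct per_input_t {
    std::mutex lock; // Expected to be uncontended: one shard per input at a time.
    std::unordered_map<uint64_t, std::vector<uint8_t>> pc2encoding;
};

class encoding_registry_t {
public:
    // Intended to be called only when a shard switches to a different
    // input, so the global lock is taken once per context switch.
    per_input_t *
    get_input(int64_t input_id)
    {
        std::lock_guard<std::mutex> guard(global_lock_);
        std::unique_ptr<per_input_t> &slot = inputs_[input_id];
        if (!slot)
            slot.reset(new per_input_t); // C++11 has no std::make_unique.
        return slot.get();
    }

private:
    std::mutex global_lock_;
    std::unordered_map<int64_t, std::unique_ptr<per_input_t>> inputs_;
};

// Remembers the encoding bytes seen for the instruction at pc.
void
record_encoding(per_input_t *input, uint64_t pc, const std::vector<uint8_t> &bytes)
{
    std::lock_guard<std::mutex> guard(input->lock);
    input->pc2encoding[pc] = bytes;
}

// Returns true and fills bytes_out if an encoding was recorded for pc,
// possibly by a different shard that ran this input earlier.
bool
lookup_encoding(per_input_t *input, uint64_t pc, std::vector<uint8_t> &bytes_out)
{
    std::lock_guard<std::mutex> guard(input->lock);
    auto it = input->pc2encoding.find(pc);
    if (it == input->pc2encoding.end())
        return false;
    bytes_out = it->second;
    return true;
}
```

Because the table belongs to the input rather than to the shard, a core that picks up an input mid-stream can still find encodings that another core recorded earlier.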

Sets the memref counter reader to core_sharded_ to avoid asserts.

Appends footer records to ending-in-idle-record cores.

Adds an error check ensuring a single workload, as multiple will
require expanding the keys used in some tables.

Renames the output files to include "core.<shard_index>" and not the
tid.  This is surprisingly complex, as an input filename is needed to
determine the output filename compression type: yet not all shards are
guaranteed to have an input at the start.  A condition variable and
mutex are used to coordinate this among shards.
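
One way to picture that coordination is the sketch below. The names (`output_extension_t`, `publish`, `wait`, `output_name`) and the exact output name layout are assumptions for illustration, not the record_filter code. A shard that has an input publishes the input's extension; shards that start idle block until it is available.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>

// Shares the input filename extension (such as ".zip") among shards.
class output_extension_t {
public:
    // Called by any shard that has an input. The first call wins; later
    // calls are ignored, assuming all inputs share one compression type.
    void
    publish(const std::string &input_path)
    {
        // Only look for a dot in the final path component.
        size_t slash = input_path.find_last_of("/\\");
        size_t start = (slash == std::string::npos) ? 0 : slash + 1;
        size_t dot = input_path.rfind('.');
        std::string ext;
        if (dot != std::string::npos && dot >= start)
            ext = input_path.substr(dot);
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (ready_)
                return;
            ext_ = ext;
            ready_ = true;
        }
        cv_.notify_all();
    }

    // Called by every shard before opening its output file; blocks
    // started-idle shards until some other shard has published.
    std::string
    wait()
    {
        std::unique_lock<std::mutex> guard(lock_);
        cv_.wait(guard, [this] { return ready_; });
        return ext_;
    }

private:
    std::mutex lock_;
    std::condition_variable cv_;
    bool ready_ = false;
    std::string ext_;
};

// Hypothetical name layout: only the "core.<shard_index>" part comes
// from the description above.
std::string
output_name(int shard_index, const std::string &ext)
{
    return "core." + std::to_string(shard_index) + ext;
}
```

The predicate form of `wait` guards against spurious wakeups and also returns immediately when the extension was published before the idle shard arrived.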

Adds support for started-idle cores by synthesizing headers in
record_filter; #6703 covers having the scheduler do this for all
analyzers.  Adds the version as another field available up front from
the scheduler, and adds an idle-tid sentinel, which needs to be
distinct from INVALID_THREAD_ID.

Adds two end-to-end tests, one with a single-threaded app scheduled
onto 4 cores to test start-idle cores and one to test multiple
threads.  Adds a macro to share code with the existing end-to-end test.

Updates the unit test mock classes.

Issue: #6635, #6703
derekbruening added a commit that referenced this issue Mar 13, 2024