drmemtrace scheduler should synthesize headers for cores that start idle? #6703

derekbruening · 2024-03-11T17:52:08Z

The drmemtrace framework stores key info in headers at the start of each trace file. This is meant to be accessible to parallel worker threads in analysis tools and simulators. However, when there are more output streams than inputs, some outputs start idle and so have no headers at all. (These outputs might later host inputs, so they will not necessarily remain idle.) This causes problems for analyzers that need to know the version, filetype, cache line size, chunk size, or other header values in every shard. One solution could be for the scheduler to always read ahead to the first timestamp, store global values of the common header records, and synthesize headers in outputs that start idle.
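The proposed solution can be sketched roughly as follows. This is a minimal illustration only: `header_kind_t`, `header_record_t`, `output_stream_t`, `global_headers_t`, and `synthesize_idle_headers` are hypothetical names invented for this sketch, not the scheduler's actual types or interfaces, and the read-ahead step that would fill in the global values is assumed to have already happened.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the drmemtrace types.
enum class header_kind_t { VERSION, FILETYPE, CACHE_LINE_SIZE, CHUNK_SIZE };

struct header_record_t {
    header_kind_t kind;
    uint64_t value;
};

struct output_stream_t {
    bool starts_idle = false;
    std::vector<header_record_t> records;
};

// Global header values, assumed to have been captured by reading ahead in an
// input up to its first timestamp.
struct global_headers_t {
    uint64_t version = 0;
    uint64_t filetype = 0;
    uint64_t cache_line_size = 0;
    uint64_t chunk_size = 0;
};

// Gives every output that starts idle a synthesized copy of the common
// headers. Returns the number of outputs that received synthesized headers.
inline int
synthesize_idle_headers(std::vector<output_stream_t> &outputs,
                        const global_headers_t &global)
{
    int synthesized = 0;
    for (auto &out : outputs) {
        if (!out.starts_idle)
            continue;
        // An output that starts idle has hosted no input, so it has no headers.
        assert(out.records.empty());
        out.records = {
            { header_kind_t::VERSION, global.version },
            { header_kind_t::FILETYPE, global.filetype },
            { header_kind_t::CACHE_LINE_SIZE, global.cache_line_size },
            { header_kind_t::CHUNK_SIZE, global.chunk_size },
        };
        ++synthesized;
    }
    return synthesized;
}
```

With something like this in place, an analyzer could read the same header values from every shard, whether or not that shard begins with a real input.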

Even this is not enough for record_filter to operate core-sharded in #6635 as it also needs the input filename extension so it knows how to compress the outputs, but that particular detail seems reasonable to leave as a burden for record_filter.

Multiple changes to allow the record filter to operate in core-sharded fashion:

Makes the pc2encoding table per-input, as one input can migrate across multiple core shards and thus one core can see a later instruction without ever having seen its encoding. To handle synchronization, there is no C++11 std:: rwlock, so we use mutexes -- but we limit their use to per-context-switch for the added global lock, and we assume there is no contention for the per-input lock as only one shard operates on one input at any one time.

Sets the memref counter reader to core_sharded_ to avoid asserts.

Appends footer records to ending-in-idle-record cores.

Adds an error check ensuring a single workload, as multiple will require expanding the keys used in some tables.

Renames the output files to include "core.<shard_index>" and not the tid. This is surprisingly complex, as an input filename is needed to determine the output filename compression type: yet not all shards are guaranteed to have an input at the start. A condition variable and mutex are used to coordinate this among shards.

Adds support for started-idle cores by synthesizing headers in record_filter; #6703 covers having the scheduler do this for all analyzers. Adds the version as another field available up front from the scheduler, and adds an idle-tid sentinel needed to be distinct from INVALID_THREAD_ID.

Adds two end-to-end tests, one with a single-threaded app scheduled onto 4 cores to test start-idle cores and one to test multiple threads. Adds a macro to share code with the existing end-to-end test.

Updates the unit test mock classes.

Issue: #6635, #6703
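The output-naming coordination described above (a condition variable and mutex shared among shards) might look roughly like the sketch below. It is not the record_filter code: `output_extension_t`, `publish`, `wait`, and `output_name` are hypothetical names, and the exact output filename layout beyond containing "core.<shard_index>" is an assumption made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>

// Hypothetical helper, not the record_filter implementation. One instance is
// assumed to be shared by all shards.
class output_extension_t {
public:
    // Called by any shard once it knows a real input filename.
    // The first published value wins; later calls are ignored.
    void
    publish(const std::string &input_path)
    {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            if (ready_)
                return;
            // Only consider a dot inside the final path component.
            auto slash = input_path.find_last_of("/\\");
            auto name_start = slash == std::string::npos ? 0 : slash + 1;
            auto dot = input_path.rfind('.');
            if (dot != std::string::npos && dot >= name_start)
                extension_ = input_path.substr(dot);
            ready_ = true;
        }
        cond_.notify_all();
    }

    // Called by a shard that needs the extension, including one that started
    // idle. Blocks until some shard has called publish().
    std::string
    wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return ready_; });
        return extension_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool ready_ = false;
    std::string extension_;
};

// Builds an illustrative per-core output name; the real naming scheme may
// include other components.
inline std::string
output_name(int shard_index, const std::string &extension)
{
    assert(shard_index >= 0);
    return "core." + std::to_string(shard_index) + extension;
}
```

A shard that starts idle would call `wait()` before opening its output file, while whichever shard first receives an input calls `publish()` with that input's path. Using a predicate with `cond_.wait` guards against spurious wakeups and against a `publish()` that happens before the waiter arrives.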


derekbruening mentioned this issue Mar 12, 2024

i#6635 core filter, part 6: Add core-sharded record filter output #6704

Merged
