Skip to content

quepas/mPAPI

Repository files navigation

Deprecated: Please use multi-language pep-talk

mPAPI

Simple MATLAB/Octave API for PAPI (Performance Application Programming Interface).

Properties

  • Hardware counters are measured for the parent and child threads (e.g. when using parallelized functions like sum). Unfortunately, there is no way to differentiate which counters come from which thread.
  • Each function in the MEX-file is locked (once loaded it can't be erased using clear function in MATLAB/Octave environment)

Installation

  1. Install PAPI >=5.5.1
  2. Build mPAPI functions: mPAPI_register, mPAPI_tic, mPAPI_toc, mPAPI_groupEvents, mPAPI_enumNativeEvents, mPAPI_enumPresetEvents with MEX-compatible compiler (the repository contains two bash script for building build.sh and build_all.sh):
mex -I/usr/local/include mPAPI_register.c -L/usr/local/lib/ -lpapi -output mPAPI_register
mex -I/usr/local/include mPAPI_tic.c -L/usr/local/lib/ -lpapi -output mPAPI_tic
mex -I/usr/local/include mPAPI_toc.c -L/usr/local/lib/ -lpapi -output mPAPI_toc
mex -I/usr/local/include mPAPI_groupEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_groupEvents
mex -I/usr/local/include mPAPI_enumNativeEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_enumNativeEvents
mex -I/usr/local/include mPAPI_enumPresetEvents.c -L/usr/local/lib/ -lpapi -output mPAPI_enumPresetEvents

Where directory /usr/local/include contains papi.h header and directory /usr/local/lib/ contains libpapi.so static library.

Usage

Counting

  1. Register hardware performance monitoring counters (PMC) using preset or native events:
  • For the current thread/process:
    ev = mPAPI_register({'FP_ARITH:SCALAR_SINGLE', 'L1D:REPLACEMENT', 'PAPI_L2_ICA'})
  • In multiplex mode for the current thread:
    ev = mPAPI_register({'FP_ARITH:SCALAR_SINGLE', 'L1D:REPLACEMENT', 'PAPI_L2_ICA'}, true)
  • For a specific thread/process by PID:
    ev = mPAPI_register({'PAPI_TOT_INS'}, 1234)
  1. Start counters for the specific event-set(s):
mPAPI_tic(ev)
  1. Read counters measurements
  • For the specific event-set:
    >> mPAPI_toc(ev)
    ans = [0, 1559, 4032]
  • For many event-sets:
    >> mPAPI_toc([ev1, ev2])
    ans = [0, 1559, 4032;
           0, 1450, 3999]
  1. Enumarate all available native or preset PAPI events:
>> mPAPI_enumNativeEvents()
ans = {'ix86arch::UNHALTED_CORE_CYCLES', 'ix86arch::INSTRUCTION_RETIRED', ...}
>> mPAPI_enumPresetEvents()
ans = {'PAPI_L1_DCM', 'PAPI_L1_ICM', ...}
  1. Divide events into compatible groups (that can be measured simultaneously)
>> mPAPI_groupEvents({'PAPI_L1_DCM', 'PAPI_L1_ICM', ...})
ans = {{'PAPI_L1_DCM', 'PAPI_L1_ICM', ...},
       ...
      }

Performance traces

  1. Register sampling event and frequency (using overflow threshold):
ev = mPAPI_trace_register('PAPI_TOT_INS', 1000000, {'PAPI_BR_INS', 'PAPI_L1_DCM'}, 'kernel.trace')

The first argument is a performance event used as time, here we sample the program performance after some number of cycles defined by the second argument — sampling interval in the domain of time. The third argument is a cell array of performance events to measure. The last argument is a location of the trace result.

  1. Start the sub-trace, basically, a performance trace for a given test.
mPAPI_trace_tic(ev, 'R2015b:1:1:sdaxpy:loop:341:1'))

The second argument is a header. For conversion of the trace to CSV with trace2csv script you need to use header convention: env:threads:process:benchmark:version:N:in_process. The fields represents: env — execution environment e.g. R2015b; threads — number of threads, process — number of test execution on different environment instances, benchmark and version — the kernel and the version used (marked in the code with %! pragma version), N — input data size, in_process — test repeition in the same instance of the execution environment.

  1. Perform the test.
  2. Finish the sub-trace
mPAPI_trace_toc(ev)

Problems

In order to set an older version of GCC (newer might not be supported by MATLAB's MEX compiler), run mex as follows:

mex GXX='/usr/bin/gcc-X.X' ... % R2013a/R2015b/R2018b

Comments

  • The number of hardware counters available on the system defines the upper limit of counters you can register using mPAPI_register function.
  • Not all hardware counters can be mixed and used simultaneously (except when in multiplex mode).