Merge branch 'release-3.0.0'
jrmadsen committed Dec 29, 2019
2 parents ec25487 + 3a5e311 commit b36b167
Showing 30 changed files with 704 additions and 325 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
Expand Up @@ -81,7 +81,7 @@ foreach(_TYPE DATAROOT CMAKE INCLUDE LIB BIN MAN DOC)
endforeach()

configure_file(${PROJECT_SOURCE_DIR}/source/timemory/version.h.in
${PROJECT_BINARY_DIR}/source/timemory/_version.h @ONLY)
${PROJECT_SOURCE_DIR}/source/timemory/version.h @ONLY)

# execute_process(
# COMMAND ${CMAKE_COMMAND} -E copy_if_different
14 changes: 10 additions & 4 deletions README.md
Expand Up @@ -25,9 +25,12 @@
| PyPi | `pip install timemory` |
| Anaconda Cloud | `conda install -c jrmadsen timemory` |


Timemory is a performance measurement and analysis framework.

## Why Use timemory?

- __*Timemory is arguably the most customizable performance analysis and tuning API available*__
- __*Timemory is arguably the most customizable performance measurement and analysis API available*__
- __*High-performance*__: very low overhead when enabled and borderline negligible overhead when disabled at runtime
- Ability to arbitrarily switch and combine different measurement types anywhere in application
- Provides static reporting (fixed at compile-time), dynamic reporting (selected at run-time), or hybrid
Expand Down Expand Up @@ -106,11 +109,14 @@ you want to measure and run your code: initialization and output are automated.

## Profiling and timemory

Timemory is not a full profiler and is intended to supplement profilers, not be used in lieu of profiling,
which are important for _discovering where to place timemory markers_.
Timemory is not a full profiler (yet). The ultimate goal is to create a customizable profiler.
Currently, timemory supports explicit instrumentation (i.e. minor modifications to source code)
and explicit wrapping of dynamically-linked functions.
Profilers are currently important for _discovering where to place timemory markers_ or
_which dynamically-linked function calls to wrap with GOTCHA_.
The library provides an easy-to-use method for always-on general HPC analysis metrics
(i.e. timing, memory usage, etc.) with the same or less overhead than if these metrics were to
records and stored in a custom solution (there is zero polymorphism) and, for C++ code, extensively
be recorded and stored in a custom solution and, for C++ code, is extensively
inlined.
Functionally, the overhead is non-existent: sampling profilers (e.g. gperftools, VTune)
at standard sampling rates barely notice the presence of timemory unless it has been
2 changes: 0 additions & 2 deletions cmake/Modules/Options.cmake
Expand Up @@ -199,8 +199,6 @@ if(${PROJECT_NAME}_MASTER_PROJECT)
endif()

# timemory options
add_option(TIMEMORY_USE_EXCEPTIONS
"Signal handler throws exceptions (default: exit)" OFF ${_FEATURE})
add_option(TIMEMORY_USE_EXTERN_INIT
"Do initialization in library instead of headers" OFF)
add_option(TIMEMORY_USE_MPI
2 changes: 1 addition & 1 deletion docker/Dockerfile
Expand Up @@ -18,7 +18,7 @@ WORKDIR /tmp

# build and env args used by package-manager
ARG COMPILER_TYPE=gcc
ARG GCC_VERSION=9
ARG GCC_VERSION=8
ARG CLANG_VERSION=9
ARG ENABLE_DISPLAY=1

8 changes: 5 additions & 3 deletions docker/config/apt.sh
Expand Up @@ -60,8 +60,8 @@ run-verbose apt-get -y install clang-${CLANG_VERSION} libc++-dev libc++abi-dev

DISPLAY_PACKAGES="xserver-xorg freeglut3-dev libx11-dev libx11-xcb-dev libxpm-dev libxft-dev libxmu-dev libxv-dev libxrandr-dev \
libglew-dev libftgl-dev libxkbcommon-x11-dev libxrender-dev libxxf86vm-dev libxinerama-dev qt5-default \
emacs-nox vim-nox"
CUDA_VER=$(dpkg --get-selections | grep cuda-cudart- | awk '{print $1}' | head -n 1 | sed 's/cuda-cudart-//g')
emacs-nox vim-nox firefox"
CUDA_VER=$(dpkg --get-selections | grep cuda-cudart- | awk '{print $1}' | tail -n 1 | sed 's/cuda-cudart-//g' | sed 's/dev-//g')

#-----------------------------------------------------------------------------#
#
Expand Down Expand Up @@ -165,6 +165,8 @@ bash miniconda.sh -b -p /opt/conda
export PATH="/opt/conda/bin:${PATH}"
conda config --set always_yes yes --set changeps1 yes
conda update -c defaults -n base conda
conda install -n base -c defaults -c conda-forge python=3.6 pyctest cmake scikit-build numpy matplotlib pillow
conda install -n base -c defaults -c conda-forge python=3.6 pyctest cmake scikit-build numpy matplotlib pillow ipykernel jupyter
source activate
python -m ipykernel install --name base --display-name base
conda clean -a -y
conda config --set always_yes no
49 changes: 41 additions & 8 deletions docker/config/timemory-install.sh
Expand Up @@ -20,6 +20,10 @@ export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
ROOT_DIR=${PWD}
: ${TIMEMORY_BRANCH:="master"}

#--------------------------------------------------------------------------------------------#
# LIKWID
#--------------------------------------------------------------------------------------------#

run-verbose cd ${ROOT_DIR}
run-verbose git clone https://github.com/RRZE-HPC/likwid.git
run-verbose cd likwid
Expand All @@ -31,19 +35,37 @@ ssed -i 's/@install/install/g' Makefile
ssed -i 's/@cd/cd/g' Makefile
run-verbose make install -j6

#--------------------------------------------------------------------------------------------#
# TAU
#--------------------------------------------------------------------------------------------#

run-verbose cd ${ROOT_DIR}
run-verbose wget http://tau.uoregon.edu/tau.tgz
run-verbose tar -xzf tau.tgz
run-verbose cd tau-*
export CFLAGS="-O3"
export CPPFLAGS="-O3"
export CFLAGS="-O3 -fPIC"
export CPPFLAGS="-O3 -fPIC"
# run-verbose ./configure -python -prefix=/usr/local -pthread -papi=/usr -mpi -mpiinc=/usr/include/mpich -cuda=/usr/local/cuda
run-verbose ./configure -python -prefix=/usr/local -pthread -papi=/usr -mpi -mpiinc=/usr/include/mpich
run-verbose make -j6
run-verbose make install -j6
unset CFLAGS
unset CPPFLAGS

#--------------------------------------------------------------------------------------------#
# UPC++
#--------------------------------------------------------------------------------------------#

run-verbose git clone https://jrmadsen@bitbucket.org/berkeleylab/upcxx.git
run-verbose cd upcxx
export CFLAGS="-fPIC"
export CPPFLAGS="-fPIC"
run-verbose ./install /usr/local

#--------------------------------------------------------------------------------------------#
# timemory
#--------------------------------------------------------------------------------------------#

run-verbose cd ${ROOT_DIR}
run-verbose git clone -b ${TIMEMORY_BRANCH} https://github.com/NERSC/timemory.git timemory-source
run-verbose cd timemory-source
Expand All @@ -56,12 +78,23 @@ run-verbose cmake -DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_BUILD_TYPE=Release -
run-verbose ninja -j6
run-verbose ninja install

run-verbose git clone https://jrmadsen@bitbucket.org/berkeleylab/upcxx.git
run-verbose cd upcxx
export CFLAGS="-fPIC"
export CPPFLAGS="-fPIC"
run-verbose ./install /usr/local
#--------------------------------------------------------------------------------------------#
# tomopy
#--------------------------------------------------------------------------------------------#

run-verbose cd ${ROOT_DIR}
run-verbose git clone https://github.com/jrmadsen/tomopy.git tomopy
run-verbose cd tomopy
run-verbose git checkout accelerated-redesign
run-verbose conda env create -n tomopy -f envs/linux-36.yml
source activate
run-verbose conda activate tomopy
run-verbose python -m pip install -vvv .
run-verbose conda clean -a -y

cd ${ROOT_DIR}
#--------------------------------------------------------------------------------------------#
# Cleanup
#--------------------------------------------------------------------------------------------#

run-verbose cd ${ROOT_DIR}
run-verbose rm -rf ${ROOT_DIR}/*
21 changes: 12 additions & 9 deletions docker/docker-compose.yml
Expand Up @@ -7,9 +7,10 @@ version: "3.3"

services:
#--------------------------------------------------------------------------#
# TiMemory development container
timemory-dev:
image: jrmadsen/timemory:dev
# timemory development container w/ CUDA 10.0
#
timemory-dev-10-0:
image: jrmadsen/timemory:cuda-10.0
stdin_open: true
tty: true
build:
Expand All @@ -25,9 +26,10 @@ services:
ENABLE_DISPLAY: "1"

#--------------------------------------------------------------------------#
# TiMemory development container
timemory-dev-edge:
image: jrmadsen/timemory:dev-edge
# timemory development container w/ CUDA 10.1
#
timemory-dev-10-1:
image: jrmadsen/timemory:cuda-10.1
stdin_open: true
tty: true
build:
Expand All @@ -43,7 +45,8 @@ services:
ENABLE_DISPLAY: "1"

#--------------------------------------------------------------------------#
# TiMemory development container
# timemory development container w/ CUDA 10.2
#
timemory-latest:
image: jrmadsen/timemory:latest
stdin_open: true
Expand All @@ -53,9 +56,9 @@ services:
dockerfile: Dockerfile
args:
BASE_IMG: "nvidia/cuda"
BASE_TAG: "latest"
BASE_TAG: "10.2-devel-ubuntu18.04"
COMPILER_TYPE: "gcc"
GCC_VERSION: "9"
GCC_VERSION: "8"
CLANG_VERSION: "9"
REQUIRE_CUDA_VERSION: "10.1"
ENABLE_DISPLAY: "1"
26 changes: 16 additions & 10 deletions docs/about.md
@@ -1,18 +1,24 @@
# About

Timemory is very _lightweight_, _cross-language_ timing, resource usage, and hardware counter utility
for reporting timing, resource usage, and hardware counters for the CPU and GPU.

Timemory is implemented as a generic C++11 template library but supports implementation in C, C++, CUDA, and Python codes.
The design goal of timemory is to enable "always-on" performance analysis that can be standard part of the source code
with a negligible amount of overhead.

Timemory is not intended to replace profiling tools such as Intel's VTune, GProf, etc. -- instead,
it complements them by enabling one to verify timing and memory usage without the overhead of the profiler.
Timemory is a modular API for performance measurements and analysis with a very lightweight overhead.
If timemory does not support a particular measurement type or analysis method, user applications
can easily create their own component that accomplishes the desired task.

Timemory is implemented as a generic C++11 template library but supports implementation
in C, C++, CUDA, and Python codes.
The design goal of timemory is to create an easy-to-use framework for generating
performance measurements and analysis methods which are extremely flexible
with respect to both how the data is stored/accumulated and which methods the measurement
or analysis supports. In order to keep the overhead as low as reasonably achievable,
a significant amount of logic is evaluated at compile-time. As a result, applications
which directly utilize the C++ template interface tend to see increases in compilation
time, binary size (especially when debug info is included), and compiler memory usage.
If this aspect of timemory impedes productivity, the best course of action is to
utilize the library interface.

## Credits

Timemory is actively maintained by NERSC at Lawrence Berkeley National Laboratory
Timemory is actively developed by NERSC at Lawrence Berkeley National Laboratory

| Name | Affiliation | GitHub |
| ------------------ | :---------------------------------------------------------------------------------------: | :-------------------------------------------: |
62 changes: 61 additions & 1 deletion docs/components/gotcha.md
Expand Up @@ -18,6 +18,14 @@ where `Size` is the maximum number of external functions to be wrapped,
`Diff` is an optional template parameter for differentiating `gotcha` components with equivalent `Size` and `Tools`
parameters but wrap different functions. Note: the `Tools` type cannot contain other `gotcha` components.

### Use Cases

The `gotcha` component in timemory can provide either of the following functionalities:

1. Scoped instrumentation around external dynamically-linked function calls
2. Wholesale replacement of external dynamically-linked function calls


## Traditional GOTCHA in C

Writing a traditional GOTCHA wrapper in C requires a bit of work and the recommended methods require
Expand Down Expand Up @@ -77,7 +85,59 @@ A GOTCHA wrapper with timemory can be defined in a single line of code and there
macros provided that eliminate the need for specifying the function signature (return-type and
arguments) due to the ability for templates to extract these parameters.
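
The signature extraction mentioned above can be sketched in plain C++11, without timemory or GOTCHA. The `timed_call` and `make_timed` names below are hypothetical and are not timemory's implementation; the sketch only shows how a function pointer lets the compiler deduce the return type and arguments, so the wrapper never has to spell them out. It assumes a non-void return type.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical illustration only: NOT part of timemory or GOTCHA.
// Ret and Args... are deduced from the wrapped function pointer.
template <typename Ret, typename... Args>
struct timed_call
{
    using func_t = Ret (*)(Args...);

    explicit timed_call(func_t f)
    : func(f)
    {
        assert(func != nullptr);
    }

    // Same signature as the wrapped function: scoped timing around the call
    Ret operator()(Args... args)
    {
        auto start  = std::chrono::steady_clock::now();
        Ret  result = func(std::forward<Args>(args)...);
        auto stop   = std::chrono::steady_clock::now();
        elapsed_sec += std::chrono::duration<double>(stop - start).count();
        ++calls;
        return result;
    }

    func_t func;
    double elapsed_sec = 0.0;  // accumulated wall-clock time
    long   calls       = 0;    // number of wrapped invocations
};

// Helper so the caller never writes the return type or argument types
template <typename Ret, typename... Args>
timed_call<Ret, Args...> make_timed(Ret (*func)(Args...))
{
    return timed_call<Ret, Args...>(func);
}
```

Given a `double (*fp)(double)`, calling `make_timed(fp)` yields a callable with the same signature that accumulates `elapsed_sec` and `calls` on each invocation.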
## GOTCHA Example
## Function Replacement with GOTCHA Example
Suppose that an application is spending a significant amount of run-time calling the standard math library
double-precision `exp` function and you would like to investigate whether using single-precision `expf` is an
acceptable substitute in certain regions. Instead of writing the [full specification](#traditional-gotcha-in-c)
shown previously and manually enabling and disabling the wrapper in the region of interest, you can use timemory.
Provided below is the full component specification required to implement the replacement function.
```cpp
// NOTE: declared in tim::component::
struct exp_intercept : public base<exp_intercept, void>
{
    double operator()(double val)
    {
        return expf(static_cast<float>(val));
    }
};
```
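
Before committing to such a replacement, it can help to estimate how much precision is lost. The following is a minimal standalone sketch, independent of timemory; the helper name `max_exp_relative_error` and the sampling scheme are assumptions made for illustration. It compares double-precision `std::exp` against the single-precision overload (the counterpart of `expf`) over a range of inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of timemory): largest relative error observed
// when exp(x) is evaluated in single precision at evenly spaced points in [lo, hi]
double max_exp_relative_error(double lo, double hi, int steps)
{
    assert(steps > 0 && hi >= lo);
    double worst = 0.0;
    for(int i = 0; i <= steps; ++i)
    {
        double x     = lo + (hi - lo) * i / steps;
        double exact = std::exp(x);
        // float overload of std::exp, i.e. the single-precision evaluation
        double approx = static_cast<double>(std::exp(static_cast<float>(x)));
        worst         = std::max(worst, std::fabs(approx - exact) / exact);
    }
    return worst;
}
```

If the value returned for the input range of interest is below the tolerance the application can accept, the single-precision substitute is a plausible candidate for that region.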

When the `exp_intercept` component is _appropriately_ configured within a `gotcha` component,
whenever `double exp(double)` is invoked, timemory will (via the GOTCHA library) redirect this function call to
`double exp_intercept::operator()(double)` -- and within this function, the replaced call to `expf` is implemented.
Configuring the `gotcha` component is slightly different, however. The goal of this component is __*optimization*__
instead of __*measurement or analysis*__ so the `gotcha` component is specified as such:

```cpp
using empty_t = component_tuple<>;
using exp_gotcha_t = gotcha<1, empty_t, exp_intercept>;
```

In other words, we define a `gotcha` component with an empty set of measurement/analysis components and
then we specify _a component_ as the third template parameter. The _combination_ of an empty measurement/analysis
collection as the second template parameter and a component as the third template parameter triggers a special
optimized wrapper around the original function call which is explicitly designed to minimize the overhead of
the redirection to the wrapper.

All that remains is implementing the initializer that specifies which functions are wrapped by the `gotcha` component:

```cpp
__attribute__((constructor))
void init_gotcha()
{
    exp_gotcha_t::get_initializer() = [=]()
    {
        TIMEMORY_C_GOTCHA(exp_gotcha_t, 0, exp);
    };
}
```
In the above, using the constructor attribute (only available with certain compilers) creates a function
that is automatically executed before `main` starts. Since this function configures the gotcha within a call-back
instead of explicitly invoking `TIMEMORY_C_GOTCHA`, the gotcha wrapper is not activated during this function.
As a result, the redirection of `exp` to `expf` is explicitly tied to the allocation of
at least one instance of `exp_gotcha_t`.
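
The deferred-activation pattern described above can be sketched without timemory. Everything in the block below (`deferred_wrapper`, `is_active`, `instance_count`, `register_wrapper`) is a hypothetical stand-in rather than timemory's implementation. Deactivating when the last instance is destroyed is likewise an assumption made for illustration. The constructor attribute is a GCC/Clang extension.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a gotcha-like wrapper: NOT timemory's implementation
struct deferred_wrapper
{
    // Function-local statics avoid static-initialization-order problems
    static std::function<void()>& get_initializer()
    {
        static std::function<void()> init = [] {};
        return init;
    }

    static bool& is_active()
    {
        static bool active = false;
        return active;
    }

    static int& instance_count()
    {
        static int count = 0;
        return count;
    }

    deferred_wrapper()
    {
        // The registered call-back runs only when the first instance is created
        if(instance_count()++ == 0)
        {
            assert(get_initializer());
            get_initializer()();
        }
    }

    ~deferred_wrapper()
    {
        // Assumption: the wrapper deactivates when the last instance goes away
        if(--instance_count() == 0)
            is_active() = false;
    }
};

// Executed before main: registers the call-back but does not activate anything
__attribute__((constructor))
void register_wrapper()
{
    deferred_wrapper::get_initializer() = [] { deferred_wrapper::is_active() = true; };
}
```

With this structure, `deferred_wrapper::is_active()` stays `false` at program start and only becomes `true` while at least one `deferred_wrapper` object exists.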
## Instrumentation with GOTCHA Example
> Reference: [source/tests/gotcha_tests.cpp](https://github.com/NERSC/timemory/blob/master/source/tests/gotcha_tests.cpp)