
Building and testing


Configuration Options and Build Process

Cyclops can be built in a few different ways, depending on your intended usage. For the most basic library build, only MPI header files are required as a prerequisite. The following components can be built (a minimal build sequence is sketched after the list):

  • Cyclops C++ library (static or dynamic)
    • with/without OpenMP
    • with/without HPTT (high performance tensor transpose)
    • with/without MKL sparse BLAS functionality
    • with/without ScaLAPACK functionality
    • with/without CUDA offloading (experimental)
  • Cyclops Python library
  • C++ test suite and examples
  • Python test suite
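
For example, assuming MPI compiler wrappers (e.g. mpicxx) and a BLAS library are available in standard paths, a minimal build and install of the C++ library is,

./configure
make -j4
make install #may require superuser permissions (sudo)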

The first step of any build is to run the configure script, which sets up config.mk and setup.py files with appropriate build parameters, flags, and libraries. The script tests some of the configuration parameters and attempts standard Linux defaults (such as -llapack and -lblas). The script can also download and build optional external dependencies via --build-hptt and --build-scalapack. For a full, up-to-date list of configuration parameters, execute

./configure --help

Usage: configure [options]

	--install-dir=dir Specify where to install header and lib files (default is /usr/local/,
	                  so headers will be installed to /usr/local/include and libs to /usr/local/lib)

	--build-dir=dir   Specify where to build object files, library, and executable (default is .)

	--with-lapack     Tells CTF build to enable LAPACK functionality regardless of whether LAPACK libs have been given.

	--with-scalapack  Tells CTF build to enable ScaLAPACK functionality regardless of whether ScaLAPACK libs have been given.

	--build-scalapack Tells CTF to download and build ScaLAPACK library.

	--with-hptt       Tells CTF build to enable HPTT functionality.

	--build-hptt      Tells CTF to download and build HPTT library.

	--with-cuda       Tells CTF to setup and use NVCC, NVCCFLAGS, and CUBLAS libs

	--no-dynamic      Turns off configuration and build of dynamic (shared) libraries (these are needed for Python codes and some C++ codes)

	--no-static       Turns off configuration and build of static libraries (these are needed for C++ codes)

	--verbose         Does not suppress tests of compile/link commands

	LIB_PATH=-Ldir    Specify paths to static libraries, e.g. -L/usr/local/lib/ to be used for test executables

	LD_LIB_PATH=-Ldir Specify paths to dynamic libraries to be used for test executables as part of LD_LIBRARY_PATH

	LIBS=-llibname    Specify list of static libraries to link to, by default "-lblas -llapack -lscalapack" or a subset
	                  (for each -l<name> ensure lib<name>.a is found in LIBRARY_PATH or LIB_PATH)

	LD_LIBS=-llibname Specify list of dynamic libraries to link to, by default "-lblas -llapack -lscalapack" or a subset
	                  (for each -l<name> ensure lib<name>.so is found in LD_LIBRARY_PATH or LD_LIB_PATH)

	CXX=compiler      Specify the C++ compiler (e.g. mpicxx)

	CXXFLAGS=flags    Specify the compiler flags (e.g. "-g -O0" for a lightweight debug build),
	                  can also be used to specify macros like -DOMP_OFF (turn off OpenMP) -DPROFILE -DPMPI (turn on performance profiling)
	                  -DVERBOSE=1 -DDEBUG (see docs and generated config.mk for details)

	LINKFLAGS=flags   Specify flags for creating the static library

	LDFLAGS=flags     Specify flags for creating the dynamic library

	INCLUDES=-Idir    Specify directories and header files to include (e.g. -I/path/to/mpi/header/file/)

Additionally, the variables AR, NVCC, NVCCFLAGS, and WARNFLAGS
can be set on the command line, e.g. ./configure CXX=g++ CXXFLAGS="-fopenmp -O2 -g".

Each time the configure script is executed successfully a line is appended to the file how-did-i-configure in the build directory.

Basic steps to building C++ and Python libraries

Once the configure script has executed successfully (flags can be reviewed/changed in config.mk and setup.py, or configure rerun, until these are satisfactory), the C++ and Python Cyclops libraries can be built via GNU make,

make #build static and dynamic libraries
make -j4 #... using four parallel jobs
make ctflib #build static library
make ctflibso #build dynamic library
make test #build test_suite using ctflib and execute
make python #use Cython to build the Cyclops python library
make python_test #run suite of Python tests using (locally built) Cyclops python library

Numerous sub-tests and examples can also be built via make (typically, for a file examples/<some_example>.cxx, the code can be built with make <some_example>, and similarly for sub-tests); an example run is sketched below. Locally built libraries are stored in the lib, lib_shared, and lib_python subdirectories of the build directory (which may be specified by running the configure script from the desired build directory or by specifying ./configure --build-dir=...).
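
For instance, the matmul example (one of the targets listed further below) can be built and run on 4 processes (assuming the executable is placed in ./bin, as for test_suite),

make matmul
mpirun -np 4 ./bin/matmul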

For Apple/macOS machines, it may be necessary to set the MACOSX_DEPLOYMENT_TARGET environment variable to build the Python library, e.g.
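
export MACOSX_DEPLOYMENT_TARGET=10.8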

Installation

The CTF C++ libraries and headers can be installed system-wide (to /usr/local/ by default, change via ./configure --install-dir=...), via

make install #may need superuser permissions (sudo)

If using the default /usr/local/ directory, to make the shared library visible (necessary for Python usage of CTF without extra path specifications), you may need to add /usr/local/lib to LD_LIBRARY_PATH; to do this permanently on a standard Linux system, add the following line to your ~/.bashrc file,
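
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib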

The python library can be installed via pip using

make python_install

See the Makefile in the main directory for additional build targets, or, if GNU make auto-completion is set up, view the options via make <TAB>, which currently (June 2018) yields,

algebraic_multigrid       dft                       model_trainer             scalar
all                       dft_3D                    multi_tsr_sym             scan
ao_mo_transf              diag_ctr                  neural_network            shared
apsp                      diag_sym                  nonsq_pgemm_bench         sparse_mp3
bench                     endomorphism              nonsq_pgemm_test          sparse_permuted_slice
bench_contraction         endomorphism_cust         particle_interaction      spectral_element
bench_nosym_transp        endomorphism_cust_sp      permute_multiworld        speye
bench_redistribution      examples                  pip                       spmv
bitonic_sort              executables               python                    sptensor_sum
bivar_function            fast_3mm                  python_base_test          sssp
bivar_transform           fast_as_as_sy_tensor_ctr  python_dot_test           strassen
block_sparse              fast_diagram              python_einsum_test        studies
btwn_central              fast_sy_as_as_tensor_ctr  python_fancyindex_test    subworld_gemm
ccsd                      fast_sym                  python_install            svd
ccsdt_map_test            fast_sym_4D               python_la_test            sy_times_ns
ccsdt_t3_to_t2            fast_tensor_ctr           python_test               test
checkpoint                fft                       python_ufunc_test         test_live
clean                     force_integration         python_uninstall          tests
clean_bin                 force_integration_sparse  qinformatics              test_suite
clean_lib                 gemm_4D                   qr                        trace
clean_obj                 hosvd                     readall_test              uninstall
clean_py                  install                   readwrite_test            univar_function
ctf_ext_objs              jacobi                    recursive_matmul          weigh_4D
ctflib                    matmul                    reduce_bcast              
ctflibso                  mis                       repack                    
ctf_objs                  mis2                      scalapack_tests

Configuring External Libraries

The Cyclops library requires a BLAS library and an MPI library. For Python, Cython (typically available via pip) and numpy are necessary. Optionally, Cyclops can be built alongside LAPACK and ScaLAPACK, providing distributed matrix factorization functionality. Cyclops uses basic MPI routines and is compatible with most MPI libraries (MPICH/OpenMPI). For a static Cyclops build, static versions of these libraries are necessary; for a dynamic (Python) build, dynamic libraries are necessary.
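
The Python prerequisites can typically be installed via pip,

pip install cython numpy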

On an Apple machine, all necessary dependencies can be installed via brew

brew install gcc wget cmake openblas mpich

All library paths and libraries can be provided to the configure script; for example, a static library like /path/to/static_libraries/libmyfastblas.a may be specified as follows,

./configure LIB_PATH="-L/path/to/static_libraries" LIBS="-lmyfastblas -lgfortran"

(-lgfortran may or may not be necessary). If the shared library is in /path/to/dynamic_libraries/libmyfastblas.so, the (additional) specification of LD_LIB_PATH and LD_LIBS is required as

./configure LD_LIB_PATH="-L/path/to/dynamic_libraries" LD_LIBS="-lmyfastblas"

Configuring BLAS, LAPACK, and ScaLAPACK libraries

The functionality provided by Cyclops, and its performance, depend on whether and which of these libraries are provided. MKL routines are used for sparse matrix multiplication and yield significantly higher performance than the reference kernels. ScaLAPACK functionality enables routines such as QR and SVD to be executed on CTF::Matrix<float>, CTF::Matrix<double>, CTF::Matrix< std::complex<float> >, and CTF::Matrix< std::complex<double> >. The presence of the necessary symbols is tested when configure is executed (creating flags like -DUSE_MKL and -DUSE_SCALAPACK in config.mk). ScaLAPACK can be downloaded and built (via cmake) automatically via

./configure --build-scalapack

The library can also be built without specifying a ScaLAPACK library, by providing the flag --with-scalapack to ./configure.
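
Once built with ScaLAPACK support, these factorizations are available as methods of CTF::Matrix. The following is a minimal C++ sketch (the qr and svd method names and signatures here are assumptions based on the CTF headers; check them against the installed version),

#include <ctf.hpp>
using namespace CTF;

int main(int argc, char ** argv){
  MPI_Init(&argc, &argv);
  {
    World dw(argc, argv);        // distributed world spanning all MPI processes
    Matrix<double> A(8, 8, dw);  // dense 8-by-8 matrix distributed over dw
    A.fill_random(-1., 1.);      // fill with uniform random entries
    Matrix<double> Q, R;
    A.qr(Q, R);                  // QR factorization (requires ScaLAPACK build)
    Matrix<double> U, VT;
    Vector<double> S;
    A.svd(U, S, VT);             // SVD (requires ScaLAPACK build)
  }                              // CTF objects freed before MPI_Finalize
  MPI_Finalize();
  return 0;
}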

Building with MKL using an Intel compiler (e.g. ./configure CXX="mpicxx -cxx=icpc") typically requires simply -mkl (which can be augmented with -mkl=parallel for threading or -mkl=cluster for ScaLAPACK). The configure script attempts -mkl automatically when building with an Intel compiler, but it can also be provided as part of CXXFLAGS or LIBS. Custom (e.g. GNU-compiler or parallel+cluster) MKL library link-lines may be provided via LIB_PATH, LIBS, LD_LIB_PATH, and LD_LIBS. For example, to build only the dynamic libraries (e.g. for Cyclops Python usage) with GNU compilers and MKL ScaLAPACK on a 64-bit system, the following link-line may be appropriate,

./configure 'LD_LIB_PATH=-L/opt/intel2016/mkl/lib/intel64/' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_rt -lpthread' '--no-static'

Testing

The C++ static Cyclops library can be tested via

make test #build test_suite and run sequential test
make testN #build test_suite and run via mpirun with N processors
make test_suite && ./bin/test_suite #manually build test_suite and run it
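
The test suite executable can also be launched in parallel manually (assuming a standard mpirun is available),

mpirun -np 4 ./bin/test_suite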

Additional tests (e.g. for ScaLAPACK QR and SVD) can be built via make qr, make svd, etc., or in bulk via make examples, make tests, and make scalapack_tests.

The Python library may be tested by running

make python_test #run the suite of Python tests sequentially
make python_testN #run the tests with N mpi processes via mpirun python
mpirun python ./test/python/<...>.py #run test <...> manually
make test_live #launch ipython with numpy and ctf pre-imported

Tuning and Benchmarking

Cyclops uses internal performance models to select between algorithmic variants. These models attempt to predict the performance of every subkernel within Cyclops. The model coefficients are set to reasonable defaults, but better performance may be achieved by optimizing them for a particular architecture and number of threads per MPI process. To do this, the CTF static library must be built with -DPROFILE -DPMPI -DTUNE (the appropriate lines in config.mk can be uncommented) and the model_trainer executable should be built via make model_trainer. This executable should then be run for a suitable amount of time on the largest desired number of nodes (e.g. for an hour on 256 nodes), via, e.g.

export OMP_NUM_THREADS=4; mpirun -np 256 ./bin/model_trainer --time 3600 #run model_trainer for roughly 3600 seconds

The actual execution time of model_trainer may be substantially smaller or greater than the time specified via --time. When executed successfully, model_trainer outputs a set of model coefficients that can be provided to a subsequent build of CTF for the application (which should no longer use -DTUNE).

Some benchmarks are provided as part of the Cyclops source distribution in bench/ (some examples also include benchmarks). In particular, the bench_contraction target/executable can be used to benchmark an arbitrary sparse/dense/mixed tensor contraction.
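
For example, after configuring, the benchmark executable can be built via

make bench_contraction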

Example Configurations

C++ Build with Intel with MKL ScaLAPACK+OpenMP on Stampede-2

The following configuration makes sense for the Stampede-2 cluster with Intel 17.0.4,

module list
  1) intel/17.0.4   2) impi/17.0.3   3) git/2.9.0   4) autotools/1.1   5) python/2.7.13   6) xalt/2.0.7   7) TACC

For a static-only C++ build that uses MKL with both threading (usually -mkl=parallel) and ScaLAPACK (usually -mkl=cluster), which can be tricky to combine,

./configure --no-dynamic LIB_PATH="-L$MKLROOT/lib/intel64/" LIBS="-lmkl_scalapack_lp64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64  -Wl,--end-group -lpthread -lm"

For a dynamic-only Python build of CTF that uses threaded MKL, one can use the link-line recommended by the MKL link-line advisor, plus -lmkl_rt to avoid issues with incorrect linkage to static libs. We also set CXXFLAGS="-O2 -no-ipo" to avoid an Intel internal compilation error,

./configure '--no-static' 'LD_LIB_PATH=-L/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64 -Wl,--no-as-needed' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_def -liomp5 -lpthread -lm -ldl' 'CXXFLAGS=-O2 -no-ipo'

Manual linux dynamic library / Python build with MKL

The use of -lmkl_rt for dynamic library linking seems to have some limitations. On a standard Linux configuration, it may link to the C++ rather than the Fortran MKL libraries, which leads to errors in MKL routines for complex types. To fix this, the necessary Fortran MKL libraries need to be specified manually, which also entails other caveats. The following configure specification is valid for a typical Linux system configuration using MKL,

./configure --build-scalapack --build-hptt --no-static LD_LIB_PATH="-L/opt/intel/mkl/lib/intel64/ -Wl,-no-as-needed" LD_LIBS="-lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_def -liomp5" LDFLAGS="-lpthread -lm -ldl -lgfortran"

After running configure in this way, the Python libs can be built and tested (with a 2-process execution) as follows,
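
make python #build the Python library
make python_test2 #run the Python tests with 2 MPI processes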