
bcgsc/btllib



Bioinformatics Technology Lab common code library in C++ with Python wrappers.

Platforms

  • Linux
  • MacOS

Installation for users

The recommended way to install btllib is with the Conda package manager:
conda install -c bioconda -c conda-forge btllib

Alternatively, you can compile the code from source. Download btllib-$VERSION.tar.gz from the latest GitHub release, where $VERSION is the latest btllib version, and do the following:

  • tar xzf btllib-$VERSION.tar.gz to extract the source code.
  • Have the dependencies ready:
    • GCC 6+ or Clang 5+ (with OpenMP and C++17 support)
    • Python 3.9+
    • Meson and Ninja Python 3 packages, and CMake (if any of these are not available, they will be installed automatically to a temporary directory).
  • Run btllib/compile
    • This will install btllib in the btllib/install directory. You can provide the --prefix parameter to change this.
    • The C++ compiler must be the same as the one used to compile Python. For example, if you installed Python using a package manager, you should use the C++ compiler from the same package manager. You can change the compiler by exporting the CXX environment variable to point to it before running btllib/compile.
    • You can optionally run python3 -m pip install $PREFIX/lib/btllib/python afterwards to install the Python package. The Python wrappers are usable even without this step. $PREFIX is the path where btllib is installed.
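
Before running btllib/compile, a quick check of the dependencies listed above can save a failed build. The following is a minimal Python sketch, not part of btllib. The function name check_build_dependencies and the fallback compiler names g++, clang++ and c++ are assumptions chosen for illustration. It only checks the Python version and whether executables are on PATH, so it does not verify compiler versions, OpenMP or C++17 support.

```python
import os
import shutil
import sys


def check_build_dependencies():
    """Report which of the listed build dependencies are visible from this shell.

    Only the Python version and presence on PATH are checked; compiler
    versions, OpenMP and C++17 support are NOT verified.
    """
    report = {"python>=3.9": sys.version_info >= (3, 9)}
    # btllib/compile installs Meson, Ninja and CMake into a temporary directory
    # when they are missing, so False for these is informational only.
    for tool in ("meson", "ninja", "cmake"):
        report[tool] = shutil.which(tool) is not None
    # Respect CXX if it is set (first word only); otherwise try common names.
    cxx = os.environ.get("CXX", "").split()
    candidates = cxx[:1] or ["g++", "clang++", "c++"]
    report["c++ compiler"] = any(shutil.which(c) is not None for c in candidates)
    return report


if __name__ == "__main__":
    for name, ok in check_build_dependencies().items():
        print(f"{name:<14} {'found' if ok else 'missing'}")
```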

Using the library

  • Run time dependencies:
    • SAMtools for reading SAM, BAM, and CRAM files.
    • gzip, tar, pigz, bzip2, xz, lrzip, zip, and/or 7zip for compressing/decompressing files. Only the tools for the compression formats you actually use are required.
    • wget for downloading sequences from a URL.
  • Building C++ code ($PREFIX is the path where btllib is installed):
    • Link your code with $PREFIX/lib/libbtllib.a (pass -L $PREFIX/lib -l btllib flags to the compiler).
      • You can do so by typing the following in your console:
        • export CPPFLAGS="-isystem /path/to/btllib/install/include $CPPFLAGS"
        • export LDFLAGS="-L/path/to/btllib/install/lib -lbtllib $LDFLAGS"
    • #include any header from the $PREFIX/include directory (pass -I $PREFIX/include flag to the compiler).
    • btllib uses C++11 features, so that standard should be enabled at a minimum.
  • Running Python code:
    • The Python used to import btllib must be the same as the one used to compile the library. Specifically, btllib uses python3-config to determine the flags used for compilation. Running python3-config --exec-prefix will give the path to the Python installation that needs to be used. The python3 executable can be found at $(python3-config --exec-prefix)/bin/python3.
    • The wrappers correspond one-to-one with the C++ code, so all functions and classes can be used under the same names. The only exceptions are nested classes, which are prefixed with the outer class name (e.g. btllib::SeqReader::Flag in C++ versus btllib.SeqReaderFlag in Python), and (Kmer)CountingBloomFilter, which is provided as CountingBloomFilter8, CountingBloomFilter16, CountingBloomFilter32, KmerCountingBloomFilter8, KmerCountingBloomFilter16, and KmerCountingBloomFilter32, with 8-, 16-, and 32-bit-wide counters respectively.
    • If you compiled btllib from source code and didn't install the Python wrappers, you can use the PYTHONPATH environment variable or sys.path.append() in your Python code to add the $PREFIX/lib/btllib/python/btllib directory to the interpreter's search path.
    • Import the library with import btllib.
  • Executables
    • btllib generated executables can be found in $PREFIX/bin directory. Append that path to the PATH environment variable to make it available to your shell.
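
The Python-side rules above can be sketched as small helpers. This is a hypothetical illustration, not btllib code. The names btllib_python_dir, interpreter_matches_build and python_name are made up. The interpreter check is only a heuristic that compares sys.base_exec_prefix with the output of python3-config --exec-prefix, and it assumes the python3-config found on PATH is the one btllib was built against. The python_name helper covers only the nested-class rule and ignores the (Kmer)CountingBloomFilter exception.

```python
import os
import shutil
import subprocess
import sys


def btllib_python_dir(prefix):
    """Directory to add to sys.path / PYTHONPATH when the wrappers were not pip-installed."""
    return os.path.join(prefix, "lib", "btllib", "python", "btllib")


def interpreter_matches_build(config="python3-config"):
    """Heuristic: does this interpreter live where `python3-config --exec-prefix` points?

    Returns None when the config script is not on PATH. Uses sys.base_exec_prefix
    so that a virtual environment is compared by its base installation.
    """
    if shutil.which(config) is None:
        return None
    result = subprocess.run(
        [config, "--exec-prefix"], capture_output=True, text=True, check=True
    )
    expected = os.path.realpath(result.stdout.strip())
    return expected == os.path.realpath(sys.base_exec_prefix)


def python_name(cpp_name):
    """Map a (possibly nested) C++ class name to its wrapper name.

    Nested classes are prefixed with the outer class name, e.g.
    btllib::SeqReader::Flag -> btllib.SeqReaderFlag. The 8/16/32-bit
    (Kmer)CountingBloomFilter variants are a separate exception, not handled here.
    """
    namespace, *classes = cpp_name.split("::")
    return namespace + "." + "".join(classes)
```

For example, after sys.path.append(btllib_python_dir("/path/to/btllib/install")), import btllib should find the wrappers, and python_name("btllib::SeqReader::Flag") returns "btllib.SeqReaderFlag".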

Documentation

Docs page

For btllib developers

  • Initial setup:
    • git clone --recurse-submodules https://github.com/bcgsc/btllib in order to obtain all the code.
    • In the btllib directory, run meson build to create a build directory.
  • Every time you want to run tests, in the build dir:
    • ninja wrap to regenerate wrappers.
    • ninja test to build wrappers and tests, and run tests.
  • Before making a pull request, in the build dir:
    • ninja quality-assurance to make sure all CI tests pass.
    • Make a commit after the above step, in case it has made any changes to wrappers or formatting. Don't commit the changes made to the sdsl-lite subproject. The Meson configuration adjusts the sdsl-lite configuration ad hoc so that it works with btllib, and these changes do not need to be committed. Doing this ad hoc keeps an explicit list of the differences from the upstream repository.
  • Before making a release, in the build dir:
    • Do the same as for a pull request and
    • ninja docs to regenerate docs to reflect the release and then commit the changes.
    • meson dist --allow-dirty to generate a self-contained package based on the last commit. --allow-dirty permits making a distributable with uncommitted changes, which is necessary because the sdsl-lite dependency has ad hoc changes made during the build process. The resulting distributable is compressed with xz. For easier use, decompress it, recompress it with gzip, and attach the resulting file to the release.
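
The re-compression step can be done with the xz and gzip command-line tools, or with Python's standard lzma and gzip modules as in this sketch. The function name xz_to_gz is hypothetical. The example path assumes Meson's default output location, the meson-dist directory inside the build directory.

```python
import gzip
import lzma
import shutil
from pathlib import Path


def xz_to_gz(xz_path):
    """Re-compress a .tar.xz archive as .tar.gz, streaming in chunks.

    Returns the path of the new file; the original .xz file is left in place.
    """
    xz_path = Path(xz_path)
    if not xz_path.name.endswith(".xz"):
        raise ValueError(f"expected a .xz file, got {xz_path}")
    # btllib-X.Y.Z.tar.xz -> btllib-X.Y.Z.tar.gz in the same directory.
    gz_path = xz_path.with_name(xz_path.name[: -len(".xz")] + ".gz")
    with lzma.open(xz_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # decompress xz and recompress as gzip
    return gz_path
```

For example, running xz_to_gz("meson-dist/btllib-$VERSION.tar.xz") from the build directory writes btllib-$VERSION.tar.gz next to the original, where $VERSION stands for the actual release version.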

The following are all the available ninja commands which can be run within build directory:

  • ninja clang-format formats the whitespace in code (requires clang-format 8+).
  • ninja wrap wraps C++ code for Python (requires SWIG 4.0+).
  • ninja clang-tidy runs clang-tidy on C++ code and makes sure it passes (requires clang-tidy 8+).
  • ninja builds the tests and wrapper libraries, making sure they compile.
  • ninja test runs the tests.
  • ninja code-coverage checks that the code coverage threshold is met (requires gcovr 3.3+).
  • ninja sanitize-undefined runs undefined behavior sanitization.
  • ninja test-wrappers tests whether wrappers work.
  • ninja docs generates code documentation from comments (requires Doxygen).
  • ninja quality-assurance runs clang-format, wrap, clang-tidy, test, code-coverage, sanitize-undefined, and test-wrappers. All of these are checked by CI.
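
When quality-assurance fails, running its component targets one at a time shows which step broke. The following Python sketch does this. qa_commands and run_qa are hypothetical helpers, and the sketch assumes the aggregate target does nothing beyond running these seven targets in the listed order, which may not hold exactly.

```python
import subprocess

# Component targets of `ninja quality-assurance`, in the order listed above.
QA_TARGETS = [
    "clang-format",
    "wrap",
    "clang-tidy",
    "test",
    "code-coverage",
    "sanitize-undefined",
    "test-wrappers",
]


def qa_commands(build_dir="build"):
    """Build one `ninja -C <build_dir> <target>` command per QA target."""
    return [["ninja", "-C", build_dir, target] for target in QA_TARGETS]


def run_qa(build_dir="build"):
    """Run the QA targets in order; return the first failing target, or None if all pass."""
    for cmd in qa_commands(build_dir):
        if subprocess.run(cmd).returncode != 0:
            return cmd[-1]
    return None
```

Calling run_qa("build") from the btllib directory stops at the first failing target and returns its name, so you can rerun just that ninja target while fixing it.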

Credits

Citing

If you use btllib in your research, please cite:

Nikolić et al., (2022). btllib: A C++ library with Python interface for efficient genomic sequence processing. Journal of Open Source Software, 7(79), 4720, https://doi.org/10.21105/joss.04720

If you use aaHash in your research, please cite:

Wong et al., (2023). aaHash: recursive amino acid sequence hashing. Bioinformatics Advances, vbad162, https://doi.org/10.1093/bioadv/vbad162.