Releases: leobago/fti

Irun

25 Feb 12:02
  • Released Spack package (https://github.com/leobago/fti-spack)
  • Asynchronous HDF5 single file creation
  • New installation folder structure (includes config file template, documentation, examples, etc)
  • CI revision (bug fixes, new structure, compiler updates, added MPICH)
  • Improved C++ interoperability (included into CI)
  • Improved installation script (install.sh)
  • Revised coding style constraints
  • Minor patches (dcp, macros, testing)

Rabat

13 Oct 13:37
c1d5c0d

This release breaks compatibility with a few FTI functions for handling data types. Please check the documentation for details:
https://fault-tolerance-interface.readthedocs.io/en/latest/compatibility_notes.html

  • Added support for defining non-opaque composite data types in Fortran when using HDF5;
  • Added native and non-opaque bindings for Fortran Complex data type when using HDF5;
  • Simplified the composite data type handling C API;
  • Enhanced CMake configuration exports to include linked libraries and compiler definitions;
  • Enhancement of the pre-commit hook, including bug fixes;
  • Implementation of the Fast-Forward feature, which allows checkpoints to be taken at sub-minute intervals;
  • Implementation of a checkpoint processor for reading and decoding FTI's checkpoints, including HDF5 write, N-dimensional variable support, last-checkpoint processing, and usage examples;
  • New API function FTI_SetAttribute, which allows adding descriptive attributes to the protected datasets.

Florianopolis

09 Apr 13:01
d0bc01b
  • GetConfiguration feature: follow-up of issue #250
  • Register a virtually unlimited number of protections with FTI_Protect
  • Asynchronous postprocessing of shared HDF5 checkpoint file
  • Enhancement of the ICR feature by separating variable recovery from the I/O tasks for all I/O modes
  • IME I/O interface (IME native API)
  • Refactoring of FTI-I/O interfaces (HDF5, SIONlib, MPI-I/O, FTI-FF)
  • Addition of the FTI Integrated Test Framework (ITF), a tool to develop FTI Integration tests
  • Complete refactor of local integration tests behavior into the new ITF format
  • Removal of deprecated files in the local tests directory
  • Integration of ITF into CMake test runner tool (CTest) and Jenkins
  • Expanded CI tests with local tests for all I/O libraries
  • Simplified runtime metadata handling in FTI (for FTI developers)
  • Implementation of key-value storage for protected variables (for FTI developers)
  • Documentation on how to use ITF, develop tests with it, and extend it added to the developer guide section
  • Configuration of ReadTheDocs for an auto-build of the documentation from Doxygen's resources using Breathe and Sphinx

Heraklion

10 Jul 15:26
bae938b

Release

This release includes the new version of differential checkpointing, a complete implementation of incremental checkpointing, full support for GPU checkpointing and full support for HDF5 checkpointing, including the option for checkpointing into a single file (N-1) and restarting with a different number of processes.

Changelog

  • New major feature allowing users to checkpoint data allocated in the GPU device memory.
  • New implementation of differential checkpointing that addresses performance issues for highly fragmented differential updates.
  • New major feature allowing users to use incremental checkpointing for CPU and GPU data by adding variables to the checkpoint file one by one.
  • New major feature for DCPPosix allowing recovery from the last non-corrupted checkpoint file.
  • New examples in the examples/GPU directory that checkpoint GPU data.
  • New major feature allowing a restart with a different number of processes using a shared HDF5 checkpoint file.
  • New unit tests for the new features.
  • New configurable/flexible local test structure.
  • Fixed bug in RecoverVar.
  • Fixed bug in DCP recovery.
  • Complete and full code documentation generated with Doxygen.

Cologne

02 Oct 14:49
7bce275
  • Fixed bug that prevented MPI from being found with PGI compilers.
  • New option to avoid killing head processes and let the user handle it.
  • New major feature allowing users to use differential checkpointing including a new FTI file format.
  • New major feature allowing users to keep ALL L4 checkpoints.
  • New major feature allowing users to leverage FTI for general asynchronous I/O outside of checkpointing.
  • New unit tests for the new features, and a few more with improved performance for the previous ones.
  • Complete and full code documentation generated with Doxygen.
  • Vastly improved User guide showcasing the new features of this release.
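
To illustrate the idea behind differential checkpointing mentioned above, here is a minimal sketch that splits a protected buffer into fixed-size blocks, hashes each block, and marks only the blocks whose hash changed since the previous checkpoint. This is not FTI's implementation or file format: the function names `block_hash` and `dcp_mark_dirty` are hypothetical, and FNV-1a is used only as a short stand-in hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a hash of one block. A stand-in chosen for brevity (assumption),
 * not the hash FTI uses. */
static uint64_t block_hash(const unsigned char *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}

/*
 * Hypothetical helper: compare every block of `data` against the hash
 * recorded at the previous checkpoint.
 *   - hashes[b] is updated in place for blocks that changed
 *   - dirty[b] is set to 1 if block b must be rewritten, else 0
 * Both arrays must hold ceil(size / block_size) entries.
 * Returns the number of dirty blocks.
 */
size_t dcp_mark_dirty(const unsigned char *data, size_t size, size_t block_size,
                      uint64_t *hashes, unsigned char *dirty) {
    assert(block_size > 0);
    size_t nblocks = (size + block_size - 1) / block_size;
    size_t ndirty = 0;
    for (size_t b = 0; b < nblocks; b++) {
        size_t off = b * block_size;
        /* The last block may be shorter than block_size. */
        size_t len = (size - off < block_size) ? size - off : block_size;
        uint64_t h = block_hash(data + off, len);
        dirty[b] = (unsigned char)(h != hashes[b]);
        if (dirty[b]) {
            hashes[b] = h;
            ndirty++;
        }
    }
    return ndirty;
}
```

A caller would write only the blocks flagged in `dirty` to the checkpoint file. Smaller blocks detect changes more precisely but require more hashes to be stored and compared.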

Coruna

29 May 14:13
dee3556
  • Fixed corner case bug when a specific failure type happens just after a restart.
  • Fixed some unit tests that were launched with the wrong arguments.
  • Fixed bug in the FTI File Format that caused a segmentation fault.
  • Implementation of checkpointing in HDF5 format.
  • Support for HDF5 groups added to allow more flexibility.
  • Additional unit tests added to check HDF5 support.
  • Extended wiki, documentation and user guide with HDF5 and FFF information.

Barcelona

06 Nov 14:15
  • Switch to Jerasure 2.0
  • Support for checkpoints that dynamically change size
  • FTI_Realloc and FTI_GetStoredSize added to support evolving checkpoint sizes
  • Added support for partial local recovery with FTI_RecoverVar
  • Feature to set sync time for applications with evolving iteration length
  • Fix for examples bug with Intel compilers
  • Fix for bug on Cray machines
  • Added warning when using different compilers
  • Added more unit tests
  • Wiki pages added on GitHub
  • Fortran API added in the wiki and user guide
  • Improved documentation
  • Switch to BSD License

Poznan

13 Jul 15:43
  • I/O interfaces for MPI-IO and SIONlib
  • Checking checkpoint integrity with MD5 checksums
  • Fixed bug for L2 and L3 checkpointing
  • Fixed synchronization bugs
  • Added automated testing with different compilers
  • Expanded examples
  • Fixed memory leaks and removed unused variables
  • Local unit tests in C and Fortran
  • Improved code formatting
  • Updated developer documentation

Paris

12 Jan 15:23
  • Fixed undefined behaviour when using the snprintf function.
  • Fixed bug where no checkpoints were taken if the mean iteration time exceeded 30 seconds.
  • Fixed bug while validating the configuration for Levels 2 and 3.
  • Fixed bug when asked to keep the last checkpoint if no checkpoint was taken during execution.
  • Fixed bug where the example application crashed during recovery.
  • Fixed bug where the use of rint required the math library at link time.
  • Added Cray Compiler names to FindMPI script
  • Fixed bug for files larger than 2 GB (int -> long).
  • Writing full datasets harder.
  • Better documentation and NEW user guide!
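
As a rough illustration of the kind of problem behind the "int -> long" fix listed above, the sketch below computes a checkpoint size in bytes using a 64-bit type, so that sizes above 2 GB do not overflow a 32-bit `int`. The function name `checkpoint_bytes` is hypothetical and does not come from FTI. The sketch uses `int64_t` rather than `long` as an assumption for portability, since `long` is only 32 bits wide on some platforms.

```c
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of `count` elements of `elem_size`
 * bytes each.
 *
 * The buggy pattern would be `int bytes = count * elem_size;`, where the
 * product is evaluated in int and overflows above INT_MAX (about 2 GB),
 * which is undefined behaviour in C.
 *
 * Widening one operand before the multiplication makes the whole product
 * 64-bit.
 */
int64_t checkpoint_bytes(int count, int elem_size) {
    return (int64_t)count * elem_size;
}
```

For example, 300,000,000 doubles of 8 bytes each come to 2,400,000,000 bytes, which no longer fits in a signed 32-bit integer.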

Chicago

14 Jun 11:11

Here is the list of changes for this release:

  • Fixed incorrect warnings for non-head ranks if using dedicated processes.
  • Fixed problem with renaming/erasing local files if using dedicated processes.
  • Fixed problem with creating files/directories which already exist and deleting files/directories which don't exist.
  • Moved some global variables into static.
  • Removed some unused variables or code.
  • Fixed buffer overflow in jerasure library.
  • Fixed some bugs with strings that were not null-terminated.
  • Fixed many unchecked or ignored return values from standard library functions.
  • Fixed some uninitialized variable problems.
  • Corrected some misleading warning/error messages.
  • Fixed many resource leaks.
  • Fixed many potential memory leaks.
  • Fixed many TOCTOU problems.
  • Added header file with declarations of all library functions.
  • Added option to build examples.
  • Added cmake files to build dependencies and examples.
  • Cleaned library interface.
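
The TOCTOU (time-of-check to time-of-use) and "already exists / does not exist" fixes in the list above can be illustrated with a minimal POSIX sketch. Instead of checking for a path first and acting on it afterwards, it attempts the operation directly and inspects `errno`. The helper names `ensure_dir` and `remove_if_exists` are hypothetical and are not part of the FTI API.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a directory, treating "already exists" as success.
 * There is no separate existence check beforehand, so no other process can
 * change the path between a check and the mkdir call.
 * Note: EEXIST is also reported when the path exists but is not a
 * directory; a fuller version would confirm the type with stat().
 */
int ensure_dir(const char *path) {
    if (mkdir(path, 0777) == 0 || errno == EEXIST) {
        return 0;
    }
    return -1;
}

/*
 * Remove a file, treating "does not exist" as success.
 */
int remove_if_exists(const char *path) {
    if (unlink(path) == 0 || errno == ENOENT) {
        return 0;
    }
    return -1;
}
```

Calling `ensure_dir` twice on the same path succeeds both times, and `remove_if_exists` succeeds on a path that was never created, which is the behaviour that matters when several ranks touch the same checkpoint directories.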