vlmarkov/Fault-Tolerance-Library

MPI Fault Tolerance Library

Feature list

  • GNU/Linux
  • Unit tests (CxxTest framework)

Test Samples

User-level checkpoint library

  • Rollback recovery - checkpoint/restart based
  • Failure detection - ULFM based
  • Snapshot creation - hard drive based (in place/via NFS)
  • Incremental checkpointing - delta encoding based (XOR operation)
  • Additional compression step - zlib based
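
As an illustration of the XOR-based delta encoding mentioned above, here is a minimal sketch. It is not taken from the library's sources: the names `Buffer`, `xor_buffers`, `make_delta` and `apply_delta` are hypothetical, and the sketch assumes both snapshots have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot type for this sketch: a flat byte buffer.
using Buffer = std::vector<std::uint8_t>;

// XOR two snapshots byte by byte. Assumes equal sizes.
Buffer xor_buffers(const Buffer& a, const Buffer& b) {
    assert(a.size() == b.size());
    Buffer out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}

// Delta between the base checkpoint and the current state.
// Unchanged bytes become zero.
Buffer make_delta(const Buffer& base, const Buffer& current) {
    return xor_buffers(base, current);
}

// Restore the current state from the base checkpoint and a delta.
// Works because (base ^ current) ^ base == current.
Buffer apply_delta(const Buffer& base, const Buffer& delta) {
    return xor_buffers(base, delta);
}
```

When only a small part of the data changes between checkpoints, the delta consists mostly of zero bytes, which is the kind of input a general-purpose compressor such as zlib handles well.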

ULFM

  • Survivability
  • Fault tolerance
  • Compute redundancy

WIP

  • Implementing alternative recovery methods for fault tolerance
  • Expanding test sample base
  • Reducing overhead
  • Improving the implementation

This project was implemented as part of my graduate thesis in the Computing Systems department of the Siberian State University of Telecommunications and Information Sciences.

  • Graduate student: Vladislav Markov
  • Supervisor: Mikhail Kurnosov