forked from LLNL/scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.

tonyhutter/scr

 
 


Scalable Checkpoint / Restart (SCR) Library

The Scalable Checkpoint / Restart (SCR) library lets MPI applications use storage distributed across the compute nodes of a Linux cluster to achieve high file I/O bandwidth when checkpointing and restarting large-scale jobs. With SCR, jobs run more efficiently, recompute less work after a failure, and reduce load on critical shared resources such as the parallel file system.
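
To make the idea concrete, here is a minimal bookkeeping sketch in plain C of a two-level checkpoint scheme: every checkpoint lands in node-local cache, and only some are copied to the parallel file system. It does not use the SCR API. The `ckpt_tracker` type, its functions, and the "flush every Nth checkpoint" policy are hypothetical names and assumptions chosen for illustration; see the user docs for the real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a two-level checkpoint scheme.
 * None of these names come from the SCR API. */
typedef struct {
    int flush_every;   /* assumed policy: copy every Nth checkpoint to the parallel file system */
    int num_ckpts;     /* checkpoints recorded so far */
    int cached_step;   /* newest step held in node-local cache, -1 if none */
    int flushed_step;  /* newest step held on the parallel file system, -1 if none */
    int pfs_writes;    /* how many checkpoints reached the parallel file system */
} ckpt_tracker;

/* Reset the tracker; flush_every must be positive. */
void ckpt_tracker_init(ckpt_tracker *t, int flush_every)
{
    assert(t != NULL && flush_every > 0);
    t->flush_every = flush_every;
    t->num_ckpts = 0;
    t->cached_step = -1;
    t->flushed_step = -1;
    t->pfs_writes = 0;
}

/* Record a checkpoint taken at `step`. Every checkpoint goes to cache;
 * every flush_every-th one is also counted as written to the parallel
 * file system. Returns 1 if this checkpoint was flushed, 0 otherwise. */
int ckpt_tracker_record(ckpt_tracker *t, int step)
{
    assert(t != NULL);
    t->num_ckpts++;
    t->cached_step = step;
    if (t->num_ckpts % t->flush_every == 0) {
        t->flushed_step = step;
        t->pfs_writes++;
        return 1;
    }
    return 0;
}

/* Step a restarted job would resume from. If the cache survived the
 * failure, use the newest cached checkpoint; otherwise fall back to the
 * newest flushed one. Returns -1 when no checkpoint is available. */
int ckpt_tracker_restart_step(const ckpt_tracker *t, int cache_survived)
{
    assert(t != NULL);
    return cache_survived ? t->cached_step : t->flushed_step;
}
```

With `flush_every` set to 3 and checkpoints at steps 10 through 70, only two of the seven checkpoints touch the parallel file system. A restart resumes from step 70 if the cache survives and from step 60 if it does not, which is the trade-off the paragraph above describes: less recomputed work and less load on shared storage.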

Detailed usage is provided at SCR.ReadTheDocs.io.

Contribute

SCR is an open source project. We welcome contributions via pull requests, as well as questions, feature requests, and bug reports via issues. Please refer to both our code of conduct and our contributing guidelines.

Developers

Developer documentation is provided at SCR-dev.ReadTheDocs.io.

SCR uses components from ECP-VeloC, which have their own user and developer docs.

Authors

Numerous people have contributed to the SCR project.

To reference SCR in a publication, please cite the following paper:

Additional information and research publications can be found here:

http://computation.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

Languages

  • C 64.4%
  • Perl 12.6%
  • Python 12.4%
  • Shell 7.7%
  • CMake 2.5%
  • C++ 0.4%