Skip to content

MCMC sampling algorithms for linear inverse models in R

License

Notifications You must be signed in to change notification settings

pregnault/samplelim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

samplelim : MCMC sampling algorithms for linear inverse models in R

Project Status: Active – The project has reached a stable, usable state and is being actively developed. License: LGPL v3 CRAN_Status_Badge

The package {samplelim} for the R statistical software provides efficient implementations (C++ encoded) of Monte Carlo Markov Chains (MCMC) algorithms for uniformly sampling high-dimensional polytopes. It is particularly aimed at handling linear inverse models (LIM) in metabolic (trophic, biochemical or urban) networks. Some support functions are also included to facilitate its use by practitioners.

Objective

{samplelim} provides efficient implementations of two MCMC algorithms for sampling high-dimensional polytopes, the Mirror Walk (MiW) introduced by Van Oevelen et al. (2010) and the Billard Walk (BiW) introduced by Polyak and Gryazina (2014). Thanks to the inclusion of easy to use support functions for linear inverse modeling of metabolic networks, {samplelim} can be viewed as an updated, extended and low-level-encoded version of the R package {limsolve}.

{samplelim} is built upon the C++ library {volesti}: the source code of the R package {volesti} 1.1.2-6 has been forked from its GitHub repository as a basis for developing {samplelim}.

The C++ library {volesti} provides efficient implementations of different MCMC algorithms for sampling high-dimensional polytopes and estimating their volumes. Its R package counterpart proposes a subset of these algorithms among which the BiW.

{samplelim} aims at combining the performance of {volesti} and the convenient features of {limsolve}. Precisely :

  • the MiW is implemented. It is an optimized, C++ encoded version of the algorithm coded in pure R programming language in {limsolve};
  • {samplelim} includes and slightly modifies the BiW from {volesti} 1.1.2-6 – the uniform distribution of the path length in {volesti} is replaced by the exponential distribution, as originally suggested by Polyak and Gryazina (2014);
  • support functions allowing an easy handling by users are included. These functions are updated/modified versions of the ones present in {limsolve}.

Installation

The package {samplelim} is not yet available on CRAN mirrors. Its source code is available on GitHub. It can be installed from its repository by executing the following chunk in any R console.

# Install package remotes if not already installed
if (! "remotes" %in% installed.packages()[,"Package"]) {install.packages("remotes")}
# Install package samplelim from its GitHub repo
remotes::install_github("https://github.com/pregnault/samplelim")

Typical workflow

The workflow of {samplelim} is greatly inspired by the one of {limsolve}. The main front-end function of {samplelim} is rlim(), performing uniform sampling in the polytope associated to LIM, by means of MCMC algorithm. Its main, mandatory, argument is lim, a list or an object of class lim (introduced in {limsolve}) encompassing the description of the polytope to be sampled. This list or lim object can be defined manually or, more suitably, from a description file, as illustrated in the following chunk.

# Load package
library("samplelim")
# Find path to the example of declaration file (DF) included in samplelim
DF <- system.file("extdata", "DeclarationFileBOWF-short.txt", package = "samplelim")
# Read DF and create a lim object
BOWF <- df2lim(DF)

Then, sampling is performed by a simple call to the function rlim(), as follows.

sample <- rlim(lim = BOWF, 
               seed = 123, # Set the seed of PRNG
               nsamp = 5000) # Number of points in the returned sample
# The points are presented in an N*n matrix, where
# N is the number of sampled points (here, 5000, default = 3000)
# n is the number of flows (ambiant space of the polytope)
dim(sample)
## [1] 5000   28

Diagnostics on sampling performances can then be performed using package {coda}, e.g., the Raftery and Lewis diagnostics, as illustrated below.

coda::raftery.diag(data = sample)
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                        
##  Burn-in  Total Lower bound  Dependence
##  (M)      (N)   (Nmin)       factor (I)
##  15       19070 3746         5.090     
##  2        3803  3746         1.020     
##  2        3741  3746         0.999     
##  2        3803  3746         1.020     
##  16       18132 3746         4.840     
##  12       13308 3746         3.550     
##  16       17556 3746         4.690     
##  2        3741  3746         0.999     
##  2        3866  3746         1.030     
##  2        3680  3746         0.982     
##  2        3620  3746         0.966     
##  8        9730  3746         2.600     
##  8        10664 3746         2.850     
##  9        12927 3746         3.450     
##  2        3680  3746         0.982     
##  2        3803  3746         1.020     
##  2        3803  3746         1.020     
##  2        3680  3746         0.982     
##  2        3680  3746         0.982     
##  2        3930  3746         1.050     
##  2        3741  3746         0.999     
##  2        3680  3746         0.982     
##  2        3866  3746         1.030     
##  2        3803  3746         1.020     
##  2        3803  3746         1.020     
##  2        3620  3746         0.966     
##  2        3680  3746         0.982     
##  2        3866  3746         1.030

A complete user guide will soon be included in the package, in the form of a vignette.

A comparison study of computation time and sampling quality, between the implementations of the MiW of {samplelim}, the BiW of {volesti} (1.1.2-6) and the Coordinate Hit-and-Run with Rounding (CHRR) of the MatLab library {COBRA} is available in Girardin et al. (2024).

Credits

The modifications of the core C++ implementation of the BiW and the C++ implementation of the MiW, have been performed by Matthieu DIEN and Théo GRENTE.

The R packaging has been performed by Théo GRENTE and Philippe REGNAULT.

The Declaration File in inst/extdata has been produced by Quentin NOGUÈS; see Noguès at al. (2020) for details on the ecological network it relies on.

We refer to the credits.md file of the R package {volesti} for additional information on the development of the core C++ implementation of the BiW (and other algorithms implemented in {volesti}).

The function df2lim() (reading and formatting a declaration file) is a wrapped copy of functions Read() and Setup() from package {LIM}, developped by Karline SOETAERT. It has been added to the present package to limit its dependency tree.

Licensing

You may redistribute or modify the software under the GNU Lesser General Public License as published by Free Software Foundation, either version 3 of the License, or any later version. It is distributed in the hope that it may be helpful to the community, but WITHOUT ANY WARRANTY.

References

V. Girardin, T. Grente, N. Niquil and P. Regnault, Comparing and updating R packages using MCMC Algorithms for Linear Inverse Modeling of Metabolic Networks, hal-04455831 (2024).

Q. Noguès, A. Raoux, E. Araignous, T. Hattab, B. Leroy, F. Ben Rais Lasram, F. Le Loc’h, J. Dauvin and N. Niquil, Cumulative effects of marine renewable energy and climate change on ecosystem properties: Sensitivity of ecological network analysis, Ecological Indicators 121, 107128 (2020).

L. Cales, A. Chalkis, I.Z. Emiris and V. Fisikopoulos, Practical volume computation of structured convex bodies, and an application to modeling portfolio dependencies and financial crises, Proc. of Symposium on Computational Geometry, Budapest, Hungary (2018).

B.T. Polyak and E.N. Gryazina, Billiard walk - a new sampling algorithm for control and optimization, IFAC Proceedings Volumes, 47(3), 6123-6128 (2014).

D. Van Oevelen, K. Van den Meersche, F. J. R. Meysman, K. Soetaert, J. J. Middelburg and A. F. Vézina, Quantifying Food Web Flows Using Linear Inverse Models, Ecosystems 13, 32-45 (2010).