---
title: "BioTIMEx"
author: "Alban Sagouis"
date: "7/22/2020"
output: md_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## BioTIMEx
### Description
This research compendium gathers the scripts used to download, re-structure and aggregate data sets into a large meta-analysis of communities in experimental setups that were sampled several times.
The code found here was originally versioned using git and stored on GitHub <>, and was eventually submitted to Zenodo <>. This code accompanies the article: XXXXX.
### Reproducibility and R environment
To ensure that the working environment (R version and package versions) is documented and isolated, the package renv (<https://rstudio.github.io/renv/index.html>) was used. Running `renv::restore()` installs all missing packages at once, using the `renv.lock` file to download the same package versions that we used.
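As a minimal sketch of that step, the wrapper below installs renv if it is missing and then restores the locked package versions. The function name `restore_environment` is hypothetical and not part of this repository, and the sketch assumes it is called from the project root, where `renv.lock` lives.

```r
# Hypothetical wrapper, not part of the repository.
# Assumes the working directory is the project root containing renv.lock.
restore_environment <- function() {
  # Install renv itself only if it is not available yet
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Install the package versions recorded in renv.lock
  renv::restore()
}
```

Calling `restore_environment()` once in a fresh R session should be enough before running the scripts described in the following sections.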
### Methods
Data sets were originally searched for among LTER data sets, and suitable open-access data stored on EDI (<https://portal.edirepository.org/nis/home.jsp>) were selected.
Suitable data sets were individually downloaded from R. The scripts managing these downloads are grouped inside `R/data download/`. These scripts follow the EDI process of data checking and formatting. You can run all these scripts at once by running the command below or from `R/1.0_downloading_raw_data.r`:
```{r eval=FALSE}
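# Create the output folder (and its parent) if it does not exist yet
if(!dir.exists('data/raw data/')) dir.create('data/raw data/', recursive = TRUE)
# Match only files whose names end in .R or .r
listF <- list.files('./R/data download', pattern = "\\.[Rr]$", full.names = TRUE)
lapply(listF, function(fullPath) source(fullPath, encoding = 'UTF-8', echo = FALSE, local = TRUE))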
```
All downloaded data sets are saved in separate folders named following the convention `author_year`.
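Because each download script is sourced independently, one failing download stops the whole `lapply()` call. The sketch below is one possible alternative, not the approach used in this repository: a hypothetical helper, `run_scripts()`, that sources every R script in a folder and records either `"ok"` or the error message for each file.

```r
# Hypothetical helper, not part of the repository.
# Sources every .R/.r script in dir_path and reports the outcome per file.
run_scripts <- function(dir_path) {
  scripts <- list.files(dir_path, pattern = "\\.[Rr]$", full.names = TRUE)
  results <- lapply(scripts, function(full_path) {
    tryCatch(
      {
        # Each script runs in its own environment to avoid name clashes
        source(full_path, encoding = "UTF-8", echo = FALSE, local = new.env())
        "ok"
      },
      # Keep going on failure and store the error message instead
      error = function(e) conditionMessage(e)
    )
  })
  names(results) <- basename(scripts)
  results
}
```

For example, `run_scripts('R/data download')` would return a named list that can be inspected to see which data sets need to be downloaded again.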
In a second step, each data set is re-structured or wrangled to fit a common format before analysis. The scripts turning the original heterogeneously structured data sets into comparable tables are in the `./R/data wrangling/` folder. You can run all these scripts at once by running the command below or from `R/2.0-wrangling_raw_data.r`:
```{r eval=FALSE}
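# Create the output folder (and its parent) if it does not exist yet
if(!dir.exists('data/wrangled data/')) dir.create('data/wrangled data/', recursive = TRUE)
# Match only files whose names end in .R or .r
listF <- list.files('R/data wrangling', pattern = "\\.[Rr]$", full.names = TRUE)
lapply(listF, function(fullPath) source(fullPath, encoding = 'UTF-8', echo = FALSE, local = TRUE))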
```
Finally, all restructured tables are aggregated into a final table by the `./R/3.0_merging_long-format_tables.r` script. The end-product table is in long format, with each row recording the composition of a community in one place at a given time. The format is described in `./data/template long format.txt`, which also defines the variables.
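To illustrate what aggregating into a long format means, here is a minimal sketch that stacks two small tables sharing the same columns. The column names and values are placeholders invented for this example; the actual variables are those defined in `./data/template long format.txt`, and the merging script may proceed differently.

```r
# Two hypothetical wrangled tables sharing the same (assumed) columns.
study_a <- data.frame(
  dataset_id = "author_2001",
  site = c("plot1", "plot1"),
  year = c(2001L, 2002L),
  species = c("sp_x", "sp_x"),
  value = c(4, 7),
  stringsAsFactors = FALSE
)
study_b <- data.frame(
  dataset_id = "author_2005",
  site = "pondA",
  year = 2005L,
  species = "sp_y",
  value = 12,
  stringsAsFactors = FALSE
)

# rbind() stacks rows and requires the tables to have the same column names,
# which is why every data set is first wrangled to a common format.
long_table <- do.call(rbind, list(study_a, study_b))
```

Each row of `long_table` then describes one record from one data set at one site and time, so tables from different studies can be analysed together.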
### Analyses
Further analyses were also carried out in R by Shane Blowes and collaborators.