Sean Anderson edited this page May 14, 2013 · 35 revisions

Home

Folder structure

One option

The basic folder structure could look like this:

paper1 -> species1 -> om1-em1-note -> om
                                   -> em

where paper1 is a paper ID (one of m, retro, data), species1 is a species ID (one of cod, flat, sardine), and om1-em1-note is a scenario. Within the scenario name, om1 is the operating model ID and em1 is the estimation model ID (name them whatever you want), and note is anything you want to append to the folder name for reference. The operating and estimation model IDs should match their respective folder names. The om and em subfolders will always have exactly those names and will contain the operating and estimation model files. This folder structure will be created with the create_dirs() function. It's important that the folder naming is absolutely consistent so that future functions that work with the output can easily traverse the folders.
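As a rough illustration of the layout described above, the folders for one scenario could be built with base R alone. This is only a sketch: make_scenario_dirs() and its arguments are hypothetical names used for illustration, not the package's create_dirs(), and the demo writes into a temporary directory.

```r
# Hypothetical helper (NOT the package's create_dirs()): build one scenario
# folder, plus its om and em subfolders, under a given root.
make_scenario_dirs <- function(root, paper, species, om_id, em_id, note) {
  scenario <- paste(om_id, em_id, note, sep = "-")
  base <- file.path(root, paper, species, scenario)
  dirs <- file.path(base, c("om", "em"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Demo in a temporary directory so nothing is written to the project itself.
root <- file.path(tempdir(), "sim-demo")
dirs <- make_scenario_dirs(root, paper = "m", species = "cod",
                           om_id = "om1", em_id = "em1", note = "note")
```

Pasting the scenario name together in one place means every folder follows the same om-em-note pattern, which is what later traversal code would rely on.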

Another option

An alternative, perhaps simpler structure would be like this:

sc1-cod/
sc1-fla/
sc1-sar/
sc2-cod/
sc2-fla/
sc2-sar/

The bit before the - represents the unique scenario identifier. It could be more descriptive than sc1. The bit after represents the species. Within each scenario folder you'd have the 100 replicates and an operating and estimation model for each:

sc1-cod/1/om
sc1-cod/1/em
sc1-cod/2/om
sc1-cod/2/em
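To make the flat layout concrete, the full set of replicate paths can be enumerated in a few lines. This sketch assumes two scenarios (sc1, sc2), the three species codes shown above, and 100 replicates; the variable names are illustrative only.

```r
scenarios  <- c("sc1", "sc2")
species    <- c("cod", "fla", "sar")
replicates <- 1:100

# expand.grid() varies its first argument fastest, so om/em alternate
# within each replicate, then replicates within species, and so on.
grid <- expand.grid(model = c("om", "em"), replicate = replicates,
                    species = species, scenario = scenarios,
                    stringsAsFactors = FALSE)

paths <- file.path(paste(grid$scenario, grid$species, sep = "-"),
                   as.character(grid$replicate), grid$model)
head(paths, 4)
```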

The main advantages of this setup are:

  1. it makes it easier for multiple papers to share scenarios

  2. it makes it easier for papers to change which scenarios to compare later

  3. it avoids unnecessary nested folder structure

  4. it's easier to distribute the model runs across people and computers

  5. since each folder represents a unique scenario run, it's simple to keep track of progress on model runs in a spreadsheet

Cole suggested we have a spreadsheet with the following columns:

scenario ID, what it means, how the control file was modified, model run

Then, groups can compile a list of scenario IDs they want to extract and compare.
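The spreadsheet idea could also be mirrored in R as a data frame. The sketch below uses placeholder rows only (no real scenarios or results), the column names are made up for illustration, and it assumes the "model run" column records whether the run has been completed.

```r
# Placeholder rows for illustration only; not real scenarios or results.
scenario_log <- data.frame(
  scenario_id         = c("sc1-cod", "sc1-fla", "sc2-cod"),
  meaning             = c("placeholder 1", "placeholder 2", "placeholder 3"),
  control_file_change = c("placeholder 1", "placeholder 2", "placeholder 3"),
  model_run           = c(TRUE, FALSE, TRUE),  # assumed: TRUE once the run is done
  stringsAsFactors    = FALSE
)

# A group lists the scenario IDs it wants to extract and compare ...
wanted <- c("sc1-cod", "sc2-cod")
to_compare <- scenario_log[scenario_log$scenario_id %in% wanted, ]

# ... and checks which of those have finished running.
ready <- to_compare$scenario_id[to_compare$model_run]
```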

Working directories and slashes

I'd suggest that, to keep the code independent of any one user's machine, we assume the user has already set their R working directory and that it is the folder that contains the paper folders. All paths in the code can then be relative to that folder. Otherwise different people won't be able to run the same code without modifying absolute paths.

I think we should also agree to use forward slashes in folder references, since R accepts these on every operating system.
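A small sketch of both conventions: file.path() builds relative paths with forward slashes on every platform, and a helper can normalize any backslash paths that slip in. to_forward_slashes() is a hypothetical name used only for this example.

```r
# Relative to the working directory, which the user is assumed to have set
# to the folder that contains the paper folders.
om_dir <- file.path("m", "cod", "om1-em1-note", "om")

# Hypothetical helper: convert Windows-style separators to forward slashes.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

fixed_path <- to_forward_slashes("m\\cod\\om1-em1-note\\em")
```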

Documenting how simulations were run

There will be a .r file in each paper folder that contains all the calls to the wrapper function needed to run the simulation. It should in theory be possible (although it would take a long time) to run the whole simulation again by sourcing the .r file. In general, the expectation is to call the wrapper function once per scenario. To facilitate distributed computing among multiple people and computers, there may be multiple function calls in the main .r file, each running a different subset of the simulation runs. In the end, this will mimic the concept of having a plain text control file, but it saves us from having to invent a new file format and error check the input. We'll be able to work directly within the R framework.
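Such a .r file might look roughly like the following. The wrapper here is only a stand-in: run_scenario(), its arguments, and the folder names are all hypothetical, and the stub just records which iterations were requested instead of running any models.

```r
# Stand-in for the real wrapper function; name and arguments are hypothetical.
run_scenario <- function(scenario, om_dir, em_dir, iterations) {
  message("Running ", scenario, ", iterations ",
          min(iterations), " to ", max(iterations))
  invisible(data.frame(scenario = scenario, iteration = iterations,
                       stringsAsFactors = FALSE))
}

# The usual case: one call per scenario.
run1 <- run_scenario("sc1-cod", om_dir = "cod-om", em_dir = "cod-em",
                     iterations = 1:100)

# Distributed case: the same scenario split across two people or computers.
run2a <- run_scenario("sc2-cod", om_dir = "cod-om", em_dir = "cod-em",
                      iterations = 1:50)
run2b <- run_scenario("sc2-cod", om_dir = "cod-om", em_dir = "cod-em",
                      iterations = 51:100)
```

Because every run is a plain function call, the file doubles as a record of exactly how the simulation was produced.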

What the "wrapper" function will do

The wrapper function will take care of:

  1. moving operating and estimating models to the correct folders

  2. running the operating model

  3. manipulating data as needed

  4. renaming files and file extensions as needed

  5. running the estimation model
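The five steps above can be sketched as a skeleton. Everything here is an assumption for illustration: run_one(), its arguments, and the placeholder file names are hypothetical, and the model-running and data-handling steps are passed in as functions that default to doing nothing, since the real steps depend on the model executable.

```r
# Hypothetical skeleton of a wrapper for a single replicate.
run_one <- function(om_src, em_src, dest,
                    run_model = function(dir) invisible(dir),
                    prepare_data = function(om_dir, em_dir) invisible(NULL)) {
  om_dir <- file.path(dest, "om")
  em_dir <- file.path(dest, "em")
  dir.create(om_dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(em_dir, recursive = TRUE, showWarnings = FALSE)

  # 1. put the operating and estimation model files in the correct folders
  file.copy(list.files(om_src, full.names = TRUE), om_dir, overwrite = TRUE)
  file.copy(list.files(em_src, full.names = TRUE), em_dir, overwrite = TRUE)

  # 2. run the operating model (placeholder by default)
  run_model(om_dir)

  # 3. and 4. manipulate data and rename files as needed (placeholder)
  prepare_data(om_dir, em_dir)

  # 5. run the estimation model (placeholder by default)
  run_model(em_dir)

  invisible(c(om = om_dir, em = em_dir))
}

# Demo with placeholder model folders in a temporary directory.
tmp <- file.path(tempdir(), "wrapper-demo")
om_src <- file.path(tmp, "cod-om")
em_src <- file.path(tmp, "cod-em")
dir.create(om_src, recursive = TRUE, showWarnings = FALSE)
dir.create(em_src, recursive = TRUE, showWarnings = FALSE)
writeLines("placeholder", file.path(om_src, "om.ctl"))
writeLines("placeholder", file.path(em_src, "em.ctl"))

out <- run_one(om_src, em_src, dest = file.path(tmp, "sc1-cod", "1"))
```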

Generating the operating and estimating models

The wrapper function will take as input an operating model folder and an estimation model folder. It is therefore the responsibility of the user to have these generated beforehand. We went this route to make the wrapper function as generalizable as possible. A vignette will include examples of how the R functions can help generate these folders.

Recruitment deviations

The recruitment deviations will be in a 100x100 matrix, with the columns representing different simulation runs and the rows representing years 1:100. These will be stored in a .rda file in the R package. They can then be easily loaded with data(recdevs), or we can have them loaded automatically through lazy loading (the LazyData field in the package DESCRIPTION file). Sean will take care of this.
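A minimal sketch of how such a matrix could be created and stored, assuming standard normal draws and an arbitrary seed. In the package the file would live in the data folder; here it is written to a temporary directory and read back with load().

```r
set.seed(1)  # arbitrary seed, chosen only to make this example reproducible

n_years <- 100  # rows: years 1:100
n_runs  <- 100  # columns: simulation runs

recdevs <- matrix(rnorm(n_years * n_runs), nrow = n_years, ncol = n_runs)

# Save as .rda (temporary location for this demo).
rda_file <- file.path(tempdir(), "recdevs.rda")
save(recdevs, file = rda_file)

# Read it back into a separate environment to confirm the round trip.
env <- new.env()
load(rda_file, envir = env)
```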

The recruitment deviations will be generated as draws from N(0, 1) and will then be scaled for each operating model. Kelli will write a function to find the desired standard deviation and scale the data.
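The scaling step itself is simple: multiplying N(0, 1) draws by a standard deviation s gives draws with standard deviation s. The sketch below is not the function described above; scale_recdevs() and the value 0.4 are hypothetical, and how the desired standard deviation is found for each operating model is left out.

```r
# Hypothetical helper: scale standard normal deviations to a target SD.
scale_recdevs <- function(devs, sigma_r) {
  stopifnot(is.numeric(devs), length(sigma_r) == 1, sigma_r > 0)
  devs * sigma_r
}

set.seed(42)  # arbitrary seed for reproducibility
devs <- matrix(rnorm(100 * 100), nrow = 100)

# 0.4 is an arbitrary illustrative value, not a value from any model.
scaled <- scale_recdevs(devs, sigma_r = 0.4)

# The ratio of standard deviations equals the scaling factor.
sd_ratio <- sd(as.vector(scaled)) / sd(as.vector(devs))
```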