
Archivists

Clifford Bohm edited this page Jan 16, 2018 · 15 revisions

Archivists collect, maintain, record and retrieve data. Different types of Archivist record data in different formats and levels of granularity.

Click here for information relating to the contents of output files.

what data does the Archivist record?

DataMap
Archivists save data stored in organisms' DataMaps. Any data added to a DataMap will be recorded by default.
non-DataMap
Archivists have special rules for recording other types of objects. These rules are configured automatically by the objects in question (e.g., genomes have their own output rules, which are defined by the type of genome you are using).
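To illustrate the DataMap rule, here is a minimal C++ sketch. `SimpleDataMap` and `toCsv` are hypothetical names invented for this example; MABE's real DataMap class has its own API, which is not reproduced here. The sketch only shows the idea that every value stored for an organism becomes a column in the output.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for an organism's DataMap (NOT MABE's DataMap class):
// named numeric values belonging to one organism.
using SimpleDataMap = std::map<std::string, double>;

// Turns every entry of the map into one CSV column, illustrating the rule that
// anything stored in the DataMap is recorded by default.
// std::map iterates keys in sorted order, so columns come out sorted by name.
std::string toCsv(const SimpleDataMap& data) {
    assert(!data.empty());
    std::ostringstream header, row;
    bool first = true;
    for (const auto& [key, value] : data) {
        if (!first) {
            header << ',';
            row << ',';
        }
        header << key;
        row << value;
        first = false;
    }
    return header.str() + "\n" + row.str() + "\n";
}
```

For example, a map holding `score = 2.5` and `ID = 7` yields the header `ID,score` and the row `7,2.5`.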

when does the Archivist record data?

Archivists use sequences (e.g. dataSequence, genomeSequence) to determine how often data is recorded. For example, if a sequence is set to :10, data is recorded every 10 updates (0, 10, 20, ...). Some Archivists delay output until some condition is reached. In these Archivists, sequences still determine how many samples are recorded and at which updates they are taken, but the data may be written to disk later than the update it describes.
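The sequence strings follow the format given in the parameter comments below (x, x-y, x-y:z, :z, x:z, combined with commas). The following C++ sketch expands such a string into the list of updates it selects. `parseSequence` is a hypothetical helper, not MABE's own parser, and it assumes that ranges include both endpoints and that `updates` is the total run length.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not MABE's parser): expands a sequence string such as
// "1-100:10, 200, 300:100" into the sorted list of updates it selects.
// 'updates' is the run length, used when a term has no explicit upper bound.
// Assumption: range bounds are inclusive.
std::vector<int> parseSequence(const std::string& spec, int updates) {
    std::set<int> result;  // std::set removes duplicates and keeps updates sorted
    std::stringstream terms(spec);
    std::string term;
    while (std::getline(terms, term, ',')) {
        std::string t;
        for (char c : term) {
            if (c != ' ') t += c;  // strip spaces around each term
        }
        if (t.empty()) continue;
        int start = 0, end = updates, step = 1;
        std::string range = t;
        std::size_t colon = t.find(':');
        if (colon != std::string::npos) {  // ":z", "x:z" or "x-y:z"
            step = std::stoi(t.substr(colon + 1));
            range = t.substr(0, colon);
        }
        std::size_t dash = range.find('-');
        if (dash != std::string::npos) {   // "x-y" or "x-y:z"
            start = std::stoi(range.substr(0, dash));
            end = std::stoi(range.substr(dash + 1));
        } else if (!range.empty()) {       // "x" or "x:z"
            start = std::stoi(range);
            if (colon == std::string::npos) end = start;  // plain "x" is one value
        }
        assert(step > 0);
        for (int u = start; u <= end; u += step) result.insert(u);
    }
    return std::vector<int>(result.begin(), result.end());
}
```

With a run length of 30, `parseSequence(":10", 30)` selects updates 0, 10, 20 and 30, matching the ":10" example above.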

selecting an Archivist

Archivists are selected in the settings files.

% ARCHIVIST
  outputMethod = Default                   # (string) output method, [Default, LODwAP (Line of Descent with Aggressive Pruning), SSwD (SnapShot with Delay)]

Default Archivist

The Default Archivist records pop.csv and max.csv.

The Default Archivist can also be set up to save snapshotData and snapshotOrganisms files. A snapshot stores data for the entire population as it exists at a given update. snapshotData files also contain lineage data, which can be used to graph phylogenies.
% ARCHIVIST_DEFAULT
  maxFileName = max.csv                      # (string) name of max file (saves data on organism with max "score" as determined by Optimizer)
  popFileColumns = []                        # (string) data to be saved into average file (must be values that can generate an average). If empty, MABE will try to figure it out
  popFileName = pop.csv                      # (string) name of population data file (saves population averages)
  realtimeSequence = :10                     # (string) How often to write to realtime data files. (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0 to
                                             #    updates on z, x:z = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  snapshotDataFilePrefix = snapshotData      # (string) prefix for name of snapshot data file
  snapshotDataSequence = :100                # (string) How often to save a realtime snapshot data file. (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0
                                             #    to updates on z, x:z = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  snapshotOrganismFilePrefix = snapshotOrganisms # (string) prefix for name of snapshot organism file
  snapshotOrganismSequence = :1000           # (string) How often to save a realtime snapshot organism file. (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from
                                             #    0 to updates on z, x:z = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  writeMaxFile = 1                           # (bool) Save data to Max file?
  writePopFile = 1                           # (bool) Save data to average file?
  writeSnapshotDataFiles = 0                 # (bool) if true, snapshot data files will be written (with all non genome data for entire population)
  writeSnapshotOrganismsFiles = 0            # (bool) if true, snapshot organism files will be written (with all organisms for entire population)
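As a rough picture of what one row of pop.csv and max.csv summarizes, the C++ sketch below averages the requested columns over the population and picks the organism with the highest score. The names `SimpleDataMap`, `populationAverages` and `maxOrganism` are hypothetical, and the sketch assumes the score is stored under a key named "score"; in MABE the score used for max.csv is determined by the Optimizer.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-organism record (not MABE's DataMap class).
using SimpleDataMap = std::map<std::string, double>;

// Averages each requested column across the population (one pop.csv-style row).
std::map<std::string, double> populationAverages(const std::vector<SimpleDataMap>& pop,
                                                 const std::vector<std::string>& columns) {
    assert(!pop.empty());
    std::map<std::string, double> avg;
    for (const auto& col : columns) {
        double sum = 0.0;
        for (const auto& org : pop) sum += org.at(col);  // throws if a column is missing
        avg[col] = sum / pop.size();
    }
    return avg;
}

// Returns the organism with the highest "score" (one max.csv-style row).
// Assumption: the score lives under the key "score".
const SimpleDataMap& maxOrganism(const std::vector<SimpleDataMap>& pop) {
    assert(!pop.empty());
    return *std::max_element(pop.begin(), pop.end(),
                             [](const SimpleDataMap& a, const SimpleDataMap& b) {
                                 return a.at("score") < b.at("score");
                             });
}
```

This also shows why popFileColumns must name values that can be averaged: each column is summed and divided by the population size.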

other Archivist Types

Both the "Line of Descent with Aggressive Pruning" (LODwAP) and "Snapshot with Delay" (SSwD) Archivists extend the Default Archivist, so even if you are using one of these Archivists, you can still generate pop, max, snapshotData and snapshotOrganisms files.

Line Of Descent with Aggressive Pruning (LODwAP)

LODwAP extends the Default Archivist (i.e. all of the Default Archivist parameters are still in use).

Line of descent defines the kinship relation between an individual and the individual's progenitors. Given an organism, the line of descent is the list containing that organism's parent, its parent's parent, its parent's parent's parent, and so on. LODwAP saves data for organisms on the line of descent. To save memory, LODwAP periodically checks for coalescence (that is, whether the line of descent contains an unsaved organism from which the entire current population descends) and outputs all information up to the coalescence point, after which LODwAP can erase the now-unneeded data from memory.
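The coalescence point is the most recent common ancestor of the current population. The C++ sketch below finds it in a deliberately simple way: build each member's line of descent from the root and walk down until the lines diverge. This is a swapped-in illustration, not LODwAP's incremental pruning, and the `Org` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal organism for this sketch: an ID and a pointer to its
// single parent (asexual reproduction only, as LODwAP requires).
struct Org {
    int id;
    std::shared_ptr<Org> parent;
};

using OrgPtr = std::shared_ptr<Org>;

// Returns the line of descent of 'org', ordered from the oldest ancestor down to 'org'.
std::vector<OrgPtr> lineOfDescent(OrgPtr org) {
    std::vector<OrgPtr> lod;
    for (; org; org = org->parent) lod.push_back(org);
    return std::vector<OrgPtr>(lod.rbegin(), lod.rend());
}

// Finds the coalescence point: the most recent organism shared by the line of
// descent of every member of the population. Returns nullptr if there is none.
OrgPtr mostRecentCommonAncestor(const std::vector<OrgPtr>& pop) {
    assert(!pop.empty());
    std::vector<std::vector<OrgPtr>> lods;
    for (const auto& o : pop) lods.push_back(lineOfDescent(o));
    OrgPtr mrca = nullptr;
    // Walk down from the root: all lines agree up to the MRCA and diverge after it.
    for (std::size_t depth = 0;; ++depth) {
        for (const auto& lod : lods) {
            if (depth >= lod.size() || lod[depth] != lods[0][depth]) return mrca;
        }
        mrca = lods[0][depth];
    }
}
```

Everything on the line of descent up to the returned organism is settled and could be written out; only the part below it can still change.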

LOD cannot be used with sexual reproduction, because a simple line of descent cannot be computed when organisms have more than one parent.

If speciation occurs in a LODwAP run, it will not be visible in the output files: only one species' LOD will be recorded and all others will be lost. In addition, speciation during a long run may cause memory issues, because coalescence will not occur and data therefore cannot be saved to disk and deleted from memory.

In addition to the files generated by the default archivist, LODwAP also saves:
LOD_data.csv
contains data for organisms on the line of descent. LOD_data.csv will contain one line for each update being recorded (as defined by dataSequence)
LOD_organism.csv
contains organism data (any data related to genomes and brains that cannot be found in the parameters) for organisms on the line of descent. LOD_organism.csv will contain one line for each update being recorded (as defined by organismSequence)
terminateAfter - With LODwAP, MABE will continue running even after it reaches updates (the parameter that tells MABE when to stop running) until there is coalescence at updates (i.e., a single most recent common ancestor for the current population). Since coalescence can take a long time, the parameter terminateAfter defines how long you are willing to wait for it. Once updates + terminateAfter is reached, MABE will stop running. Any LOD-based files that have not yet been written will be written assuming that a random organism in the current population is the most recent common ancestor.
% ARCHIVIST_LODWAP
  dataFileName = LOD_data.csv                # (string) name of data file (stores everything but genomes)
  dataSequence = :100                        # (string) How often to write to data file. (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0 to updates on z,
                                             #    x:z = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  genomeFileName = LOD_organism.csv          # (string) name of organism file (stores genomes for line of descent)
  organismSequence = :1000                   # (string) How often to write organism file. (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0 to updates on z,
                                             #    x:z = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  pruneInterval = 100                        # (int) How often to attempt to prune LOD and actually write out to files
  terminateAfter = 100                       # (int) how long to run after updates (to allow time for coalescence)
  writeDataFile = 1                          # (bool) if true, a data file will be written
  writeOrganismFile = 1                      # (bool) if true, an organism file will be written
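The terminateAfter rule can be summarized as a stop condition. The C++ sketch below is a hypothetical restatement of the text above, not MABE's code: `shouldStop` and `coalescedAtUpdates` are illustrative names, and the caller is assumed to know whether coalescence at updates has happened.

```cpp
#include <cassert>

// Hypothetical stop rule for a LODwAP run (illustrative, not MABE's code).
// 'updates' is the requested run length; 'coalescedAtUpdates' is assumed to be
// true once the current population has a single most recent common ancestor
// covering 'updates', so the LOD files can be completed.
bool shouldStop(int currentUpdate, int updates, int terminateAfter, bool coalescedAtUpdates) {
    assert(terminateAfter >= 0);
    if (currentUpdate < updates) return false;         // normal run still in progress
    if (coalescedAtUpdates) return true;               // LOD is resolved; safe to stop
    return currentUpdate >= updates + terminateAfter;  // stop waiting for coalescence
}
```

When the run stops on the last condition, the remaining LOD files are written using a random organism as the assumed most recent common ancestor, as described above.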

Snapshot with Delay (SSwD)

SSwD extends the Default Archivist (i.e. all of the Default Archivist parameters are still in use).

The Snapshot with Delay Archivist generates the same type of data found in the default snapshot files (snapshotData_[update].csv and snapshotOrganisms_[update].csv), but before the data and organism files are written, a number of updates (the delay) are allowed to run. Only organisms that have surviving offspring after the delay are saved to file.

The SSwD Archivist provides pruned ancestry, which can be used to generate pruned phylogenies. This pruning process removes short branches from the phylogeny which are generally not interesting and also take up a lot of disk space.
% ARCHIVIST_SSWD
  cleanupInterval = 100                      # (int) How often to clean up old checkpoints
  dataDelay = 10                             # (int) when using Snapshot with Delay output method, how long is the delay before saving data
  dataFilePrefix = SSwD_data                 # (string) prefix for name of data file (stores everything but organism file data)
  dataSequence = :100                        # (string) when to save a data file (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0 to updates on z, x:z =
                                             #    from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  organismDelay = 10                         # (int) when using Snapshot with Delay output method, how long is the delay before saving organisms
  organismFilePrefix = SSwD_organism         # (string) prefix for name of organism file (stores genomes)
  organismSequence = :1000                   # (string) when to save an organism file (format: x = single value, x-y = x to y, x-y:z = x to y on z, :z = from 0 to updates on z, x:z
                                             #    = from x to 'updates' on z) e.g. '1-100:10, 200, 300:100'
  writeDataFiles = 1                         # (bool) if true, data files will be written
  writeOrganismFiles = 1                     # (bool) if true, organism files will be written
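The delay filter can be pictured as follows: hold the organisms captured at a snapshot update, let the delay run, then keep only those that still have living descendants. This C++ sketch is an assumption-laden illustration, not SSwD's implementation: it reads "surviving offspring" as any living descendant, and `Org` and `survivorsAfterDelay` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <vector>

// Hypothetical minimal organism: an ID and a pointer to its single parent.
struct Org {
    int id;
    std::shared_ptr<Org> parent;
};

using OrgPtr = std::shared_ptr<Org>;

// Given the organisms captured at a snapshot update and the population alive
// after the delay, returns the IDs of snapshot organisms that have at least one
// living descendant (assumed reading of "surviving offspring").
std::set<int> survivorsAfterDelay(const std::vector<OrgPtr>& snapshot,
                                  const std::vector<OrgPtr>& livingNow) {
    std::set<int> snapshotIds;
    for (const auto& o : snapshot) snapshotIds.insert(o->id);
    std::set<int> kept;
    for (const auto& living : livingNow) {
        // Start at the parent so an organism does not count as its own offspring.
        for (OrgPtr o = living->parent; o; o = o->parent) {
            if (snapshotIds.count(o->id) > 0) kept.insert(o->id);
        }
    }
    return kept;
}
```

Snapshot organisms whose lineages died out during the delay are dropped, which is what removes the short, uninteresting branches from the saved phylogeny.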

Which Archivist should you use?!

Default
If you are only interested in basic information and/or if you want snapshots with all data.
LODwAP
If you are only interested in the "best" evolved organism and its ancestors, and reproduction is asexual.
SSwD
If you are interested in being able to analyse phylogeny and/or diversity, and want a good general idea about what's happening, but don't want to deal with absolutely all of the data!