EMu-IPT-Prep

Tools to prep EMu data for IPT

These scripts are part of the FMNH workflow for publishing specimen data from EMu to the Field Museum IPT. Information on how to structure data/reports from EMu is at the top of each script.

IPTac.R

prepares EMu Catalogue and Multimedia data as an Audubon Core extension for multimedia associated with occurrences.
IPTac_v1.R is an older version (pre-CT scan data) for preparing Multimedia with a simpler record structure.

IPTcd.R

prepares a [draft] Collections Description dataset for inventories, accessions, or other data not yet resolved to 'occurrence-level' specificity.

IPTdwc.R

includes checks to help prepare EMu Catalogue data as a Darwin Core dataset.
converts EMu's DarDateLastModified values to the proper ISO time format for dwc:modified.
converts EMu's ColDateCollectedFrom (or DarYear, DarMonth, DarDay) to ISO format for dwc:eventDate.
replaces carriage-returns with pipes within all fields.
checks for duplicate GUIDs and if any are found, outputs a CSV of duplicates to check.
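As a rough illustration of three of the checks listed above (building an ISO date from year/month/day parts, replacing line breaks with pipes, and flagging duplicate GUIDs), here is a minimal base-R sketch. It is not the code in IPTdwc.R: the helper names, the guid and Notes column names, and the sample values are all made up for the example, and it treats line feeds as well as carriage returns as line breaks, which is an assumption.

```r
# Made-up sample rows; 'guid' and 'Notes' are hypothetical column names.
cat_df <- data.frame(
  guid     = c("guid-1", "guid-2", "guid-2"),
  DarYear  = c(1985, 1990, NA),
  DarMonth = c(3, NA, NA),
  DarDay   = c(7, NA, NA),
  Notes    = c("line one\rline two", "plain", "a\r\nb"),
  stringsAsFactors = FALSE
)

# Build an ISO 8601 date (YYYY, YYYY-MM, or YYYY-MM-DD) from separate parts.
# Missing parts shorten the date; a missing year gives NA.
build_event_date <- function(year, month, day) {
  out <- ifelse(is.na(year), NA_character_, sprintf("%04d", as.integer(year)))
  has_month <- !is.na(year) & !is.na(month)
  out[has_month] <- sprintf("%s-%02d", out[has_month],
                            as.integer(month[has_month]))
  has_day <- has_month & !is.na(day)
  out[has_day] <- sprintf("%s-%02d", out[has_day], as.integer(day[has_day]))
  out
}

# Replace CR, LF, or CRLF with a pipe in every character column.
replace_line_breaks <- function(df) {
  is_text <- vapply(df, is.character, logical(1))
  df[is_text] <- lapply(df[is_text], function(x) gsub("\r\n|\r|\n", "|", x))
  df
}

# Return every row whose GUID appears more than once.
find_duplicate_guids <- function(df, guid_col) {
  guids <- df[[guid_col]]
  dup_values <- unique(guids[duplicated(guids) & !is.na(guids)])
  df[guids %in% dup_values, , drop = FALSE]
}

cat_df$eventDate <- build_event_date(cat_df$DarYear, cat_df$DarMonth, cat_df$DarDay)
cat_df <- replace_line_breaks(cat_df)
dups <- find_duplicate_guids(cat_df, "guid")
```

If dups has any rows, it could be written out for review with write.csv(dups, "duplicates.csv", row.names = FALSE); the filename here is only a placeholder.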

IPTrr.R

prepares EMu Catalogue data from the Relationship Tab (AllRelNhTab) as a Resource Relationship extension for occurrences with interactions.

IPTrr_PreDev.R

prepares EMu Catalogue data (pre-EMu-development) as a Resource Relationship extension for occurrences with interactions.
- see the Relationship Data workflows doc for help with EMu data handling and IPT mapping.
try an online version here

Setup

1. Install R, RStudio, and Dependencies

EMu-IPT-Prep scripts primarily use tidyverse's tidyr and readr packages. For more info, check the tidyverse site.

Download and install R and RStudio
In RStudio, install the required tidyverse packages in the 'Console' pane (usually lower-left) by typing the following and hitting enter: install.packages('tidyverse')
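To check whether the packages named above are already installed before running install.packages, something like the following sketch may help. The missing_packages helper is hypothetical, not part of this repo.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The two tidyverse packages the README says the scripts rely on.
to_install <- missing_packages(c("tidyr", "readr"))

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```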

2. Clone or Download this repo

To clone the repo, UChicago's steps here are helpful.

Or:

Simply download the EMu-IPT-prep repo as a .zip, and unzip it
Open RStudio, and create a new project by going to File --> New Project --> Existing Directory (select the 'EMu-IPT-prep' directory), and clicking 'Create Project'

3. Get Data from EMu

The input files for the EMu-IPT-prep scripts are CSV datasets generated from EMu reports. In this repo:

First, create a data01raw directory
Second, create a data02output directory
Run the script's corresponding EMu CSV report and put the output CSVs in the location described below:
- For Audubon Core scripts, e.g. IPTac.R:
  - run the EMu Catalogue 'IPT Audubon Core' CSV report
  - Location for all EMu CSVs from this report: data01raw/
- For Darwin Core scripts, e.g. IPTdwc.R:
  - run an EMu Catalogue 'IPT_General' CSV report (or IPT_[Collection Area])
  - Keep the default EMu report filename, ecatalog.csv
  - Location for EMu CSV: data01raw/iptSpec/
- For Resource Relationship scripts, e.g. IPTrr.R:
  - run an EMu Catalogue 'IPT Resource Relationship' CSV report
  - Location for EMu CSV: data01raw/relationships/
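The folders named in this step can also be created from the R console. The sketch below assumes the layout described above; create_data_dirs is a hypothetical helper, and the demonstration runs in a temporary directory rather than in the repo.

```r
# Create the input/output folders the scripts expect, relative to `root`.
create_data_dirs <- function(root = ".") {
  dirs <- file.path(root, c(
    "data01raw",
    "data01raw/iptSpec",
    "data01raw/relationships",
    "data02output"
  ))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Demonstration in a temporary directory.
demo_root <- tempfile("emu-ipt-demo")
created <- create_data_dirs(demo_root)
```

In practice, calling create_data_dirs() with no arguments from the repo's root directory should set up the same folders there.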

4a. Run a Script from command line

Open command line (cmd, terminal, etc), and check that R can run there by typing Rscript and hitting enter.
- If a 'command not found' warning appears, add Rscript.exe's path (e.g. C:\Program Files\R\R-4.1.2\bin) to the Path environment variable
- Steps to add a path are here
cd to the root directory of this repo
Use Rscript to run a script from the command line -- e.g.: Rscript IPTac.R
- Use --verbose to see more info while the script runs -- e.g.: Rscript --verbose IPTac.R
When the script finishes, check for the output file/s in the data02output directory in this repo.

4b. Run a Script from RStudio

Scripts can be run using R's source() function if input-files are named properly and in the right directory. When running source, setting verbose=TRUE can be useful if warnings or errors pop up. After running a script, cross-checking the input- and output-data in a text-editor -- or in RStudio's 'Environment' pane (usually upper right) -- is recommended.

In RStudio, make sure you're in the EMu-IPT-prep project. The top of the RStudio window should show the project directory path. If it's wrong, go to File -> Open Project, navigate to the EMu-IPT-prep directory, and open its '.Rproj' file.
Run the source function in the Console pane by typing source("[script-filename]", verbose=TRUE) and hitting enter -- e.g.:

source("IPTac.R", verbose=TRUE) # For Audubon Core

source("IPTdwc.R", verbose=TRUE) # For Darwin Core
While the script is running, a small red 'stop sign' icon will display in the Console pane's upper-right corner. When the script is finished, the stop sign will disappear.
When the script finishes, check for the output file/s in the data02output directory in this repo.
Rename the output file Catalog2.csv after the corresponding collection, e.g. field_ipt_insects
Zip the renamed file
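The last two steps (rename, then zip) can be done by hand or from R. The sketch below is one possible way and not part of this repo: rename_output and zip_output are hypothetical helpers, keeping the .csv extension on the renamed file is an assumption, and utils::zip relies on an external zip program being available on the system.

```r
# Rename Catalog2.csv in `out_dir` to <new_name>.csv and return the new path.
rename_output <- function(out_dir, new_name, old_name = "Catalog2.csv") {
  from <- file.path(out_dir, old_name)
  to <- file.path(out_dir, paste0(new_name, ".csv"))
  if (!file.exists(from)) stop("Output file not found: ", from)
  if (!file.rename(from, to)) stop("Could not rename ", from)
  to
}

# Zip a CSV next to itself; -j stores the file without its directory path.
# Needs an external 'zip' command (see ?utils::zip).
zip_output <- function(csv_path) {
  zip_path <- sub("\\.csv$", ".zip", csv_path)
  utils::zip(zip_path, files = csv_path, flags = "-j9X")
  zip_path
}

# Example usage from the repo root (not run here):
# csv_path <- rename_output("data02output", "field_ipt_insects")
# zip_output(csv_path)
```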

A note on warning messages

If a script reports this readr warning:

One or more parsing issues, see problems() for details

try setting guess_max when reading the CSV, e.g.: cat <- read_csv(file = "data01raw/iptSpec/ecatalog.csv", guess_max = 1000000)
- guess_max tells readr to look at more rows before guessing which data type to assign to each column. Stricter column schemas are an option, but this should be good enough for now.
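For reference, the stricter-schema option mentioned above could look like the following sketch, which reads every column as character so readr has nothing to guess. This is an alternative to guess_max, not what the scripts currently do; read_emu_csv is a hypothetical helper and the sample rows are made up.

```r
library(readr)

# Hypothetical helper: read an EMu CSV export with all columns as character.
read_emu_csv <- function(path) {
  read_csv(path, col_types = cols(.default = col_character()))
}

# Small demonstration on a temporary file (made-up values, not real EMu data).
demo_path <- tempfile(fileext = ".csv")
writeLines(c("irn,DarCatalogNumber", "1,12345", "2,A-678"), demo_path)
demo <- read_emu_csv(demo_path)
```

The trade-off is that numeric or date columns then need explicit conversion later in the script, whereas guess_max keeps readr's automatic typing.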

To do:

Add example input/output data
More how-to, validation, error logging...
Finish draft-CD script
