Synthesis Version Updates #103

Closed
jaxinewolfe opened this issue Nov 28, 2023 · 7 comments

@jaxinewolfe
Collaborator

jaxinewolfe commented Nov 28, 2023

This update is scheduled for March 2024

Please continue working in the develop branch!

Overall Goals:

  • Resolve errors in existing hook scripts
  • Add any necessary QAQC tests (depth intervals, data viz, taxa resolution)
  • Add necessary uncontrolled attributes and variables
  • Avoid duplication of cores (esp. international cores from seagrass or mangrove habitat)

Additional goals:

  • Remove instances of modeled fraction carbon values
  • Make sure that every core has an assigned habitat
  • Every study ID has at least one associated citation
  • Track down coordinates for every core (site-level resolution is better than nothing)
  • Radioisotope activity units are standardized across all studies
@jaxinewolfe
Collaborator Author

jaxinewolfe commented Nov 28, 2023

Resolving errors in existing hook scripts:

  • Weis 2000: standardize the "counts_per_hour" activity unit
  • Morgan et al 2024 depthseries missing study_id
  • Make sure data is output to derivative folders with no row index column
  • Poppe et al 2019 has some cores with duplicate impact classes
  • Holmquist synth impact classes need site_id (only core_id presently)
  • Resolve encoding errors in datasets that have accents (Cifuentes, Eagle, author names, etc)
  • Determine whether the following studies just have cores with duplicate intervals or if they contain cores that need to be separated into unique IDs: "Callaway_et_al_2019", "Fourqurean_and_Kendrick_unpublished", "Howard_and_Fourqurean_2020", "StLaurent_et_al_2020"
  • Do the hook scripts with radiocarbon dates/ages need edits? These would be: "Watson_and_Byrne_2013", "Shaw_et_al_2020", "Rodriguez_et_al_2022", "McTigue_et_al_2020", "Marot_et_al_2020", "Lafratta_et_al_2018", "Krauss_et_al_2018", "Johnson_et_al_2007", "Gerlach_et_al_2017", "Drexler_et_al_2009", "Costa_et_al_2023", "Belshe_et_al_2019"

The following cores have a depth interval where the min and max are likely reversed:

| study_id | core_id |
| --- | --- |
| Thom_1992 | PB1 |
| DelVecchia_et_al_2014 | M1314 |
| Costa_et_al_2023 | PanamaCaribbean-Station Forest (SF)-1 |
| MacKenzie_et_al_2021 | Catanauan_216_714 |
| Sharma_et_al_2021 | Koh_Kohng_138_384 |
| Sharma_et_al_2021 | Koh_Kohng_138_386 |
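
A minimal base R sketch of how reversed intervals like these could be flagged. The toy table and study/core IDs are made up for illustration; only the `depth_min` and `depth_max` column names are taken from this thread, and intervals with NA depths are left to a separate check.

```r
# Toy depthseries table (hypothetical IDs and values, for illustration only)
depthseries <- data.frame(
  study_id  = c("Study_A", "Study_A", "Study_B"),
  core_id   = c("A1", "A1", "B1"),
  depth_min = c(0, 10, 30),
  depth_max = c(10, 20, 20),
  stringsAsFactors = FALSE
)

# An interval is reversed when its top is deeper than its bottom.
# Rows with NA depths are skipped here rather than flagged.
is_reversed <- !is.na(depthseries$depth_min) &
  !is.na(depthseries$depth_max) &
  depthseries$depth_min > depthseries$depth_max

# One row per affected core, matching the study_id / core_id listing above
reversed_cores <- unique(depthseries[is_reversed, c("study_id", "core_id")])
print(reversed_cores)
```

Reporting at the core level rather than the interval level keeps the output short enough to paste into an issue like this one.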

The following studies have intervals with NA depths:

"Nahlik_and_Fennessy_2016" "Langston_et_al_2022" "De_Iongh_et_al_1995" "Agawin_et_al_1996" "Townsend_and_Fonseca_1998" "Holmer_et_al_2007" "Van_Engeland_2010" "Boyd_et_al_2017"

Studies with NA Coords:

  • 1 Thom_1992
  • 2 Schile-Beers_and_Megonigal_2017
  • 3 Nsombo_et_al_2016
  • 4 Eid_and_Shaltout_2016_Egypt
  • 5 Eid_et_al_2016_Saudi_Arabia
  • 6 Poppe_et_al_2019
  • 7 Marot_et_al_2020
  • 8 Drexler_et_al_2013
  • 9 Belshe_et_al_2019

Studies with NA Habitat:

  • 1 Thom_1992
  • 2 Schile-Beers_and_Megonigal_2017 (likely sabkha, which we should add to the database)
  • 3 Osland_et_al_2016
  • 4 Marot_et_al_2020

Studies with Modeled C:

  • "Ceron-Breton_et_al_2011" (likely)

Synthesis and Post-Processing

  • How to indicate island municipalities or colonies? (ex. you can't filter for Bonaire)
  • Should we truncate decimals?
  • Should we flag cases where values that should be positive are negative? (ex. DBD for Lagomasino and Morrissette)
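
For the negative-value question, a check along these lines might be enough. This is only a sketch: the column names `dry_bulk_density` and `fraction_carbon` and the toy values are assumptions for illustration, and the list of must-be-positive columns would need to be agreed on.

```r
# Toy depthseries rows (hypothetical IDs and values)
depthseries <- data.frame(
  study_id         = c("Study_A", "Study_A", "Study_B"),
  core_id          = c("A1", "A1", "B1"),
  dry_bulk_density = c(0.45, -0.12, NA),
  fraction_carbon  = c(0.08, 0.05, -0.01),
  stringsAsFactors = FALSE
)

# Columns assumed to be strictly non-negative (an assumption, adjust as needed)
positive_cols <- c("dry_bulk_density", "fraction_carbon")

# Logical matrix: TRUE where a value is below zero; NA values are not flagged
is_negative <- as.matrix(depthseries[, positive_cols]) < 0
is_negative[is.na(is_negative)] <- FALSE

# Rows with at least one negative value, plus a per-column count
flagged_rows <- depthseries[rowSums(is_negative) > 0, ]
negative_counts <- colSums(is_negative)
print(flagged_rows)
print(negative_counts)
```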

@jaxinewolfe pinned this issue on Nov 28, 2023
@jaxinewolfe changed the title from "Version 1.2.0 Updates - March 2024" to "Version 1.1.1 Updates - March 2024" on Dec 4, 2023
@jaxinewolfe changed the title from "Version 1.1.1 Updates - March 2024" to "Version 1.1.1 Updates" on Dec 4, 2023
@jaxinewolfe changed the title from "Version 1.1.1 Updates" to "Synthesis Version Updates" on Dec 15, 2023
@jaxinewolfe
Collaborator Author

jaxinewolfe commented Dec 15, 2023

We need a QAQC function to catch studies that have no associated citation

Some starter code:

library(dplyr)

# Cores whose study_id has no entry in the study citations table
no_citations <- ccrcn_synthesis$cores %>%
  filter(!(study_id %in% unique(ccrcn_synthesis$study_citations$study_id)))

if (nrow(no_citations) > 0) {
  # Name the affected studies in the warning itself so they are always reported
  warning(
    "The following studies have no citation information: ",
    paste(unique(no_citations$study_id), collapse = ", "),
    ". Please review the CCN library synthesis to confirm that all synthesis studies have proper study citation information in '/data/CCN_synthesis/CCN_study_citations.csv'."
  )
}
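
One possible way to wrap the starter code above into a reusable QAQC function, sketched in base R. The function name `find_uncited_studies` and the toy tables are hypothetical; only `study_id` and the `cores` / `study_citations` table names come from the starter code.

```r
# Hypothetical helper: returns the study IDs present in `cores` that have
# no matching row in `study_citations`, and warns if there are any.
find_uncited_studies <- function(cores, study_citations) {
  uncited <- setdiff(unique(cores$study_id), unique(study_citations$study_id))
  if (length(uncited) > 0) {
    warning("Studies with no citation: ", paste(uncited, collapse = ", "),
            call. = FALSE)
  }
  uncited
}

# Toy stand-in for the synthesis object (made-up IDs)
toy_synthesis <- list(
  cores = data.frame(
    study_id = c("Study_A", "Study_A", "Study_B"),
    core_id  = c("A1", "A2", "B1"),
    stringsAsFactors = FALSE
  ),
  study_citations = data.frame(
    study_id = "Study_A",
    stringsAsFactors = FALSE
  )
)

# suppressWarnings() is only used here to keep the example output quiet
uncited <- suppressWarnings(
  find_uncited_studies(toy_synthesis$cores, toy_synthesis$study_citations)
)
print(uncited)
```

Returning the IDs rather than only warning makes it easy to drop the result straight into a QA report table.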

@jaxinewolfe
Collaborator Author

jaxinewolfe commented Feb 26, 2024

@cheneyr @BettsH

Here are the QAQC results for our current version of the synthesis. It looks like a lot, but I think it's fairly small stuff that we can knock out! For example, there's a bunch of columns starting with "..." which resulted from tables being output using write.csv() without specifying row.names = FALSE (idk why the default is TRUE, it's annoying), so a column with the row number index got written out too. If you spot stuff that is related to datasets you've worked on, you can go for those quick fixes, or we can chat about some of the more nuanced things in our meeting (or whenever).
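
A small base R sketch of the row.names issue and one possible cleanup. The toy table and temp files are for illustration only, and the cleanup step assumes the stray columns follow the `...N` naming pattern seen in the QAQC results; any real unnamed data column would match that pattern too, so inspect before dropping.

```r
# Toy cores table (hypothetical values)
cores <- data.frame(study_id = "Study_A", core_id = "A1",
                    stringsAsFactors = FALSE)

with_index    <- tempfile(fileext = ".csv")
without_index <- tempfile(fileext = ".csv")

write.csv(cores, with_index)                        # default: row.names = TRUE
write.csv(cores, without_index, row.names = FALSE)  # no row index column

# The default output carries one extra, unnamed column when read back in
cols_with_index    <- names(read.csv(with_index))
cols_without_index <- names(read.csv(without_index))
print(length(cols_with_index))
print(cols_without_index)

# Cleaning up a table that already picked up "...N" columns on import
imported <- data.frame(x1 = 1, core_id = "A1", x2 = NA,
                       stringsAsFactors = FALSE)
names(imported) <- c("...1", "core_id", "...3")

is_index_col <- grepl("^\\.{3}[0-9]+$", names(imported))
cleaned <- imported[, !is_index_col, drop = FALSE]
print(names(cleaned))
```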

Also, we are well past 10k cores, holy moly! 🎉

| index | test | result |
| --- | --- | --- |
| 1 | Core ID uniqueness | Check the following core_id(s) in the core-level data: 2B, 305, 398, 399, AL, B1, B2, B3, B4, G15, G4, G5, G9 |
| 2 | Valid core ID links in core table | No core ID in depthseries table: WBWA1109_01PU, WBWA1109_02PU, WBWA1109_03PU, WBWA1109_04PU, NSOR1209_01PU, NSOR1209_02PU, NSOR1209_03PU, NSOR1209_04PU, NBOR1409_01PU, NBOR1409_02PU, NBOR1409_03PU, Catlett_1m, Catlett_Transect, Goodwin_1m, Goodwin_Transect, Pamunkey_Transect, SweetHall_1m, SweetHall_Transect, Taskinas_Transect |
| 3 | Valid core ID links in depthseries table | No core ID in core table: PB1, RC_U_A, RC_M_A, PR_U_A, PR_M_A, W_U_A, W_M_A, F_U_A, F_M_A |
| 4 | Test coordinate uniqueness | 1373 sets of coordinates are associated with more than one core. Check 'data/QA/duplicate_coordinates.csv' |
| 5 | Validity of column names in depthseries table | Undefined columns: ...33, ...38, th234_activity, th234_activity_se, k40_activity, k40_activity_se, ...56, ...57, date, pb210_crs_age, pb210_crs_age_sd |
| 6 | Validity of column names in cores table | Undefined columns: ...29, salinity, ...31, ecological_condition_flag, ...37, ...38, core_date, core_position_method, geomorphic_id, ...42 |
| 7 | Validity of column names in sites table | Passed |
| 8 | Validity of column names in species table | Undefined columns: ...7, ...8 |
| 9 | Validity of column names in impacts table | Undefined columns: impact_notes, ...6 |
| 10 | Validity of column names in methods table | Undefined columns: ...30, ...32, ground_or_sieved_flag, ...35, pb210_background_assumption, ...37 |
| 11 | Validity of column names in study_citations table | Undefined columns: keywords, day, ...20, issue, ...22, issn, abstract, eprint, ...30, ...31, article-number |
| 12 | Validity of variable names in depthseries table | Passed |
| 13 | Validity of variable names in cores table | Undefined variables: WGS84, riverine, palustrine, deltaic, brackish to fresh, brackish to saline, other, mudflat, plain, submerged subtidal |
| 14 | Validity of variable names in sites table | Undefined variables: palustrine |
| 15 | Validity of variable names in species table | Passed |
| 16 | Validity of variable names in impacts table | Undefined variables: managed, restoring, canalled |
| 17 | Validity of variable names in methods table | Undefined variables: PVC tube or thin-walled metal tube, Eijkelkamp peat core sampler, shovel, shovel core, gouge corer, polycarbonate tube, duplicate measurements, duplicate measurements, ground and sieved, not specified, not specified, not specified, selected intervals |
| 18 | Validity of variable names in study_citations table | Undefined variables: primary source, article |
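
For anyone chasing down tests 2, 3, and 5 to 11, the underlying comparisons can be sketched with `setdiff()`. This is not the actual QA code behind the results above; the toy tables and the `controlled_attributes` vector are hypothetical stand-ins.

```r
# Toy tables (made-up core IDs)
cores       <- data.frame(core_id = c("A1", "A2", "B1"),
                          stringsAsFactors = FALSE)
depthseries <- data.frame(core_id = c("A1", "A1", "B1", "C1"),
                          stringsAsFactors = FALSE)

# Core IDs in the core table with no depthseries rows (like test 2)
no_depthseries <- setdiff(cores$core_id, depthseries$core_id)

# Core IDs in the depthseries with no core table entry (like test 3)
no_core_entry <- setdiff(depthseries$core_id, cores$core_id)

# Undefined columns: anything not in the controlled attribute list
# (hypothetical list, the real one lives in the database guidance)
controlled_attributes <- c("study_id", "core_id", "habitat")
table_columns         <- c("study_id", "core_id", "salinity", "...31")
undefined_columns     <- setdiff(table_columns, controlled_attributes)

print(no_depthseries)
print(no_core_entry)
print(undefined_columns)
```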

@jaxinewolfe
Collaborator Author

jaxinewolfe commented Feb 28, 2024

Synthesis QAQC Checks:

Note: the following have now been added to the synthesis report output

Depthseries

  • Cases where there are multiple observations per depth increment
  • Cases where there is only one interval observation for a core (consider adding a flag for these surface samples?)
  • Make sure that depth_max - depth_min is a positive number
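
The three depthseries checks above could look roughly like this in base R. This is a sketch on a toy table, not the code in the synthesis report; in the real data the key would probably also need `study_id`, since core IDs are not guaranteed unique across studies.

```r
# Toy depthseries (made-up cores and depths)
depthseries <- data.frame(
  core_id   = c("A1", "A1", "A1", "B1", "C1", "C1"),
  depth_min = c(0, 0, 10, 0, 0, 10),
  depth_max = c(10, 10, 20, 5, 10, 10),
  stringsAsFactors = FALSE
)

# 1. Multiple observations for the same depth increment within a core
key <- depthseries[, c("core_id", "depth_min", "depth_max")]
is_repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)
cores_with_repeats <- unique(depthseries$core_id[is_repeated])

# 2. Cores with only one interval (possible surface samples)
intervals_per_core <- table(depthseries$core_id)
single_interval_cores <- names(intervals_per_core)[intervals_per_core == 1]

# 3. Intervals whose thickness (depth_max - depth_min) is not positive
thickness <- depthseries$depth_max - depthseries$depth_min
bad_thickness <- depthseries[which(thickness <= 0), ]

print(cores_with_repeats)
print(single_interval_cores)
print(bad_thickness)
```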

Cores

  • Check that coordinates exist for each core
  • Each core is assigned a country and habitat type
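
A rough base R sketch of the core-level checks, assuming the cores table has `latitude`, `longitude`, `habitat`, and `country` columns (the column names and toy values here are assumptions for illustration).

```r
# Toy cores table (made-up studies and values)
cores <- data.frame(
  study_id  = c("Study_A", "Study_B", "Study_C"),
  core_id   = c("A1", "B1", "C1"),
  latitude  = c(38.9, NA, 10.1),
  longitude = c(-76.5, NA, 105.3),
  habitat   = c("marsh", "mangrove", NA),
  country   = c("United States", NA, "Vietnam"),
  stringsAsFactors = FALSE
)

# Fields every core is expected to have
required <- c("latitude", "longitude", "habitat", "country")

# Logical matrix: TRUE where a required field is missing
is_missing <- is.na(cores[, required])

# Cores with at least one gap, and the number of gaps per field
cores_to_review <- cores[rowSums(is_missing) > 0, c("study_id", "core_id")]
missing_counts  <- colSums(is_missing)

print(cores_to_review)
print(missing_counts)
```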

Bib

  • Check that there is a bib citation for every study

@jaxinewolfe
Collaborator Author

@cheneyr

Thanks for renaming the citation tables! Some are still getting flagged in the synthesis QA and it looks like it's because the study_id was left out. (Though the citation table for Drake 2024 may still be missing from the derivative folder). So one more annoying edit there for the following:

[1] "Stahl_et_al_2024" "Palinkas_and_Engelhardt_2024"
[3] "Palinkas_and_Cornwell_2024" "Drake_et_al_2024"
[5] "Craft_2024"

@jaxinewolfe
Collaborator Author

jaxinewolfe commented Mar 6, 2024

@BettsH Could you take a look at the coordinates for Bukoski et al 2017 in the Sanderman synthesis? They were made fuzzy per the authors' request, but there are a few that have ended up in Laos when they should be in Vietnam (so, a bit too fuzzy). Maybe check out the original paper or the supplementary data and/or use Google Earth Engine to see if we can update those so they at least get assigned the right country.

Cores in question: "M1566" "M1567" "M1568" "M1569" "M1570" "M1571" "M1572" "M1573" "M1574" "M1575" "M1576" "M1577" "M1578"

@jaxinewolfe
Collaborator Author

@cheneyr two tasks for you (if you haven't already done them)!

  • Could you please remove the modeled fraction carbon values from Snedden_2021? (If you know what equation they used, you can make a note of this in the carbon_profile_notes of the methods table.)
  • Can you resolve the mismatch in the core_ids for the Drake 2024 hook? (ex. PR_M_A is in the depthseries but not the core table)

Thank you!
