Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding functions and scripts for downloading, extracting, and processing observations, initial conditions, land cover types, ERA5 drivers for anchor sites within NA. #3278

Merged
merged 84 commits into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
84 commits
Select commit Hold shift + click to select a range
eddb3ac
Add shapefiles of level 1 and 2 NA eco-region.
Mar 12, 2024
0370079
Add a script for processing anchor site data preparations.
Mar 12, 2024
0ac8aee
Automatic updated namespace.
Mar 12, 2024
76fa0f2
Update codes for grabbing site info from a multi-settings object.
Mar 12, 2024
f4102be
Add a function for extracting ISCN SOC initial conditions.
Mar 12, 2024
80de614
Add a function for downloading soil moisture from the CDS server.
Mar 12, 2024
2c7d181
Add a function for extracting downloaded soil moisture netcdf files.
Mar 12, 2024
d31f478
Add rdata file for the ISCN SOC records along with the corresponding …
Mar 12, 2024
0fb6a4d
Add a script for processing initial conditions for the anchor site.
Mar 12, 2024
2ecace0
Tweak ecoregion finder function to adjust the NA area.
Mar 12, 2024
fa445ce
Automatic updated namespace.
Mar 12, 2024
f8025bb
Update MODIS_LAI_prep function.
Mar 12, 2024
b2ebe9b
Add a function to download, extract and filter MODIS LC data.
Mar 12, 2024
c9eb19d
Add a function for extracting AGB initial conditions from GeoTIFF files.
Mar 12, 2024
dace7a2
Merge branch 'develop' of https://github.com/PecanProject/pecan into …
Mar 12, 2024
a75ab0b
Updated previous documentation.
Mar 17, 2024
d919d12
Update progress bar for the qsub_parallel function.
Apr 25, 2024
d89d66a
Correct the unit for using the soil moisture initial condition in SDA.
Apr 25, 2024
d27e7dd
Correct the indexing for block based analysis function.
Apr 25, 2024
a854ed9
Correct logic for the analysis run.
Apr 25, 2024
4cfc6be
Add arguments in the plotting function to control plot size.
Apr 25, 2024
a103d7b
Update create a multi-site script to include the NA anchor sites.
Apr 25, 2024
8d0bc46
Fix the bug where we write ERA5 paths into the settings.xml file.
Apr 25, 2024
9c1dbf4
Update the script to download North America ERA5 data.
Apr 25, 2024
ca4fa97
Provide another option for downloading SoilGrids data if the parallel…
Apr 25, 2024
68a1471
Add argument to decide if we want to skip the high sd observations fo…
Apr 25, 2024
43f73bf
Update function for reading geotiff file of AGB initial conditions.
Apr 25, 2024
6b19993
Update the filtering scheme upon the QC band.
Apr 25, 2024
5cdaa47
Remove shapefiles of NA ecoregion maps from the repo.
Apr 25, 2024
a6aa42d
Remove R script and replace it with the RMD file.
Apr 25, 2024
cde0988
Update the usage of ecoregion maps when searching for ecoregion code.
Apr 25, 2024
4b14550
Rename the current ERA5 USA download script to NA.
Apr 25, 2024
8391427
Replace the previous R script with a more informative Rmd file.
Apr 25, 2024
11845e3
Reverted dependency to dplyr
Apr 25, 2024
3c5e4b9
Update documentation.
Apr 25, 2024
d57b7bb
Update ecoregion paths argument.
Apr 25, 2024
586c694
Add introduction on how to build python CDS API.
Apr 25, 2024
d5af569
Update documentation
Apr 25, 2024
5b44da9
Update the documentation.
Apr 25, 2024
2ac0ad6
Merge branch 'PecanProject:develop' into develop
DongchenZ Apr 25, 2024
82b35d7
Merge branch 'develop' of https://github.com/DongchenZ/pecan into dev…
Apr 25, 2024
f0401c4
Bug fix
Apr 25, 2024
2eeffce
Create interface for user to create their own credentials.
Apr 26, 2024
1a0d3b7
Bug fix.
Apr 27, 2024
5645ceb
Update ISCN extract function based on new Rdata file.
Apr 27, 2024
64bb891
update data.
Apr 27, 2024
85cf197
Update documentation.
Apr 27, 2024
536d742
Update the way of automatically create the credential file.
Apr 27, 2024
ccfe190
Update documentation.
Apr 27, 2024
e7fb69b
document iscn_soc rdata.
Apr 27, 2024
308e996
Update dependencies.
Apr 27, 2024
053fe56
Bug fix.
Apr 27, 2024
af45b0d
Fix documentation.
Apr 27, 2024
59bf4b8
Fix typo in roxygen.
Apr 27, 2024
d11887c
Revert back to the magrittr dependency.
Apr 27, 2024
7fe5670
add namespace.
Apr 27, 2024
bb6d6df
Put the instruction in the roxygen structure.
May 16, 2024
6757a92
Updated documentation.
May 16, 2024
ac37744
Update logger info.
May 16, 2024
14df11a
Update the usage for filtering high sd data and documentation.
May 16, 2024
2446661
Update the usage for exporting the csv file.
May 16, 2024
d83f343
Update the qc.filter usage.
May 16, 2024
1a68cee
Rename the function to be more product specific.
May 16, 2024
8f2823c
Update documentation.
May 16, 2024
08a454e
Update documentation.
May 16, 2024
69a3faf
Move file to the correct folder.
May 16, 2024
8a14ec0
Update documentations.
May 16, 2024
9b09744
Update documentation.
May 16, 2024
c91660b
Merge branch 'PecanProject:develop' into develop
DongchenZ May 16, 2024
fda90e3
Merge branch 'develop' of https://github.com/DongchenZ/pecan into dev…
May 16, 2024
569da69
Update documentation.
May 16, 2024
bbb304f
Update change log file.
May 16, 2024
9e121ff
Bug fixes.
May 16, 2024
2eaddc8
Merge branch 'PecanProject:develop' into develop
DongchenZ May 21, 2024
e5bae23
Update book source.
May 21, 2024
87e75fe
Update IC prep script
May 21, 2024
10d6307
Update documentation.
May 21, 2024
9b911cd
Merge branch 'develop' of https://github.com/DongchenZ/pecan into dev…
May 21, 2024
aaf1a93
Update documentation
May 21, 2024
dda87a1
Fix ifelse.
May 21, 2024
5b82269
Remove the level 1 ecoregion map.
May 21, 2024
fce4f1b
Remove ecoregion map from assimsequential package.
May 21, 2024
0c6224d
Change the package name.
May 21, 2024
bffb27e
Update documentation.
May 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
74 changes: 74 additions & 0 deletions modules/assim.sequential/inst/anchor/anchorSites_data_prep.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
library(dplyr)
library(xts)
library(PEcAn.all)
library(purrr)
library(furrr)
library(lubridate)
library(nimble)
library(ncdf4)
library(PEcAnAssimSequential)
library(dplyr)
library(sp)
library(raster)
library(zoo)
library(ggplot2)
library(mnormt)
library(sjmisc)
library(stringr)
library(doParallel)
library(doSNOW)
library(Kendall)
library(lgarch)
library(parallel)
if (future::supportsMulticore()) {
future::plan(future::multicore)
} else {
future::plan(future::multisession)
}

setwd("/projectnb/dietzelab/dongchen/anchorSites/")
settings <- PEcAn.settings::read.settings("/projectnb/dietzelab/dongchen/anchorSites/SDA/pecan.xml")

in.path <- "/projectnb/dietzelab/hamzed/ERA5/Data/Ensemble"
out.path <- "/projectnb/dietzelab/dongchen/anchorSites//ERA5_2012_2021"
mdietze marked this conversation as resolved.
Show resolved Hide resolved

# prepare ERA5
paths <- ERA5_met_process(settings = settings, in.path = in.path, out.path = out.path, write.db = F, write = T)

#prepare observations
settings <- PEcAn.settings::read.settings("/projectnb/dietzelab/dongchen/anchorSites/SDA/pecan.xml")
settings$state.data.assimilation$Obs_Prep$outdir <- "/projectnb/dietzelab/dongchen/anchorSites/Obs"
obs <- PEcAnAssimSequential::SDA_OBS_Assembler(settings)
mdietze marked this conversation as resolved.
Show resolved Hide resolved

#prepare LC and PFTs
LC <- PEcAn.data.remote::MODIS_LC_prep(site_info = site_info, time_points = settings$state.data.assimilation$start.date, qc.filter = T)
mdietze marked this conversation as resolved.
Show resolved Hide resolved

#tweak LC to actual pfts
LC[which(LC[,2] %in% c("Deciduous Broadleaf Forests",
"Deciduous Needleleaf Forests",
"Mixed Forests"))] <- "temperate.deciduous.HPDA"
LC[which(LC[,2] %in% c("Evergreen Broadleaf Forests",
"Evergreen Needleleaf Forests"))] <- "boreal.coniferous"
LC[which(LC[,2] %in% c("Closed Shrublands",
"Open Shrublands",
"Woody Savannas",
"Savannas",
"Grasslands",
"Permanent Wetlands",
"Croplands",
"Cropland/Natural Vegetation Mosaics",
))] <- "semiarid.grassland_HPDA"
LC[which(LC[,2] %in% c("Urban and Built-up Lands",
"Permanent Snow and Ice",
"Barren",
"Water Bodies"))] <- NA
#delete settings and LC table with NAs.
del_ind <- which(is.na(LC[,2]))
LC <- LC[-del_ind,]
settings <- settings[-del_ind]
#write settings and PFT file.
PEcAn.settings::write.settings(settings, outputfile = "pecan.xml")
write.csv(LC, file = settings$run$settings.1$inputs$pft.site[[1]], row.names = F)
# settings <- PEcAn.settings::read.settings("/projectnb/dietzelab/dongchen/anchorSites/SDA/pecan.xml")
# settings <- PEcAn.settings::prepare.settings(settings)

3 changes: 3 additions & 0 deletions modules/data.land/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
export(BADM_IC_process)
export(Clean_Tucson)
export(EPA_ecoregion_finder)
export(IC_ISCN_SOC)
export(InventoryGrowthFusion)
export(InventoryGrowthFusionDiagnostics)
export(Read.IC.info.BADM)
Expand All @@ -11,11 +12,13 @@ export(Soilgrids_SoilC_prep)
export(buildJAGSdata_InventoryRings)
export(cohort2pool)
export(dataone_download)
export(download.SM_CDS)
export(download_package_rm)
export(ens_veg_module)
export(extract.stringCode)
export(extract_FIA)
export(extract_NEON_veg)
export(extract_SM_CDS)
export(extract_soil_gssurgo)
export(extract_soil_nc)
export(extract_veg)
Expand Down
79 changes: 48 additions & 31 deletions modules/data.land/R/IC_BADM_Utilities.R
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ Read.IC.info.BADM <-function(lat, long){
Site = Gdf$SITE_ID %>% unique(),
Var = Gdf$VARIABLE[1],
Date = Date.in,
Organ = Organ.in,
# Organ = Organ.in,
AGB = PlantWoodIni,
soil_organic_carbon_content = SoilIni,
litter_carbon_content = litterIni
Expand Down Expand Up @@ -239,15 +239,19 @@ netcdf.writer.BADM <- function(lat, long, siteid, outdir, ens){
#' @export
#'
BADM_IC_process <- function(settings, dir, overwrite=TRUE){


new.site <-
data.frame(
id = settings$run$site$id %>% as.numeric(),
lat = settings$run$site$lat ,
lon = settings$run$site$lon %>% as.numeric()
)

# create site info.
new.site <-
settings %>%
purrr::map(~.x[['run']] ) %>%
purrr::map('site')%>%
purrr::map(function(site.list){
#conversion from string to number
site.list$lat <- as.numeric(site.list$lat)
site.list$lon <- as.numeric(site.list$lon)
list(id=site.list$id, lat=site.list$lat, lon=site.list$lon)
})%>%
dplyr::bind_rows() %>%
as.list()

out.ense <- seq_len(settings$ensemble$size) %>%
purrr::map(~ netcdf.writer.BADM(new.site$lat,
Expand All @@ -273,36 +277,49 @@ BADM_IC_process <- function(settings, dir, overwrite=TRUE){
#' @return a dataframe with codes corresponding to level1 and level2 codes as two columns
#' @export
#'
EPA_ecoregion_finder <- function(Lat, Lon){
EPA_ecoregion_finder <- function(Lat, Lon, filenames = c("NA_CEC_Eco_Level1.shp", "NA_CEC_Eco_Level2.shp")){
#try eco-region map for just CONUS US.
if (is.null(filenames)) {
filenames = c("eco-region.json", "eco-regionl2.json")
}
#lat long to spatial point
U.S.SB.sp <-
data.frame(Lati = Lat %>% as.numeric(),
Long = Lon %>% as.numeric())

sp::coordinates(U.S.SB.sp) <- ~ Long + Lati


# L1 layer
L1 <-
sf::read_sf(system.file("extdata","eco-region.json", package = "PEcAn.data.land")) %>%
sf::st_set_crs(
"+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
) %>%
sf::st_transform("+proj=longlat +datum=WGS84")
# L2 layer
L2 <-
sf::read_sf(system.file("extdata","eco-regionl2.json", package = "PEcAn.data.land")) %>%
sf::st_set_crs(
"+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
) %>%
sf::st_transform("+proj=longlat +datum=WGS84")

sp::proj4string(U.S.SB.sp) <- sp::proj4string(sf::as_Spatial(L1))
# finding the code for each site
over.out.L1 <- sp::over(U.S.SB.sp, sf::as_Spatial(L1))
over.out.L2 <- sp::over(U.S.SB.sp, sf::as_Spatial(L2))
if (length(filenames) >= 1) {
L1 <-
sf::read_sf(system.file("extdata",filenames[1], package = "PEcAn.data.land")) %>%
sf::st_set_crs(
"+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
) %>%
sf::st_transform("+proj=longlat +datum=WGS84")
#reproject points based on projection of the eco-region map.
sp::proj4string(U.S.SB.sp) <- sp::proj4string(sf::as_Spatial(L1))
# finding the code for each site
over.out.L1 <- sp::over(U.S.SB.sp, sf::as_Spatial(L1))
} else {
PEcAn.logger::logger.info("Please provide file name to the Level 1 eco-region map!")
return(0)
}

return(data.frame(L1 = over.out.L1$NA_L1CODE, L2 = over.out.L2$NA_L2CODE))
# L2 layer
if (length(filenames) > 1) {
L2 <-
sf::read_sf(system.file("extdata",filenames[2], package = "PEcAn.data.land")) %>%
sf::st_set_crs(
"+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
) %>%
sf::st_transform("+proj=longlat +datum=WGS84")
# finding the code for each site
over.out.L2 <- sp::over(U.S.SB.sp, sf::as_Spatial(L2))
return(data.frame(L1 = over.out.L1$NA_L1CODE, L2 = over.out.L2$NA_L2CODE))
} else {
return(data.frame(L1 = over.out.L1$NA_L1CODE))
}
}


31 changes: 31 additions & 0 deletions modules/data.land/R/ISCN_extract.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#' Extract ISCN SOC initial conditions from existing ISCN database.
#'
#' @param site_info Bety list of site info including site_id, lon, and lat.
#' @param ens ensemble number.
#'
#' @return A data frame containing sampled SOC, each row represent each site.
#' @export
#'
#' @examples
#' @author Dongchen Zhang
#' @importFrom magrittr %>%
mdietze marked this conversation as resolved.
Show resolved Hide resolved
IC_ISCN_SOC <- function(site_info, ens = 100) {
iscn_soc <- PEcAn.data.land::iscn_soc
site_eco <- PEcAn.data.land::EPA_ecoregion_finder(site_info$lat, site_info$lon)
soc <- iscn_soc %>%
dplyr::filter(
.data$NA_L2CODE %in% site_eco$L2
) %>%
dplyr::select("soc (g cm-2)", "NA_L2CODE") %>%
na.omit()
ic_sample_soc <- data.frame(matrix(NA, ens, length(site_info$site_id))) %>%
`colnames<-`(site_info$site_id)
for (i in seq_along(site_eco$L2)) {
if (is.na(site_eco$L2[i])) {
next
} else {
ic_sample_soc[,i] <- sample(soc$`soc (g cm-2)`[which(soc$NA_L2CODE == site_eco$L2[i])], ens, replace = T)
}
}
return(ic_sample_soc)
}
91 changes: 91 additions & 0 deletions modules/data.land/R/download.SM_CDS.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
#' Download CDS soil moisture data for the SDA workflow.
#'
#' @param outfolder physical paths to where the unziped soil moisture files are downloaded.
#' @param time_points A vector contains each time point within the start and end date.
#' @param overwrite flag determine if we want to overwrite existing files when downloading.
#'
#' @return A vector containing file paths to the downloaded files.
#' @export
#'
#' @examples
#' @author Dongchen Zhang
#' @importFrom magrittr %>%
download.SM_CDS <- function(outfolder, time.points, overwrite = FALSE) {
#load cdsapi from python environment.
tryCatch({
cdsapi <- reticulate::import("cdsapi")
}, error = function(e) {
PEcAn.logger::logger.severe(
"Failed to load `cdsapi` Python library. ",
"Please make sure it is installed to a location accessible to `reticulate`.",
"You should be able to install it with the following command: ",
"`pip install --user cdsapi`.",
"The following error was thrown by `reticulate::import(\"cdsapi\")`: ",
mdietze marked this conversation as resolved.
Show resolved Hide resolved
conditionMessage(e)
)
})
#check if the token exists for the cdsapi.
if (!file.exists(file.path(Sys.getenv("HOME"), ".cdsapirc")))
mdietze marked this conversation as resolved.
Show resolved Hide resolved
PEcAn.logger::logger.severe(
"Please create a `${HOME}/.cdsapirc` file as described here:",
"https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key."
)
#grab the client object.
tryCatch({
cclient <- cdsapi$Client()
}, error = function(e) {
PEcAn.logger::logger.severe(
"The following error was thrown by `cdsapi$Client()`: ",
conditionMessage(e)
)
})
#loop over each time point.
file.names <- c()
#setup progress bar.
pb <- utils::txtProgressBar(min = 0, max = length(time.points), style = 3)
for (i in seq_along(time.points)) {
#name file.
fname <- file.path(outfolder, paste('surface_soil_moisture', time.points[i], "nc", sep = "."))
fname.zip <- gsub(".nc", ".zip", fname, fixed = T)
#add new extracted file into vector.
file.names <- c(file.names, fname)
#if we have already downloaded this file.
if (file.exists(fname) && !overwrite) {
PEcAn.logger::logger.warn(glue::glue(
"File `{fname}` already exists, and `overwrite` is FALSE. ",
"Skipping to next variable."
))
next
}
#prepare file through cds server.
while ("try-error" %in% class(try(do_next <- cclient$retrieve(
'satellite-soil-moisture',
list(
'variable'= 'surface_soil_moisture',
'type_of_sensor'= 'active',
'time_aggregation'= 'day_average',
'year'= sprintf("%04d", lubridate::year(time.points[i])),
'month'= sprintf("%02d", lubridate::month(time.points[i])),
'day'= sprintf("%02d", lubridate::day(time.points[i])),
'type_of_record'= 'cdr',
'version'= 'v202212'
),
'download.zip'
)))) {
Sys.sleep(10)
PEcAn.logger::logger.info("Encounter error! Will try download in 10 seconds.")
}
#download file to local.
utils::download.file(do_next$reply$location, destfile = fname.zip)
#unzip file.
unzipPath <- utils::unzip(zipfile = fname.zip, exdir = outfolder)
#rename unziped file.
base::file.rename(unzipPath, fname)
#remove zip file.
base::file.remove(fname.zip)
#update progress bar
pbi <- i
utils::setTxtProgressBar(pb, pbi)
}
file.names
}