
Create Landtrendr.AGB.Model [Work In Progress] #3282

Closed
wants to merge 2 commits into from
203 changes: 203 additions & 0 deletions Creation of Landtrendr.AGB.Model [Work In Progress]
@@ -0,0 +1,203 @@
---
title: "GEDI4R Code"
author: "Tami Gordon"
date: "2024-03-18"
output: html_document
Member comment:

File should be named GEDI_AGB_prep.R (and function should be GEDI_AGB_prep) and should be in the modules/data.remote/R/ folder. Also, since it's an R file you need to remove the Rmarkdown formatting, such as this header and all the ```{r} code chunk separators
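Roughly, the top of the refactored file would look like this (the signature and defaults are a sketch, not a final API):

```r
# modules/data.remote/R/GEDI_AGB_prep.R
# roxygen header goes above; plain R below, no YAML and no code-chunk fences
GEDI_AGB_prep <- function(site_info, start_date, end_date,
                          outdir = NULL, export_csv = TRUE) {
  # body of the current script, minus the Rmarkdown scaffolding
}
```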

#' Prepare GEDI AGB data for the SDA workflow.
#'
#' @param outdir Where the final CSV file will be stored.
#' @param country String value to determine depth of bounding box
#' @param start_date String value to determine start date of data sequestration
#' @param end_date String value to determine end date of data sequestration
#' @param site_info Bety list of file names with the h5 extension
#' @param focus_sites Bety list of file names with the zip extension within GEDI4R package
#' @param export_csv Decide if we want to export the CSV file.
#'Suggests: rmarkdown, knitr, testthat (>= 3.0.0)
#' @export
#'
#' @examples
Member comment:

if you don't have an example, you don't need an @examples section

#' @author Tami Gordon [02/08/24]
#' @importFrom magrittr %>%
#'
---

```{r setup, include= TRUE}
knitr::opts_chunk$set(echo = TRUE)
## Getting Started
install.packages("devtools")
devtools::install_github("VangiElia/GEDI4R")
#loading GEDI4R package
library(GEDI4R)
library(magrittr)
Member comment:

This is fine for during development, but library calls don't go inside R functions when they're within R packages. Instead they get added to that package's DESCRIPTION file (in this case modules/data.remote/DESCRIPTION)
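Concretely, the dependency declaration moves into the package metadata, e.g. in modules/data.remote/DESCRIPTION (a sketch; version constraints omitted):

```
Imports:
    GEDI4R,
    magrittr
```

and the function then uses namespaced calls such as GEDI4R::l4_download() instead of library().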

```

```{r}
## Setting Parameters
country = 'ITA'
start_date = "2020-01-01"
end_date = "2020-01-31"
Member comment:

Things that are examples of inputs to the function should go in the @examples
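i.e. moved into the roxygen header and wrapped in \dontrun{} since they hit the network (the function name follows the suggestion above; values here are only illustrative):

```r
#' @examples
#' \dontrun{
#' GEDI_AGB_prep(site_info = site_info,
#'               start_date = "2020-01-01", end_date = "2020-01-31",
#'               outdir = tempdir())
#' }
```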


## Specific Sites you want to focus on
focus_sites = c("GEDI04_A_2020036151358_O06515_02_T00198_02_002_01_V002.zip",
Member comment:

  1. not clear how someone would figure out what zip files they'd need for a different set of sites
  2. for this to work in our workflow, figuring out filenames is something that would need to be automated, not something the user would have to input

"GEDI04_A_2021150031254_O13948_03_T06447_02_002_01_V002.zip" )

export_csv = TRUE
outdir = tempdir()
## saved as HDF5 file format
#if we export CSV but didn't provide any path
if(as.logical(export_csv) && is.null(outdir)){
PEcAn.logger::logger.info("If you want to export the CSV file, please provide the outdir!")
return(0)
}
outdir = tempdir()
```

```{r}

```


```{r}

# Using data from GEDI4R package dataset (NASA website)
## Create bounding box (search window) using varname 'country'
country_bound <- sf::st_as_sf(raster::getData('GADM', country = country, level = 1))
Member comment:

The example you came across for learning how to get GEDI data may have done this, but for our application, we don't want to search by country, we want to search based on the site_info (which is mostly a list of lat/lon coordinates)
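For instance, the search window could come straight from the site list (assuming site_info carries lat/lon vectors, as it does in the other data-prep functions):

```r
# bounding box from the sites themselves, not a country polygon
ul_lat <- max(site_info$lat)
lr_lat <- min(site_info$lat)
ul_lon <- min(site_info$lon)
lr_lon <- max(site_info$lon)
```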

## raster::getData; get geographic data for anywhere in the world; GADM (database) needs "country ="and level of subdivision
#extract extent
e <- raster::extent(country_bound)
## extent creates a matrix
ul_lat <- e@ymax
lr_lat <- e@ymin
ul_lon <- e@xmin
lr_lon <- e@xmax
Member comment:

I'm not sure how big GEDI tiles are, but it may make more sense, for our application, to create bounding boxes around each site in the list, rather than one giant bounding box around all of them (which would grab all the data for all of North America). But my suggestion depends on how long it takes to query the list of files, how big they are, and how many we end up downloading and never touching.
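If per-site boxes win out, a rough sketch (the buffer here is a made-up placeholder radius, not a tuned value):

```r
buffer <- 0.1  # degrees; placeholder search radius per site
site_boxes <- lapply(seq_along(site_info$site_id), function(i) {
  c(ul_lat = site_info$lat[i] + buffer,
    lr_lat = site_info$lat[i] - buffer,
    ul_lon = site_info$lon[i] - buffer,
    lr_lon = site_info$lon[i] + buffer)
})
```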


#Get the list of all files available for the study area in the period selected,
#using just_path = T
file_path <- l4_download(
ul_lat,
lr_lat,
ul_lon,
lr_lon,
outdir = outdir,
from = start_date,
to = end_date,
just_path = T
)

#download the first 4 files
Member comment:

Why the first 4?

file_download <- l4_download(
ul_lat,
lr_lat,
ul_lon,
lr_lon,
ncore = parallel::detectCores()-1,
outdir = outdir,
from = start_date,
to = end_date,
just_path = F,
subset=1:4
)

l4_zip <- system.file("extdata",
focus_sites,
package="GEDI4R")

#Unzipping GEDI level4A data
l4 <- lapply(l4_zip,unzip,exdir = outdir)

## Add columns not previously shown in gediL4 path data set
col <-
c(
"lat_lowestmode",
"lon_lowestmode",
"delta_time",
"agbd_pi_lower",
"agbd_pi_upper",
"agbd"
)

gediL4_output <- l4_getmulti(l4,add_col = col, source= T)
```
```{r}
# If using predownloaded dataset (saved as HDF5 file format) - (!) Could be automated to ask or check computer for file paths
site_info = list.files(path =".", pattern = "h5", all.files = T, full.names = T)
Member comment:

  1. earlier you defined site_info as a function input argument, so you don't want to redefine it here
  2. in all of the other data constraint functions, site_info provides the site ids, lat, lon, etc. Your function won't work with the larger PEcAn system if you're using the same variable to mean something very different here (vector of hdf5 files)
  3. looking for files in the current directory also isn't a general solution. This seems like a folder that should be passed into the function as an argument (e.g. is this outdir? or a subdir off of outdir?)

View(site_info)
Member comment:

You can't have a View() call within a function

if(length(site_info) != 0){
Member comment:

If length is 0, will the later code still work? Or do you need an else that either defines what all the variables below should be or to end the function with a return?
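e.g. once this is a function, guard with an early return so the code below can assume files exist (a sketch, not final behavior):

```r
if (length(site_info) == 0) {
  PEcAn.logger::logger.info("No h5 files found; nothing to extract.")
  return(NULL)
}
```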

dataname <- l4_getmulti(site_info[[1]],just_colnames = T)
gediL4_path <- l4_getmulti(site_info, catch = T, merge=T)
## Dataset with predownloaded data
gediL4_pre <- l4_getmulti(site_info,just_colnames = F, add_col = col, source=T, catch = T)
Member comment:

Unclear why the same function needs to be called 3 different times with different arguments. I don't know these functions well enough to know if this is right or wrong, but it definitely seems like a place where more documentation is needed to explain the intent of this block of code

rbind(gediL4_output, add_col = gediL4_pre, use.names = FALSE, fill = T)
Member comment:

The output of this rbind isn't being assigned to anything
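i.e. the result needs to be captured; something like this (switching to data.table::rbindlist, since rbind with fill = TRUE also requires use.names = TRUE):

```r
gediL4_output <- data.table::rbindlist(list(gediL4_output, gediL4_pre),
                                       use.names = TRUE, fill = TRUE)
```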

}



```

```{r}
## Find and download GEDI Data within your study area:
## site_info (a list/data.table) tells the model each data file's site_id and site information
#list all dataset in h5 file

## Clipping Dataset
l4_data <- l4_getmulti(l4)
Member comment:

Similar to earlier comment, this seems to be the 4th call to l4_getmulti without clear explanation of how they're different

#clip using vector of coordinates
b_box <- c(-50,35,52,37)
Member comment:

This seems like something that's leftover from the tutorial you were building from, not something that should be part of the function. More generally, does our function need to clip to a bounding box in order to extract data for a site, or do we end up doing that in another way? If the latter this can be skipped. If the former, then we probably need a loop where we clip each site +/- whatever we decide the search radius is for an individual site.
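If we do clip per site, it could look roughly like this (the buffer is a placeholder radius, and the b_box order is assumed to be xmin, ymin, xmax, ymax as in the example above):

```r
buffer <- 0.1  # degrees; placeholder search radius, to be decided
clipped_by_site <- lapply(seq_along(site_info$site_id), function(i) {
  b_box <- c(site_info$lon[i] - buffer, site_info$lat[i] - buffer,
             site_info$lon[i] + buffer, site_info$lat[i] + buffer)
  l4_clip(l4_data, clip = b_box)
})
```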

clipped <- l4_clip(l4_data,clip=b_box)

#using Shapefile to clip
bound <- system.file("extdata","bound4326.shp",package="GEDI4R")
#with extension
clipped <- l4_clip(l4_data,clip=bound,usegeometry = F)
#with polygon boundaries
clipped2 <- l4_clip(l4_data,clip=bound,usegeometry = T)
Member comment:

We definitely don't need this bit on using a shapefile to clip -- seems leftover from the tutorial


## l4_convert, which can also reproject the data to a user-defined coordinate reference system
converted <- l4_convert(l4_data,epsg = 4326, filename=paste0(outdir,"/example.shp"),return_path = T)
example <- sf::read_sf(converted)
file.remove(list.files(outdir,pattern = "example",full.names = T))
Member comment:

Do we need to reproject the GEDI data? If so, we should be doing that before extracting site data to ensure things are aligned correctly.


## Subset of variables needed for gediL4 output
var_needed <-
c("lat_lowestmode",
"lon_lowestmode",
"delta_time",
"agbd_pi_lower",
"agbd_pi_upper",
"agbd",
"date",
"source")
gediL4 <- gediL4_output[,..var_needed]


## Save file as csv
write.csv(gediL4, "gediL4_output.csv")
Member comment:

This data doesn't seem to be organized in the way that our downstream data assimilation code is looking for.
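Roughly, the footprints would need to be collapsed to one value plus an uncertainty per site and time point, matching the AGB_Output layout built at the end of this file. A placeholder aggregation (not the final method) for one site's footprints:

```r
agb_mean <- mean(gediL4$agbd, na.rm = TRUE)
agb_sd   <- stats::sd(gediL4$agbd, na.rm = TRUE)
```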

```

```{r}
## Plotting Data

#footprints locations and AGBD distribution against elevation
l4_plotagb(gediL4,type="distribution")
Member comment:

download function shouldn't be making plots


```

```{r}
## Check for old CSV and create new CSV
outdir = "/Users/tamigordon/Desktop/rGEDI/data files"
Member comment:

outdir shouldn't be hard coded, it should be a function input


time_points = rbind(start_date, end_date)
if(file.exists(file.path(outdir, "AGB.csv"))){
PEcAn.logger::logger.info("Extracting previous AGB file!")
Previous_CSV <- utils::read.csv(file.path(outdir, "AGB.csv"))
AGB_Output <- matrix(NA, length(site_info$site_id), 2*length(time_points)+1) %>%
`colnames<-`(c("site_id", paste0(time_points, "_AGB"), paste0(time_points, "_SD"))) %>% as.data.frame()#we need: site_id, AGB, std, target time point.
AGB_Output$site_id <- site_info$site_id

}else{#we don't have any previously downloaded CSV file.
AGB_Output <- matrix(NA, length(site_info$site_id), 2*length(time_points)+1) %>%
`colnames<-`(c("site_id", paste0(time_points, "_AGB"), paste0(time_points, "_SD"))) %>% as.data.frame()#we need: site_id, AGB, std, target time point.
AGB_Output$site_id <- site_info$site_id
}
```