NOAA Obspack is a collection of green house gases observations
rtorf
only depends on data.table
and ncdf4
, which is basically
parallel C, so it can be installed in any machine.
remotes::install_github("noaa-gml/rtorf")
library(rtorf)
library(data.table)
The first step consists in constructing a summary for the ObsPack. This
is required to read the data, but also, identify agl
, which is present
in some of the file names. This function returns a data.frame
.
Optionally, the user can indicate a path to store the data.frame
.
obs_summary
also prints a summary of the data. The second argument is
the categories, and by default includes the categories shown below, to
account for all the files. Then the summary data.frame
contains the
columns id
as the full path to each file, name
which is the name or
relative path of the file, n
just an id, sector
such as tower, and
the column agl
which indicates the agl
indicated in the name of the
file if available. To read the documentation of this function, the user
must run ?obs_summary
.
We first define the categories
cate = c("aircraft-pfp",
"aircraft-insitu",
"aircraft-flask",
"surface-insitu",
"surface-flask",
"surface-pfp",
"tower-insitu",
"aircore",
"shipboard-insitu",
"shipboard-flask")
obs <- "Z:/torf/obspack_ch4_1_GLOBALVIEWplus_v5.1_2023-03-08/data/nc/"
index <- obs_summary(obs = obs, categories = cate)
## Number of files of index: 429
## sector N
## 1: aircraft-pfp 40
## 2: aircraft-insitu 15
## 3: surface-flask 106
## 4: surface-insitu 174
## 5: aircraft-flask 4
## 6: aircore 1
## 7: surface-pfp 33
## 8: tower-insitu 51
## 9: shipboard-flask 4
## 10: shipboard-insitu 1
## 11: Total sectors 429
## Detected 190 files with agl
## Detected 239 files without agl
There are 429 files in the ObsPack directory. The printed information
also shows the total at the bottom, as the sum of the individual file by
sector. This is to ensure that the sum of files is equal to the total
number of files found, shown at the top. furthermore, the printed
information also shows that there are 109 files with the agl
explicitly mentioned in the name of the file.
Once the summary is built, the function obs_read
will read the files
available in the index file previously generated. Here we selected the
category “tower-insitu”. The argument verbose prints which files are
being read each time, by default. At the end, this function prints the
total number of observations by type of altitude (agl or asl).
df <- obs_read_nc(index = index,
categories = "tower-insitu")
intake_height or altitude_final
The identification of the altitude and type is critical. The approach used here consists of:
- Use intake_height
- Identify
agl
from the name of the tile. - If
agl
is not present,agl = altitude - site_elevation
. - If there are some NA in elevation, will result some NA in
agl
- A new column is added named
altitude_final
to storeagl
orasl
- Another column named
type_altitude
is added to identifyagl
as 0, orasl
as 1.
To maintain a coherent approach among text and NetCDF files, we will add
a column altitude_final and type_altitude. This is done in obs_read_nc
or obs_read
.
Sometimes we need more information about the site. For instance, what do
the observations start and end. Then, we added the function obs_table
,
which calculates statistics summary of “time” and other numeric
variables by file name, sector, site, altitude and mode.
dft <- obs_table(df = df)
Now we can visualize the average of observations by site.
library(sf)
sdft <- st_as_sf(dft[stat == "mean"],
coords = c("longitude", "latitude"),
crs = 4326)
sdft$value <- sdft$value*1e09
plot(sdft["value"],
reset = FALSE,
axes = TRUE,
graticule = TRUE,
pch = 16,
cex = 1.5,
pal = cptcity::lucky(colorRampPalette = T), #random
main = NULL)
## Colour gradient: jjg_ccolo_phill_omgwtf_monday_sucks, number: 4023
maps::map(add = T)
Let us randomly select a couples of sites from the database.
usites <- unique(df$site_name)[sample(seq_along(unique(df$site_name)), 2)]
usites
## [1] "Karasevoe" "Yakutsk"
knitr::kable(dft[site_name %in% usites])
site_name | site_latitude | site_longitude | site_country | site_code | value | time | time_decimal | latitude | longitude | stat | timeUTC |
---|---|---|---|---|---|---|---|---|---|---|---|
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 1.848e-06 | 1095777000 | 2005 | 58.25 | 82.42 | min | 2004-09-21 14:30:00 |
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 1.937e-06 | 1208828700 | 2008 | 58.25 | 82.42 | q1 | 2008-04-22 01:45:00 |
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 1.974e-06 | 1315166400 | 2012 | 58.25 | 82.42 | median | 2011-09-04 20:00:00 |
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 1.991e-06 | 1329570666 | 2012 | 58.25 | 82.42 | mean | 2012-02-18 13:11:06 |
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 2.020e-06 | 1464107400 | 2016 | 58.25 | 82.42 | q3 | 2016-05-24 16:30:00 |
Karasevoe | 58.2456 | 82.4244 | Russia | KRS | 3.964e-06 | 1577716200 | 2020 | 58.25 | 82.42 | max | 2019-12-30 14:30:00 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 1.846e-06 | 1189791000 | 2008 | 62.09 | 129.36 | min | 2007-09-14 17:30:00 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 1.903e-06 | 1258813800 | 2010 | 62.09 | 129.36 | q1 | 2009-11-21 14:30:00 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 1.921e-06 | 1278496800 | 2011 | 62.09 | 129.36 | median | 2010-07-07 10:00:00 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 1.933e-06 | 1280083018 | 2011 | 62.09 | 129.36 | mean | 2010-07-25 18:36:58 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 1.950e-06 | 1301175000 | 2011 | 62.09 | 129.36 | q3 | 2011-03-26 21:30:00 |
Yakutsk | 62.0886 | 129.3558 | Russia | YAK | 2.672e-06 | 1386912600 | 2014 | 62.09 | 129.36 | max | 2013-12-13 05:30:00 |
We added a function to plot the data read from ObsPack. The y-axis is
the field value
and the x-axis is by default time
. The data
illustrated sorted by color is the field site_code
, with the default
number of 3 sites. The argument pal
is to define the color palette,
used by the internally imported function cptcity::cpt
.
obs_plot(dt = df[site_name %in% usites], time = "time", yfactor = 1e+09, cex = 0.5,
verbose = FALSE)
Here we can see 2.61 million observations for tower-insitu
. These
observations are made between 2004 and 2021.
ObsPack includes global observations and sometimes we need to extract data for a specific region and periods of time. In this part we include spatial and temporal parameters to filter data. The year of interest is 2020, but we also included December of 2019 and January of 2021. At this stage, we can apply the spatial filter by using the coordinates.
north <- 80
south <- 10
west <- -170
east <- -50
max_altitude <- 8000
evening <- 14
yy <- 2020
df <- rbind(df[year == yy - 1 & month == 12],
df[year == yy],
df[year == yy + 1 & month == 1])
df <- df[altitude_final < max_altitude &
latitude < north &
latitude > south &
longitude < east &
longitude > west]
unique(df[, c("altitude_final", "site_elevation", "elevation",
"dataset_selection_tag",
"site_name")])
## altitude_final site_elevation elevation dataset_selection_tag
## 1: 17.1 611.43 611.43 allvalid-17magl
## 2: 31.7 611.43 611.43 allvalid-32magl
## 3: 4.9 611.43 611.43 allvalid-5magl
## 4: 122.0 472.00 472.00 allvalid-122magl
## 5: 30.0 472.00 472.00 allvalid-30magl
## 6: 396.0 472.00 472.00 allvalid-396magl
## 7: 304.8 115.20 115.20 allvalid-305magl
## 8: 31.0 115.20 115.20 allvalid-31magl
## 9: 61.0 115.20 115.20 allvalid-61magl
## 10: 30.0 2.00 2.00 allvalid-30magl
## 11: 484.0 2.00 2.00 allvalid-483magl
## 12: 89.1 2.00 2.00 allvalid-91magl
## site_name
## 1: Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE)
## 2: Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE)
## 3: Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE)
## 4: Park Falls, Wisconsin
## 5: Park Falls, Wisconsin
## 6: Park Falls, Wisconsin
## 7: Beech Island, South Carolina
## 8: Beech Island, South Carolina
## 9: Beech Island, South Carolina
## 10: Walnut Grove, California
## 11: Walnut Grove, California
## 12: Walnut Grove, California
After filtering by space and time, we have 1.12883^{5} million
observations. Towers can have observations at different heights. Here we
need to select one site with the observations registered at the highest
height. The column with the height is named altitude_final
and the max
altitude was named max_altitude
.
dfa <- df[,
max(altitude_final),
by = site_code] |> unique()
names(dfa)[2] <- "max_altitude"
dfa
## site_code max_altitude
## 1: CRV 31.7
## 2: LEF 396.0
## 3: SCT 304.8
## 4: WGC 484.0
Here we need to start time columns. The function obs_addtime
adds time
columns timeUTC
, timeUTC_start
which shows the start time of each
observation and timeUTC_end
which shows the end time for each
observation.
df2 <- obs_addtime(df)
## Adding timeUTC
## Adding timeUTC_start
## Adding timeUTC_end
## Found time_interval
Then we need a key_time to aggregate data. This can be done using UTC, solar, or local time. The normal approach is using afternoon solar or local time.
- Solar time
- Local time with columns
site_utc2lst
- Local time longitude
solar time (default)
Here we select the hours of interest and then aggregate data by year, month and day of solar time. In this way, we will have one information per day. however this approach is not appropriate for aircraft which are aggregated every 10 or 20 seconds. Hence we need to aggregate data by one time column. Also, this helps to generate the receptor info files including hour, minute and second. Hence, we need to add solar or local time column.
df2$solar_time <- obs_addstime(df2)
local time with column
site_utc2lst
Then we need to identify the local time with the function add_ltime
.
This is important because to identifying observations in the evening in
local time. add_ltime
uses two methods, first identify the time
difference with utc by identifying the metadata column “site_utc2lst”.
If solar time is not available #now we need to cut solar time for the
frequency needed. As we will work with
local time longitude
If this information is not available, with the aircrafts for instance, the local time is calculated with an approximation based on longitude:
$$
lt = UTC + longitude/15 * 60 * 60
$$ Where
Now we have they key column time, we can cut it accordingly.
df2$solar_time_cut <- cut(x = df2$solar_time,
breaks = "1 hour") |>
as.character()
How we can check the solar time and the cut solar time. Please note that solar_time_cut, the column that it will be used to aggregate data
How we filter for the required solar time, in this case 14.
df3 <- df2[hour_st %in% evening]
df3[, c("solar_time", "solar_time_cut")]
## solar_time solar_time_cut
## 1: 2019-12-02 14:49:12 2019-12-02 14:00:00
## 2: 2019-12-03 14:48:48 2019-12-03 14:00:00
## 3: 2019-12-04 14:48:24 2019-12-04 14:00:00
## 4: 2019-12-05 14:48:00 2019-12-05 14:00:00
## 5: 2019-12-06 14:47:35 2019-12-06 14:00:00
## ---
## 4312: 2021-01-27 14:11:38 2021-01-27 14:00:00
## 4313: 2021-01-28 14:11:26 2021-01-28 14:00:00
## 4314: 2021-01-29 14:11:14 2021-01-29 14:00:00
## 4315: 2021-01-30 14:11:03 2021-01-30 14:00:00
## 4316: 2021-01-31 14:10:52 2021-01-31 14:00:00
Now there are 4316 observations and before filtering 112883. At this
point we can calculate the averages of several columns by the cut time.
The function obs_agg
does this aggregation as shown in the following
lines of code. The argument gby
establish the function used to
aggregate cols
. I need to aggregate the data by date (year, month,
date), because it is already filtered by the hours of interest. Then, I
would have 1 observation per day.
As standard, let us define key_time
as solar_time
. The obs_agg
function will aggregate the desired data by that column.
df3$key_time <- df3$solar_time_cut
df4 <- obs_agg(dt = df3,
gby = "mean",
cols = c("value",
"latitude",
"longitude",
"site_utc2lst"),
verbose = T,
byalt = TRUE)
## Selecting by alt
## Adding time
Now there are 4316 observations, 108567 less observations. Here we add
the column max_altitude
to identify the max altitude by site.
df4[,
max_altitude := max(altitude_final),
by = site_code]
df4[,
c("site_code",
"altitude_final",
"max_altitude")] |> unique()
## site_code altitude_final max_altitude
## 1: CRV 17.1 31.7
## 2: CRV 31.7 31.7
## 3: CRV 4.9 31.7
## 4: LEF 122.0 396.0
## 5: LEF 396.0 396.0
## 6: LEF 30.0 396.0
## 7: SCT 304.8 304.8
## 8: SCT 31.0 304.8
## 9: SCT 61.0 304.8
## 10: WGC 30.0 484.0
## 11: WGC 484.0 484.0
## 12: WGC 89.1 484.0
Now that we have all the required information, we can save the files. Here, we name the data.frame as master, because it contains all the information. This is important because some fields can be used in the future, and for traceability. For convenience, time variables are transformed into character before writing into the disk. The separation is space ” “.
master <- df3
master$timeUTC <- as.character(master$timeUTC)
fwrite(master,
file = paste0(tempdir(), "/tower_insitu_2020.txt"),
sep = " ")
The format Comma Separated Value with YAML (CSVY)[^3] consists in a
typical CSV with a YAML header. The functionobs_write_csvy
includes
the argument notes
which allows adding custom notes at the header of
the file. Below the notes, obs_write_csvy
adds the output of the R
function str
, which provides a vertical summary of the data, known as
structure.
csvy <- paste0(tempdir(), "/tower_insitu_2020.csvy")
obs_write_csvy(dt = master,
notes = "tower 2020",
out = csvy
)
To check the YAML header we read the first 38 lines of the files that were generated. Here we can see the column names, type of data and first observations. The YAML header is delimited by the characters “—”.
readLines(csvy)[1:34] # and more
We need to filter some columns from the master files in a new object
called receptors. This is needed because internally we run HYSPLIT
[@hy] using the information from the receptors. In the case of a
tower, we need to select observations with the highest altitude. The
specific columns are selected as shown on the following code. We are
selecting the ending times, because later HYSPLIT is run backwards based
on the time of measurement, between ending and starting times. The
columns about time are formatted to have two characters. For instance,
the month 1, is formatted as “01”. We also need to filter for
type_altitude
equal 0, representing agl
observations , or equal to 1,
asl
.
receptor <- master[altitude_final == max_altitude,
c("site_code",
"year",
"month",
"day",
"hour",
"minute",
"second",
"latitude",
"longitude",
"altitude_final",
"type_altitude",
"year_end",
"month_end",
"day_end",
"hour_end",
"minute_end",
"second_end",
"time_decimal")]
receptor$altitude_final <- round(receptor$altitude_final)
receptor <- obs_format(receptor)
if(nrow(receptor_agl) > 0) {
fwrite(x = receptor_agl,
file = paste0(tempdir(), "/receptor_tower_insitu_2020_AGL.txt"),
sep = " ")}
if(nrow(receptor_asl) > 0) {
fwrite(x = receptor_asl,
file = paste0(tempdir(), "/receptor_tower_insitu_2020_ASL.txt"),
sep = " ")}
In this package we are sharing scripts to process other sectors The
scripts are available in the path
https://github.com/ibarraespinosa/rtorf/tree/main/rscripts
I’m currently implementing a version in python:
R | description | Python |
---|---|---|
fex | File extension | |
invfile | Methods for objects with class ‘invfile’ | |
obs_addltime | local hour (bsed on longitude and time) | |
obs_addstime | Add solar time into obspack | |
obs_addtime | Add times into obspack | |
obs_agg | Aggregates ObsPack by time | |
obs_find_receptors | Compares expected receptors | |
obs_footname | Expected footprint name | |
obs_format | Formatting data | |
obs_freq | return numeric vector in intervals | |
obs_index | Summary of the ObsPack files (.txt) | OK |
obs_invfiles | Generate files to perform inverse modeling | |
obs_list.dt | list.dt | |
obs_meta | Read obspack metadata | |
obs_out | outersect | |
obs_plot | Read obspack metadata | |
obs_rbind | rbind obspack | |
obs_read | Read obspack (.txt) | |
obs_read_csvy | reads CSVY | |
obs_read_nc | Read obspack (.nc) | OK |
obs_roundtime | round seconds from “POSIXct” “POSIXt” classes | |
obs_summary | Summary of the ObsPack files (.txt) | |
obs_table | Obspack Table | |
obs_trunc | Trunc numbers with a desired number of decimals | |
obs_write_csvy | Generates YAML and write data.frame | |
plot.invfile | Methods for objects with class ‘invfile’ | |
print.invfile | Methods for objects with class ‘invfile’ | |
sr | Extacts n last characters | |
summary.invfile | Methods for objects with class ’invfile |