`result()` throws an error with Notaro GCMs for more than one variable #399

lindsayplatt · 2021-11-15T22:06:37Z

When I call `result()` on a job that queried more than one variable, it throws a delimiter parsing error.

```r
library(geoknife)
query_pts <- structure(list(
  `1` = c(-88.4467147238926, 42.7996962344843),
  `2` = c(-88.6432114598417, 43.6283635061587),
  `3` = c(-91.7772962678006, 45.6126497556779),
  `4` = c(-90.290648760424, 44.2691685292945),
  `5` = c(-91.5175899000678, 45.7450307015114)),
  class = "data.frame", row.names = c("X", "Y"))

# More than one var fails:
gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("evspsbl", "hfss", "mrso"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)

# I get a parsing error:
# Error in value[[3L]](cond) : Delimiter parse fail.
my_data <- result(gcm_job)

# Just one var works:
gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("evspsbl"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)

my_data <- result(gcm_job)
```

dblodgett-usgs · 2021-11-16T03:20:07Z

Huh -- I thought that this would have worked.

The mrso data have soil levels. It's not that you are asking for three variables, it's that one of them is breaking the parser.
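To check which variable triggers the failure, each one can be requested on its own and the errors collected. The following is a minimal sketch, not part of geoknife: `find_failing_vars()` and `fetch_notaro()` are hypothetical helper names, and the fetcher is only defined here, not called, because it needs the geoknife package and network access.

```r
# Return the variables for which `fetch` raises an error.
# `fetch` is any function that takes one variable name and errors on failure.
find_failing_vars <- function(variables, fetch) {
  ok <- vapply(variables, function(v) {
    tryCatch({
      fetch(v)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  variables[!ok]
}

# Hypothetical fetcher for the dataset in this issue: one job per variable.
# Defined but not called here (requires geoknife and network access).
fetch_notaro <- function(v, pts) {
  job <- geoknife::geoknife(
    stencil = geoknife::simplegeom(pts),
    fabric = geoknife::webdata(
      url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
      variables = v,
      times = c("1999-01-01", "1999-01-15")
    ),
    wait = TRUE
  )
  geoknife::result(job)
}
```

With `query_pts` from the original report, `find_failing_vars(c("evspsbl", "hfss", "mrso"), function(v) fetch_notaro(v, query_pts))` would be expected to flag `mrso` if the diagnosis above is right.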

The parsing function is here: https://github.com/USGS-R/geoknife/blob/master/R/parseTimeseries.R#L19

This line does not handle the soil levels correctly: https://github.com/USGS-R/geoknife/blob/master/R/parseTimeseries.R#L23

I don't have the brain space to work up a fix right now, but maybe you want to modify the parser to work the way you want? I'm not sure what the right way to handle this z dimension in the timeseries is.
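One possible starting point for such a change is to separate index columns from value columns before parsing. The sketch below is hypothetical and is not how `parseTimeseries()` currently works: `split_header()` is an invented helper, and it assumes value columns are always named as an upper-case statistic followed by units in parentheses (for example `MEAN(kg m-2)`), which may not hold for every dataset.

```r
# Hypothetical helper: split header names into index and value columns.
# Assumption: value columns look like STATISTIC(units) with an upper-case
# statistic name; anything else (TIMESTEP, a z dimension) is an index column.
split_header <- function(header) {
  is_value <- grepl("^[A-Z_]+\\(.*\\)$", header)
  list(index = header[!is_value], value = header[is_value])
}

# Header names as they appear in the mrso output for five features.
header <- c("TIMESTEP", "soil_layer(layer)", rep("MEAN(kg m-2)", 5))
cols <- split_header(header)
```

Here `cols$index` holds `TIMESTEP` and `soil_layer(layer)`, so a parser could treat every index column beyond the first as an extra dimension instead of failing.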

The reprex below gets you to what you need and shows how to get as close as possible to the geoknife code that's failing. In essence, the output has an additional column, `soil_layer`, for the z dimension. It sits next to the time column and acts as a second index in much the same way that time does.

```r
library(geoknife)
#> 
#> Attaching package: 'geoknife'
#> The following object is masked from 'package:stats':
#> 
#>     start
#> The following object is masked from 'package:graphics':
#> 
#>     title
#> The following object is masked from 'package:base':
#> 
#>     url
query_pts <- structure(list(
  `1` = c(-88.4467147238926, 42.7996962344843),
  `2` = c(-88.6432114598417, 43.6283635061587),
  `3` = c(-91.7772962678006, 45.6126497556779),
  `4` = c(-90.290648760424, 44.2691685292945),
  `5` = c(-91.5175899000678, 45.7450307015114)),
  class = "data.frame", row.names = c("X", "Y"))

gcm_job <- geoknife(
  stencil = simplegeom(query_pts),
  fabric = webdata(
    url = "https://cida.usgs.gov/thredds/dodsC/notaro_GFDL_1980_1999",
    variables = c("mrso"),
    times = c('1999-01-01', '1999-01-15')
  ),
  wait = TRUE
)
#> Process Accepted

my_data <- result(gcm_job)
#> Error in value[[3L]](cond): Delimiter parse fail.

(my_job <- check(gcm_job))
#> $status
#> [1] "Process successful"
#> 
#> $URL
#> [1] "https://cida.usgs.gov:443/gdp/process/RetrieveResultServlet?id=12169137-96be-4d7e-8a6b-de88ee4f602cOUTPUT"
#> 
#> $statusType
#> [1] "ProcessSucceeded"
#> 
#> $percentComplete
#> [1] "100"

my_data <- readr::read_csv(my_job$URL, skip = 2)
#> New names:
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...3`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...4`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...5`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...6`
#> * `MEAN(kg m-2)` -> `MEAN(kg m-2)...7`
#> Rows: 674 Columns: 7
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> dbl  (6): soil_layer(layer), MEAN(kg m-2)...3, MEAN(kg m-2)...4, MEAN(kg m-2...
#> dttm (1): TIMESTEP
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

my_data
#> # A tibble: 674 x 7
#>    TIMESTEP            `soil_layer(layer)` `MEAN(kg m-2)...3` `MEAN(kg m-2)...4`
#>    <dttm>                            <dbl>              <dbl>              <dbl>
#>  1 1999-01-01 00:00:00                   0               47.9               47.6
#>  2 1999-01-01 00:00:00                   1              620.               620. 
#>  3 1999-01-01 01:00:00                   0               47.9               47.6
#>  4 1999-01-01 01:00:00                   1              620.               620. 
#>  5 1999-01-01 02:00:00                   0               47.9               47.6
#>  6 1999-01-01 02:00:00                   1              620.               620. 
#>  7 1999-01-01 03:00:00                   0               47.9               47.6
#>  8 1999-01-01 03:00:00                   1              620.               620. 
#>  9 1999-01-01 04:00:00                   0               47.9               47.6
#> 10 1999-01-01 04:00:00                   1              620.               620. 
#> # ... with 664 more rows, and 3 more variables: MEAN(kg m-2)...5 <dbl>,
#> #   MEAN(kg m-2)...6 <dbl>, MEAN(kg m-2)...7 <dbl>


# It's failing in here.
parseTimeseries(my_job$URL, delim = ",")
#> Error in value[[3L]](cond): Delimiter parse fail.
```

Created on 2021-11-15 by the reprex package (v2.0.0)
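Once the table is read with `readr::read_csv()`, one way to make it easier to work with is to reshape it to long format, keeping the soil layer as a key alongside time. The sketch below uses base R on a small hand-built data frame modeled on the output above. The values are toy numbers, only two feature columns are included, and labelling features `"1"`, `"2"`, ... assumes the columns come back in the same order as the points in `query_pts`.

```r
# Toy stand-in for the table read from my_job$URL (illustrative values only).
wide <- data.frame(
  TIMESTEP = as.POSIXct(
    c("1999-01-01 00:00:00", "1999-01-01 00:00:00",
      "1999-01-01 01:00:00", "1999-01-01 01:00:00"),
    tz = "UTC"
  ),
  `soil_layer(layer)` = c(0, 1, 0, 1),
  `MEAN(kg m-2)...3` = c(47.9, 620, 47.9, 620),
  `MEAN(kg m-2)...4` = c(47.6, 620, 47.6, 620),
  check.names = FALSE
)

# Everything that is not an index column is treated as one feature's values.
index_cols <- c("TIMESTEP", "soil_layer(layer)")
feature_cols <- setdiff(names(wide), index_cols)

# Assumption: feature columns follow the order of the stencil points.
feature_ids <- as.character(seq_along(feature_cols))

# Stack one block of rows per feature, keeping time and soil layer as keys.
long <- do.call(rbind, lapply(seq_along(feature_cols), function(i) {
  data.frame(
    TIMESTEP = wide$TIMESTEP,
    soil_layer = wide[["soil_layer(layer)"]],
    feature = feature_ids[i],
    value = wide[[feature_cols[i]]]
  )
}))
```

The result has one row per time step, soil layer, and feature, so a single layer can be selected with an ordinary filter such as `long[long$soil_layer == 0, ]`.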

lindsayplatt · 2021-11-17T14:27:29Z

I actually don't need mrso, so we can just skip that! No rush on a fix for us :)

lindsayplatt mentioned this issue Nov 16, 2021

Use centroids for GCM query DOI-USGS/lake-temperature-model-prep#225

Merged
