`read_vpts()` cannot read bioRad created CSV #654

CeciliaNilsson709 · 2024-02-19T13:42:42Z

Related to #653 and #635, there is still a discrepancy between the VPTS CSV format and what a VPTS data frame looks like in bioRad. As a result, writing a VPTS data frame to CSV and then reading it with read_vpts() doesn't work (the columns are not the same).

library(bioRad)
#> Welcome to bioRad version 0.7.3
#> using vol2birdR version 1.0.1 (MistNet not installed)
vpts_df <- as.data.frame(example_vpts)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `day`, `sunrise`, `sunset`

vpts_df <- as.data.frame(example_vpts, geo = FALSE, suntime = FALSE)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`
Created on 2024-02-19 with reprex v2.1.0
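To see exactly where the two files diverge, the CSV column names can be compared with the VPTS CSV field names listed in the error message. A minimal base R sketch: `vpts_csv_fields` and `csv_columns` are copied from the error output of the second reprex, and `compare_vpts_columns()` is a hypothetical helper, not part of bioRad.

```r
# VPTS CSV field names in schema order, copied from the read_vpts() error
vpts_csv_fields <- c(
  "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp", "gap",
  "eta", "dens", "dbz", "dbz_all", "n", "n_dbz", "n_all", "n_dbz_all", "rcs",
  "sd_vvp_threshold", "vcp", "radar_latitude", "radar_longitude",
  "radar_height", "radar_wavelength", "source_file"
)

# Hypothetical helper (not in bioRad): which schema fields are missing from
# the columns, and which columns are not part of the schema
compare_vpts_columns <- function(columns, fields = vpts_csv_fields) {
  list(
    missing = setdiff(fields, columns),
    extra = setdiff(columns, fields)
  )
}

# Column names of the CSV written with geo = FALSE, suntime = FALSE
csv_columns <- c(
  "...1", "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp",
  "gap", "dbz", "eta", "dens", "DBZH", "n", "n_dbz", "n_all", "n_dbz_all",
  "rcs", "sd_vvp_threshold"
)

res <- compare_vpts_columns(csv_columns)
res$extra    # "...1" "DBZH"
res$missing  # "dbz_all" "vcp" "radar_latitude" "radar_longitude"
             # "radar_height" "radar_wavelength" "source_file"
```

The `...1` column is the unnamed row-name column that write.csv() writes by default; passing `row.names = FALSE` avoids it. The remaining differences (the extra `DBZH` column, the missing `dbz_all`, `vcp` and `source_file`, and the radar columns dropped by `geo = FALSE`) still need a change on the bioRad side.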


peterdesmet · 2024-02-19T14:53:42Z

@adokter @iskandari @bart1 @CeciliaNilsson709 I think we need a roadmap to tackle this, since the VPTS format is now only partially supported in bioRad:

Reading from HDF5 or CSV results in different vpts objects (Combining hdf5 and csv VP files #653). This should be patched.
The columns in a bioRad vpts object are not consistent with the VPTS CSV format. I think we should have one format throughout. This is a major change.
Ideally, the change to one format throughout is combined with making bioRad vpts objects a data.frame directly (VPTS objects as data.frame #568). This is a major change.
as.data.frame() currently has options (suntime and geo) that result in different flavours of data frame. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), by no longer providing those options, and/or by having vpts objects be data.frames.
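The write_vpts() idea can be sketched in base R. This is a hypothetical function, not part of bioRad: it keeps only VPTS CSV fields in schema order (field names copied from the read_vpts() error in the first comment), fills any missing field with NA (an assumption; a real implementation might derive or require those values instead), and writes without row names so no `...1` column appears.

```r
# VPTS CSV field names in schema order, copied from the read_vpts() error
vpts_csv_fields <- c(
  "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp", "gap",
  "eta", "dens", "dbz", "dbz_all", "n", "n_dbz", "n_all", "n_dbz_all", "rcs",
  "sd_vvp_threshold", "vcp", "radar_latitude", "radar_longitude",
  "radar_height", "radar_wavelength", "source_file"
)

# Hypothetical write_vpts(), not part of bioRad
write_vpts <- function(df, file, fields = vpts_csv_fields) {
  # Drop columns outside the schema (e.g. day, sunrise, sunset, DBZH)
  out <- df[, intersect(names(df), fields), drop = FALSE]
  # Add schema fields the data frame lacks, filled with NA (an assumption)
  for (field in setdiff(fields, names(out))) {
    out[[field]] <- rep(NA, nrow(out))
  }
  # Reorder to schema order; omit row names so no `...1` column is written
  out <- out[, fields, drop = FALSE]
  utils::write.csv(out, file, row.names = FALSE)
  invisible(out)
}

# Toy example: a bioRad-style extra column (day) is dropped on write
toy <- data.frame(radar = "bejab", height = c(0, 200), day = TRUE)
path <- tempfile(fileext = ".csv")
write_vpts(toy, path)
names(read.csv(path))  # the 26 VPTS CSV field names, in schema order
```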

My 2 cents is that we should move towards this situation:

iskandari · 2024-02-20T04:50:48Z

as.data.frame() currently has options (suntime and geo) that result in different flavours of data frame. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), by no longer providing those options, and/or by having vpts objects be data.frames.

Agreed that it would be simpler to have vpts objects as data frames, but a side effect is that we would lose metadata relevant to VP file creation. From the example in #653:

vpts_hdf5 <- bind_into_vpts(read_vpfiles(c(hdf5_1, hdf5_2)))
vpts_hdf5$attributes$how$task_args
[1] "azimMax=360.000000,azimMin=0.000000,layerThickness=200.000000,nLayers=25,rangeMax=35000.000000,rangeMin=5000.000000,elevMax=90.000000,elevMin=0.000000,radarWavelength=5.300000,useClutterMap=0,clutterMap=,fitVrad=1,exportBirdProfileAsJSONVar=0,minNyquist=5.000000,maxNyquistDealias=25.000000,birdRadarCrossSection=11.000000,stdDevMinBird=2.000000,cellEtaMin=11500.000000,etaMax=36000.000000,dbzType=DBZH,requireVrad=0,dealiasVrad=1,dealiasRecycle=1,dualPol=0,singlePol=1,rhohvThresMin=0.950000,resample=0,resampleRscale=500.000000,resampleNbins=100,resampleNrays=360,mistNetNElevs=5,mistNetElevsOnly=1,useMistNet=0,mistNetPath=/MistNet/mistnet_nexrad.pt,areaCellMin=0.500000,cellClutterFractionMax=0.500000,chisqMin=0.000010,clutterValueMin=0.100000,dbzThresMin=0.000000,fringeDist=5000.000000,nBinsGap=8,nPointsIncludedMin=25,nNeighborsMin=5,nObsGapMin=5,nAzimNeighborhood=3,nRangNeighborhood=3,nCountMin=4,refracIndex=0.964000,cellStdDevMax=5.000000,absVDifMax=10.000000,vradMin=1.000000"

This irreversibility should be communicated clearly. Maybe vpts data frames should be able to retain other information, and we then implement a write_vpts() that is called, for example, by extending write.csv(format = 'vpts').
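The irreversibility is easy to demonstrate: custom attributes live only in the R object and are not written by write.csv(), so they do not survive a CSV round trip. A minimal base R sketch; the attribute name `task_args` is illustrative and its value is a shortened excerpt of the string above.

```r
# A data frame carrying processing settings as a custom attribute
df <- data.frame(radar = "bejab", height = c(0, 200))
attr(df, "task_args") <- "layerThickness=200.000000,nLayers=25"

# Round trip through CSV
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df_back <- read.csv(path)

attr(df, "task_args")       # "layerThickness=200.000000,nLayers=25"
attr(df_back, "task_args")  # NULL: only the columns were written
```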

peterdesmet · 2024-02-20T10:57:42Z

You can add custom attributes to data.frames, which we could use for the vpts metadata. These attributes are not lost when the data frame is passed through functions like dplyr's filter(). Note: I'm not very familiar with attributes, but they seem useful here.

library(dplyr)

# A dataframe has default attributes
df <- iris
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150

# One can add custom attributes (here the list "metadata")
metadata <- list(radar = "bejab", regular = FALSE)
attr(df, "metadata") <- metadata
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150
#> 
#> $metadata
#> $metadata$radar
#> [1] "bejab"
#> 
#> $metadata$regular
#> [1] FALSE

# If the data frame is handled by other functions, the attributes are retained
df %>%
  filter(Species == "virginica") %>%
  attr("metadata")
#> $radar
#> [1] "bejab"
#> 
#> $regular
#> [1] FALSE

Created on 2024-02-20 with reprex v2.1.0

In addition, we should add a vpts class to the data.frame, so it is easy to recognize:

library(dplyr)
df <- iris
class(df) <- c("vpts", class(df))
class(df)
#> [1] "vpts"       "data.frame"

# Class is retained by dplyr
df %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "data.frame"

# Class can also be added to a tibble
df <- iris
dft <- as_tibble(df)
class(dft) <- c("vpts", class(dft))
class(dft)
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

# Class is retained by dplyr
dft %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

Created on 2024-02-20 with reprex v2.1.0
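The two ideas (a metadata attribute and a vpts class) could be bundled in a small constructor. A minimal base R sketch; `new_vpts_df()` and `is_vpts_df()` are hypothetical names chosen only for illustration, not bioRad functions.

```r
# Hypothetical constructor, not a bioRad function
new_vpts_df <- function(x, metadata = list()) {
  stopifnot(is.data.frame(x))
  # Store vpts metadata (radar, regular, ...) as a custom attribute
  attr(x, "metadata") <- metadata
  # Prepend "vpts" so vpts methods dispatch first, while data.frame or
  # tibble methods remain available through inheritance
  class(x) <- c("vpts", setdiff(class(x), "vpts"))
  x
}

# Hypothetical predicate
is_vpts_df <- function(x) inherits(x, "vpts")

vpts <- new_vpts_df(iris, metadata = list(radar = "bejab", regular = FALSE))
class(vpts)                    # "vpts" "data.frame"
is_vpts_df(vpts)               # TRUE
attr(vpts, "metadata")$radar   # "bejab"
```

Using setdiff() when prepending the class keeps the constructor idempotent: calling it on an object that is already a vpts does not produce a duplicated "vpts" entry.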

bart1 · 2024-02-26T11:55:07Z

A few remarks from my side. For move2 I do pretty much everything the way @peterdesmet describes (based on sf). Some extra properties are retained as attributes, which works quite well, and you can update these when required using custom methods. For that it is indeed important to add the vpts class on top of a data frame. For sf/move2 it does not matter whether the underlying data.frame is a tbl or a plain data.frame. This helps, as tbls are sometimes considerably faster, for example when reading with readr or vroom.

This is also an interesting read on restoring objects after dplyr operations: https://dplyr.tidyverse.org/reference/dplyr_extending.html

Maybe, if it requires too many breaking changes, you could also call it vpts_df.
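Following the dplyr extension guide linked above, a vpts subclass can implement a `dplyr_reconstruct()` method, which dplyr verbs such as filter() call to restore class and attributes on their result. A minimal sketch, assuming dplyr >= 1.0.0 and assuming (for illustration only) that `radar`, `datetime` and `height` are the columns a vpts object cannot do without; when one of them is missing, the result falls back to a plain data frame without the vpts metadata.

```r
library(dplyr)

# Columns assumed (for this sketch only) to be required in a vpts object
vpts_required <- c("radar", "datetime", "height")

# Restore vpts class and metadata after a dplyr verb, unless a required
# column has been dropped, in which case return a plain data frame
dplyr_reconstruct.vpts <- function(data, template) {
  # The data.frame method copies class and attributes from `template`
  out <- NextMethod()
  if (!all(vpts_required %in% names(out))) {
    class(out) <- setdiff(class(out), "vpts")
    attr(out, "metadata") <- NULL
  }
  out
}
# Register the method for dplyr's generic (a package would use its NAMESPACE)
registerS3method("dplyr_reconstruct", "vpts", dplyr_reconstruct.vpts,
                 envir = asNamespace("dplyr"))

# A toy vpts data frame with a metadata attribute
x <- data.frame(radar = "bejab", datetime = 1:4, height = c(0, 200, 0, 200))
attr(x, "metadata") <- list(radar = "bejab")
class(x) <- c("vpts", class(x))

kept <- filter(x, height > 0)
class(kept)  # still "vpts" "data.frame"
```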

iskandari self-assigned this Feb 20, 2024

iskandari added this to the 0.8.0 milestone Feb 20, 2024
