Modify the LAScatalog drivers

Document updated on February 13th 2020 and up-to-date with lidR v3.0.0

In lidR, when processing several files (or a single big file), the dataset is split into chunks. Each chunk is processed sequentially and the different outputs are merged into a single object. For example, the function find_trees returns a SpatialPointsDataFrame.

library(lidR)

LASfile <- system.file("extdata", "MixedConifer.laz", package="lidR")
ctg <- readLAScatalog(LASfile)

opt_chunk_size(ctg) <- 80

ttops = find_trees(ctg, lmf(3))
ttops
#> class       : SpatialPointsDataFrame 
#> features    : 294 
#> extent      : 481260, 481349.9, 3812921, 3813011  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=utm +zone=12 +datum=NAD83 +units=m +no_defs +ellps=GRS80
#> variables   : 2
#> names       : treeID,     Z 
#> min values  :      1,  2.16 
#> max values  :    294, 32.07 

But if the dataset is really big, the output is likely to be very big as well. For example, with 1500 trees/ha we have 150,000 trees/km², i.e. 60 million trees over a 400 km² coverage. This is why the LAScatalog processing engine has a mechanism to sequentially write the outputs into files instead of storing everything in R. In the case of find_trees used in this example, the output is a SpatialPointsDataFrame and the default behavior is to write shapefiles.

Write to shapefiles

Using the processing option opt_output_files we can tell the engine to sequentially write the outputs to files with custom filenames. In that case the returned value is no longer a SpatialPointsDataFrame but the paths of the files that were written.

opt_output_files(ctg) <- paste0(tempdir(), "/trees_{ID}")

ttops = find_trees(ctg, lmf(3))
ttops
#> [1] "/tmp/RtmpzftOBx/trees_1.shp" "/tmp/RtmpzftOBx/trees_2.shp" 
#> [3] "/tmp/RtmpzftOBx/trees_3.shp" "/tmp/RtmpzftOBx/trees_4.shp"

The default format is ESRI Shapefile, but the user can change it by modifying the catalog drivers.
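
Before modifying anything, it can be useful to look at which drivers are registered in the catalog. The snippet below is only an exploratory sketch based on the output_options slot used throughout this page; the exact set of drivers and their settings may vary between lidR versions.

# List the names of the registered drivers (the exact list may vary)
names(ctg@output_options$drivers)

# Look at the settings of the driver used for Spatial* objects
str(ctg@output_options$drivers$Spatial, max.level = 1)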

Write to GeoPackage files

When the outputs are Spatial* objects, the driver used to write them is the driver named Spatial. Drivers are undocumented features, so this wiki page serves as their documentation. The user can change the extension of the output: any format supported by rgdal::writeOGR works. For example, here we can use the GeoPackage format.

ctg@output_options$drivers$Spatial$extension <- ".gpkg"

ttops = find_trees(ctg, lmf(3))
ttops
#> [1] "/tmp/RtmpzftOBx/trees_1.gpkg" "/tmp/RtmpzftOBx/trees_2.gpkg" 
#> [3] "/tmp/RtmpzftOBx/trees_3.gpkg" "/tmp/RtmpzftOBx/trees_4.gpkg"

Here we wrote four GeoPackage files, but this is not really convenient: with a 400-tile catalog we would get 400 outputs. It would be better to write everything into a single file. However, it is not possible to append data to an existing shapefile or GeoPackage file (at least not with rgdal::writeOGR), so we need to choose a format that supports appending data.

Write to a SQLite database

The LAScatalog engine allows the use of user-defined drivers, so let us build our own driver relying on a SQLite database. A driver is made of:

  1. A function that writes an object into a file, given a path plus some options.
  2. A file extension.
  3. The name of the argument used to pass the object to write.
  4. The name of the argument used to pass the path of the file to write.
  5. A list of extra parameters passed to the function used to write the object.

library(RSQLite)

# Create a function that writes a SpatialPointsDataFrame into
# a SQLite database.
dbWrite_SpatialPointsDataFrame = function(x, path, name)
{
  x <- as.data.frame(x)
  con <- dbConnect(SQLite(), path)
  dbWriteTable(con, name, x, append = TRUE)
  dbDisconnect(con)
}

# Change the driver used to write SpatialPointsDataFrame objects
# User-defined drivers have the precedence
ctg@output_options$drivers$SpatialPointsDataFrame = list(
   write = dbWrite_SpatialPointsDataFrame,
   extension = ".sqlite",
   object = "x",
   path = "path",
   param = list(name = "trees"))

# Here we want to write everything into a single file. 
# The path is not templated
opt_output_files(ctg) <- paste0(tempdir(), "/trees")

ttops = find_trees(ctg, lmf(3))
ttops
#> [1] "/tmp/RtmpzftOBx/trees.sqlite" "/tmp/RtmpzftOBx/trees.sqlite" 
#> [3] "/tmp/RtmpzftOBx/trees.sqlite" "/tmp/RtmpzftOBx/trees.sqlite"

The output contains the same string four times because there is one output per processed chunk, but only the first one is relevant.

con <- dbConnect(SQLite(), ttops[1])
ttops <- dbReadTable(con, "trees")
dbDisconnect(con)
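
As a quick check, we can look at what was read back; this is plain base R and purely optional.

# The table should contain the trees detected in all the chunks
nrow(ttops)
head(ttops)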

It works! We have all the trees in a single SQLite database. One may adapt this example to another database system.

⚠️ Don't use multicore processing! SQLite is designed for local and lightweight use and does not support concurrent writes.
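
As an illustration of adapting this driver to another database system, here is a minimal sketch of what the write function could look like for a PostgreSQL server accessed through the DBI-compliant RPostgres package. RPostgres is not a dependency of lidR and the connection parameters (database name, host, user) are placeholders for this example; the driver would be registered exactly like the SQLite one above. Unlike SQLite, a client-server database such as PostgreSQL handles concurrent writes, which matters if the catalog is processed on several cores.

library(DBI)
library(RPostgres)

# Hypothetical writer targeting a PostgreSQL server instead of SQLite.
# The connection parameters below are placeholders.
dbWrite_SpatialPointsDataFrame_pg = function(x, path, name)
{
  x <- as.data.frame(x)
  # The destination is the database itself, so 'path' is not used
  con <- dbConnect(Postgres(), dbname = "lidar", host = "localhost", user = "postgres")
  dbWriteTable(con, name, x, append = TRUE)
  dbDisconnect(con)
}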

Add a new driver for unsupported types

The drivers allow for writing Spatial* objects, Raster* objects, LAS objects and data.frame objects, and this is enough for all the cases in lidR. But a user can run the processing engine with their own function that does not necessarily return one of these types. In that case the engine fails, like in the following:

test = function(cluster) {
  las = readLAS(cluster)
  if (is.empty(las)) return(NULL)
  return(list(0))
}

opt_output_files(ctg) <- paste0(tempdir(), "/file_{ID}")
ret = catalog_apply(ctg, test)
#> Error : Trying to write an object of class list but this type is not supported.

In this case, as in the previous one, the user can define their own driver. To write a list we can use the function saveRDS.

ctg@output_options$drivers$list = list(
   write = base::saveRDS,
   object = "object",
   path = "file",
   extension = ".rds",
   param = list(compress = TRUE))

ret = catalog_apply(ctg, test)
ret = unlist(ret)
ret
#> [1] "/tmp/RtmpM35fhw/file_1.rds" "/tmp/RtmpM35fhw/file_2.rds" 
#> [3] "/tmp/RtmpM35fhw/file_3.rds" "/tmp/RtmpM35fhw/file_4.rds"
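
The written files can then be read back with readRDS; this is plain base R and is shown only as a usage note.

# Read each serialized output back into R
res = lapply(ret, readRDS)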