Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rasterband::write with Buffer<T> forces unnecessary data copy. #453

Open
metasim opened this issue Oct 16, 2023 · 5 comments
Open

Rasterband::write with Buffer<T> forces unnecessary data copy. #453

metasim opened this issue Oct 16, 2023 · 5 comments

Comments

@metasim
Copy link
Contributor

metasim commented Oct 16, 2023

When working with large rasters, minimizing the number of data copies is important. The Rasterband::write method as it currently stands, forces some users into an additional buffer copy.

To illustrate, consider the following use case (pseudo-ish code):

let mut data: Vec<u8> = vec![0u8; <some big number>];
for i in 0..100 {
   modify_data(&mut data);
   write_data_with_gdal(&data);
}

Seems reasonable, no? And let's assume for external reasons we only have control over the implementation of write_data_with_gdal. Problem arises when we try to implement it:

fn write_data_with_gdal(data: &Vec<u8>) {
  let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
  let ds = driver.create(...).unwrap();
  let mut rb = ds.rasterband(1).unwrap();
  rb.write(/*... what to do here? ...*/).unwrap();
}

The Rasterband::write method looks like this:

pub fn write<T: GdalType + Copy>(
        &mut self,
        window: (isize, isize),
        window_size: (usize, usize),
        buffer: &Buffer<T>,
    ) -> Result<()> {

It wants the data as as a reference to a Buffer<T>, which looks like this:

pub struct Buffer<T: GdalType> {
    pub size: (usize, usize),
    pub data: Vec<T>,
}

The contained Vec is owned by Buffer::data. We only have a &Vec<u8>. So to call write, even though write only needs a Buffer<T> reference, the only way to create the referent is to clone our &Vec<u8> parameter:

fn write_data_with_gdal(data: &Vec<u8>) {
  let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
  let ds = driver.create(...).unwrap();
  let mut rb = ds.rasterband(1).unwrap();
  let dims = (data.len(), 1);
  let buf = Buffer { size: dims, data: data.clone() }; // <--- no way to avoid `clone()`
  rb.write((0, 0), dims, &buf).unwrap();
  // `buf` is dropped...
}

I'd argue we need an alternative to this write method for users not wishing to incur the additional memory and time overhead. One might argue that the user should only work with Buffer<T> instances, but given that working interoperability with other Vec<T>-based APIs is very common in the raster space, we should support writing a Vec<T> without the clone() cost.

@lnicola
Copy link
Member

lnicola commented Dec 1, 2023

For anyone running into this, you can use the following idiom as a workaround:

fn write_data_with_gdal(data: Vec<u8>) -> Vec<u8> { // pass by value and return back
  let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
  let ds = driver.create(...).unwrap();
  let mut rb = ds.rasterband(1).unwrap();
  let dims = (data.len(), 1);
  let buf = Buffer { size: dims, data: data }; // no clone()
  rb.write((0, 0), dims, &buf).unwrap();
  let Buffer { data, .. } = buf;
  data
}

@metasim
Copy link
Contributor Author

metasim commented Jan 2, 2024

@lnicola What if you have a &[u8]... Is there are workaround for that?

@lnicola
Copy link
Member

lnicola commented Jan 2, 2024

No, we'd need two buffer variants or a view over owned or borrowed data.

@metasim
Copy link
Contributor Author

metasim commented Jan 3, 2024

Would you be opposed to a RasterBand::write_slice method?

@lnicola
Copy link
Member

lnicola commented Jan 3, 2024

We'd need another one for the block function(s) though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants