
📢 Announcement: Upcoming changes for {mapme.biodiversity} #240

Open
goergen95 opened this issue Feb 16, 2024 · 6 comments

Comments

@goergen95
Member

goergen95 commented Feb 16, 2024

Dear MAPME friends,

Today we are happy to announce the future direction of development
for {mapme.biodiversity}. Before diving into the details, let me thank
you for choosing this package and for the various contributions we were
lucky to receive from you over the last years! 🎉

To adapt to new, challenging and diverse use cases, and in the hope of
serving an even wider community, some major changes to the package are
in order. This issue informs you about the upcoming changes as we work
towards a 1.0 release. It gives a detailed overview of the changes you
can expect over the next several months and when to expect them. We
encourage you to adapt to these changes early on and to share your
feedback with us along the way.

The overall vision of the proposed changes is to provide you with a
package that is versatile in its application environments, delivering
clearly structured outputs that can be serialized to different data
formats with ease, while providing a clear and idiomatic interface.

Important

We will use this issue to keep you informed about our progress and will
post an update here each time we reach a milestone. Note that the details
of the proposed changes and specific implementation questions are best
discussed in separate issues.

Getting from the current state to this vision means that we will
introduce some breaking changes along the way. We want to inform you
ahead of time so that you can plan accordingly.

Schedule of the milestones

We ordered the milestones so that the most severe changes happen first.
That way, once you have adapted your workflows to one milestone,
adapting to the next should be less disruptive.

Milestone           | Merge on Main | Release to CRAN
------------------- | ------------- | ---------------
User-Interface      | End of March  | End of April
Standardized Output | End of April  | End of May
GDAL Backend        | End of May    | End of June

Table: Schedule for the proposed milestones.

Important

Please note that this is a preliminary schedule. While we will not shorten
the overall time frame, it might take us longer than expected to implement
the indicated changes.

Milestone 1: Cleaner Interface Using Closures

This milestone sets out to provide you with a cleaner interface that
gives instant feedback if arguments are specified incorrectly. For this,
we are going to use closures, i.e. functions that return other
functions. The arguments that you as a user need to control the
functionality are exposed at the outer level. A call to fetch some
resources and the subsequent calculation of indicators will then look
something like this:

aoi <- get_resources(aoi, get_nasa_srtm(), get_gmw(years = 2010)) 
aoi <- calc_indicators(aoi, calc_elevation(stat = "mean"), calc_mangroves_area())

This also means that it will be easier to access the help pages for
a resource/indicator, because they will be associated with the function
names you are actually using, e.g.:

?get_nasa_srtm
?calc_elevation

Arguments will be checked for correctness immediately, and you will be
informed about any misspecification. This interface will also make it
easier to add custom resources/indicators ad hoc for those who require
this functionality.
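To illustrate the closure pattern itself in plain base R (this is not the package's implementation; `calc_stat()` and `run_indicators()` are hypothetical names), the outer function validates the user-facing argument as soon as it is called and returns an inner function that an engine can apply to the data later:

```r
# Hypothetical factory illustrating the closure pattern;
# calc_stat() is NOT part of mapme.biodiversity.
calc_stat <- function(stat = c("mean", "median", "min", "max")) {
  # Outer level: check the user-facing argument immediately,
  # so a typo fails here instead of deep inside a long computation.
  stat <- match.arg(stat)
  fun <- match.fun(stat)
  # Inner level: the function that is applied to the data later on.
  function(values) {
    fun(values, na.rm = TRUE)
  }
}

# Hypothetical engine: applies every supplied indicator function to the data.
run_indicators <- function(values, ...) {
  lapply(list(...), function(indicator) indicator(values))
}

elevation <- c(100, 120, NA)
run_indicators(elevation, calc_stat("mean"), calc_stat("max"))
```

A misspelled argument such as `calc_stat("mode")` fails right away with an informative error from `match.arg()`, before any data is touched.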

Milestone 2: Standardized Indicator Output and Serialization Options

With this milestone we will have revised all indicator functions to
return a standardized output format. The envisaged output format is
inspired by the MovingFeatures standard and differentiates between
simple and temporal properties. Simple properties are the
one-dimensional attribute values of features that we are familiar with
from most GIS software. However, some of our indicators actually have a
temporal axis. We will build on the fact that we already use nested list
columns to represent our indicator data in R, but we will standardize the
output of all temporal indicators to a common format, e.g. along the
lines of the following example output:

## # A tibble: 76 × 6
##       datetimes            variable  unit   value
##       <chr>                <chr>     <chr>  <dbl>
##  1    2000-01-01T00:00:00Z treecover ha    12089.
##  2    2001-01-01T00:00:00Z treecover ha    12075.
##  3    2002-01-01T00:00:00Z treecover ha    12053.
##  4    2003-01-01T00:00:00Z treecover ha    11978.
##  5    2004-01-01T00:00:00Z treecover ha    11926.
##  6    2005-01-01T00:00:00Z treecover ha    11877.
##  7    2006-01-01T00:00:00Z treecover ha    11851.
##  8    2007-01-01T00:00:00Z treecover ha    11800.
##  9    2008-01-01T00:00:00Z treecover ha    11780.
## 10    2009-01-01T00:00:00Z treecover ha    11758.

This makes the indicator output format much more predictable for
downstream applications. It also allows us to supply you with seamless
serialization functions for GeoPackage, GeoJSON, and even
MovingFeatures JSON.
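A minimal base R sketch of this long format, assuming the column names from the example output above. The helper `make_temporal()` and the `assetid` column are hypothetical, and a plain data.frame stands in for a tibble. Each feature carries one such table in a nested list column:

```r
# Hypothetical helper (not part of mapme.biodiversity): builds one temporal
# indicator table in the long format shown above.
make_temporal <- function(dates, variable, unit, value) {
  stopifnot(length(dates) == length(value))
  # ISO 8601 timestamps in UTC, e.g. "2000-01-01T00:00:00Z"
  stamps <- as.POSIXct(dates, tz = "UTC")
  data.frame(
    datetimes = format(stamps, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
    variable = variable,
    unit = unit,
    value = value,
    stringsAsFactors = FALSE
  )
}

treecover <- make_temporal(
  dates = paste0(2000:2002, "-01-01"),
  variable = "treecover",
  unit = "ha",
  value = c(12089, 12075, 12053)
)

# One row per feature; the temporal indicator lives in a list column.
portfolio <- data.frame(assetid = 1L)
portfolio$treecover <- list(treecover)
```

Because every temporal indicator shares the same four columns, a serializer only has to handle a single table shape regardless of which indicator produced it.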

We will also take the opportunity to revise and optimize the indicator
functions for better overall performance. Additionally, we will add
support for multi-polygon geometries by supplying a mechanism for
selecting aggregation functions for indicators.
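How such a selection mechanism could work is sketched below under our own assumptions; `aggregate_parts()` and its options are hypothetical and not the planned API. Values computed per polygon part are combined with a user-chosen function. A total such as an area can simply be summed, while a mean such as elevation is better weighted by the area of each part:

```r
# Hypothetical helper (not part of mapme.biodiversity): combines the values
# computed for each part of a multi-polygon into a single value.
aggregate_parts <- function(values, areas = NULL,
                            fun = c("sum", "mean", "weighted_mean")) {
  fun <- match.arg(fun)
  switch(fun,
    # totals (e.g. area in ha) can be added up across parts
    sum = sum(values, na.rm = TRUE),
    mean = mean(values, na.rm = TRUE),
    # means (e.g. elevation) are weighted by the area of each part
    weighted_mean = {
      stopifnot(!is.null(areas), length(areas) == length(values))
      stats::weighted.mean(values, w = areas, na.rm = TRUE)
    }
  )
}

aggregate_parts(c(10, 20), fun = "sum")
aggregate_parts(c(10, 20), areas = c(1, 3), fun = "weighted_mean")
```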

Milestone 3: Routing data I/O through GDAL

In its current state, the package downloads data to the local file
system. This severely limits the environments in which the package can
be used efficiently. We have received several requests to support, e.g.,
different types of cloud storage. Manually maintaining drivers to
read from and write to commercial cloud storage is not something we are
able to do. The good news, however, is that GDAL already supports the
major cloud storage providers via its Virtual File System drivers. With
this milestone, we will leverage GDAL's capabilities to read/write
geodata from a huge range of formats and sources. This will allow users
of mapme.biodiversity to run their applications on a cloud provider of
their choice. We will supply configuration options that ease
authentication, e.g. against cloud storage attached to a compute instance
in the cloud. The interface to pull data from the internet into S3 cloud
storage will then look something like this:

mapme_options(
  outdir = "/vsis3/my-s3-bucket",
  gdal_opts = gdal_s3_opts(
    AWS_ACCESS_KEY_ID = "my-aws-key",
    AWS_S3_ENDPOINT = "https://my-bucket.af-south-1.amazonaws.com"
  )
)

get_resources(aoi, get_nasa_srtm())
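Independently of the proposed `mapme_options()` interface, GDAL itself addresses S3 objects through paths of the form `/vsis3/<bucket>/<key>` and also reads its configuration options, such as `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_REGION`, from environment variables. A minimal base R sketch of these two ingredients, where `vsis3_path()` is a hypothetical helper and all credentials, buckets and keys are placeholders:

```r
# Hypothetical helper: builds a GDAL virtual file system path for an S3 object.
vsis3_path <- function(bucket, key = NULL) {
  paste(c("/vsis3", bucket, key), collapse = "/")
}

# GDAL picks up its configuration options from environment variables;
# the values below are placeholders, not real credentials.
Sys.setenv(
  AWS_ACCESS_KEY_ID = "my-aws-key",
  AWS_SECRET_ACCESS_KEY = "my-aws-secret",
  AWS_REGION = "af-south-1"
)

vsis3_path("my-s3-bucket")
vsis3_path("my-s3-bucket", "srtm/tile.tif")
```

With valid credentials, such a path could then be opened by any GDAL-based reader, e.g. `terra::rast()` for rasters or `sf::read_sf()` for vector data.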

Leveraging GDAL for data transfer will also allow us to translate
data to cloud-optimized formats that should increase computation
speeds further down the line.

In case a specific resource is already provided in a cloud-optimized
data format on a low-latency server, you might also decide to skip the
download step altogether. This will most likely only be efficient for
small to medium-sized portfolios, and since we do not control how
resources are provided, there will be limitations on which resources
support such a workflow.

Development of new resources/indicators

Important

We expect this transition to span approximately four months. During
this period, we will not add new resources/indicators to
{mapme.biodiversity} and will limit our activity here to fixing bugs in
existing resources/indicators.

In the meantime, however, we will work on new resources/indicators
in {mapme.indicators}.

After all milestones have been achieved, we will decide whether separating
the backend from the concrete resource/indicator implementations makes
sense for the future. If so, all resources/indicators will eventually be
migrated to {mapme.indicators}. If not, we will migrate the newly
developed indicators into {mapme.biodiversity}.
As always, you are invited to share your feedback along the way.

@fBedecarrats
Collaborator

This is great news, congratulations!
Two questions:

  • What about the discussions regarding a possible further split between mapme.* packages? I heard this was in the air, and I wonder whether this would be a subsequent move we need to anticipate as users or (modest) contributors.
  • How would you recommend to handle the transition? We are starting a project in March with short deadlines for the first deliverables based on mapme.biodiversity. Is there a specific package milestone/version number we should refer to before all these changes? Is it recommended to install it via {remotes}?

Thanks in advance for the feedback!

@goergen95
Member Author

Thanks for your feedback and questions!

what about discussions regarding possible further splitting between mapme.* packages?

As indicated, that is not a settled issue yet, and we are happy to receive your feedback. However, if we were to split the package, the casual user would call library(mapme.indicators), and since that package would depend on the backend package, this would be the only change needed in your workflows (though we would most probably need to get it published on CRAN as well). The backend package itself would then only be of interest to more involved contributors.

How would you recommend to handle the transition?

I cannot give specific recommendations on this, as how best to handle the transition will depend on your context. All previously published versions will remain installable from CRAN during the process, so you could, for example, carry out your most urgent work with version 0.5 via remotes::install_version("mapme.biodiversity", version = "0.5"). As you can see, we also plan to send updates to CRAN when reaching each milestone. To fully benefit from the upcoming changes, however, I recommend adapting to the new interface as early as possible.

@goergen95
Member Author

goergen95 commented Mar 28, 2024

Today, we are happy to announce that the development branch, including the
latest work on our first milestone, Cleaner Interface Using Closures, is ready
for testing! 🎉

Please review NEWS.md for a quick overview of the proposed changes.

You can install the package from the main branch via:

remotes::install_github("mapme-initiative/mapme.biodiversity")

We also provide a ready-to-use Docker image that is rebuilt every day
with the latest changes on the main branch. To pull the image and run an
RStudio instance locally on localhost:8787, run:

docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:1.0
docker run --rm -p 8787:8787 -e PASSWORD=supersecret ghcr.io/mapme-initiative/mapme-spatial-dev:1.0

We advise you to adapt to the new UI as early as possible and ask
you to provide your feedback via dedicated issues.

As a reminder of the time schedule, we aim to send a new release to CRAN towards the end of April.

@goergen95
Member Author

To ease the development process and improve the discoverability of the milestones, we have slightly changed the process. The milestones will now be developed on a dev branch and published on the main branch as early as possible. New CRAN releases will follow after a period of one month. The comments above were adjusted to reflect the changed process.

@goergen95
Member Author

{mapme.biodiversity} v0.6.0 has just been released and should be available in the coming days from CRAN. This release includes the updated user-interface for querying resources and indicators. The release notes contain additional information.

@goergen95
Member Author

The latest changes for our second milestone, Standardized Indicator Output and Serialization Options, are ready
for testing on the main branch.

Please review NEWS.md for a quick overview of the proposed changes.

You can install the package from the main branch via:

remotes::install_github("mapme-initiative/mapme.biodiversity")

We also provide a ready-to-use Docker image that is rebuilt every day
with the latest changes on the main branch. To pull the image and run an
RStudio instance locally on localhost:8787, run:

docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:1.0
docker run --rm -p 8787:8787 -e PASSWORD=supersecret ghcr.io/mapme-initiative/mapme-spatial-dev:1.0

We advise you to adapt to the new output format as early as possible and ask
you to provide your feedback via dedicated issues.

As a reminder of the time schedule, we aim to send a new release to CRAN towards the end of May.


2 participants