Plans to migrate from GeoPackage to GeoParquet #290

Open
rafapereirabr opened this issue May 20, 2022 · 2 comments

@rafapereirabr
Member

rafapereirabr commented May 20, 2022

Context

All data sets used in geobr are currently stored as GeoPackage (.gpkg) files. The choice of GeoPackage was an easy one: it is a robust, compact, open-standard format for geospatial data. A key aspect here is that .gpkg files are platform-independent, so we can make sure that geobr data is consistent for both R and Python users.

Nonetheless, we are seeing major advances in the development of GeoParquet, a new data format for storing geospatial vector data (points, lines, polygons). GeoParquet is built on top of Apache Parquet, a popular columnar storage format for tabular data. It is much (much!) more efficient than GeoPackage, both in terms of file size and in terms of the speed of reading and writing files. I believe it's safe to say that GeoParquet has a bright future in the geospatial industry because of its flexibility and efficiency.

What to expect:

I would like to migrate all data sets available in geobr from GeoPackage to the GeoParquet .parquet format in geobr v2.0. This should be done in 2023. I need some time to fix some issues in geobr, and it would be good to wait a little longer to see GeoParquet become a stable specification, with more robust and stable packages for manipulating GeoParquet in R and Python.

How will this affect geobr users?

  • The only meaningful way this will affect users is that geobr v2.0 will be much faster. Because GeoParquet files are much smaller and the format is more efficient for I/O, download and reading times should be significantly reduced.
  • I will keep GeoPackage files stored for a while to make sure we have a very smooth transition.

How will this affect geobr developers?

There are already libraries that can read GeoParquet files in both R and Python (see below). geobr v2.0 will need to include just a couple more package dependencies to be able to read geospatial data in .parquet format. In practice, this should have minimal effect on code development.
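
To give a rough idea of how small the change could be on the Python side, here is a minimal sketch, not the actual geobr implementation. It assumes geopandas (with pyarrow installed) as the reader; the helper names `reader_name` and `read_layer` and the file names are hypothetical. It picks the reader from the file extension, so .gpkg files keep working during the transition.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the geopandas function to call.
_READERS = {
    ".parquet": "read_parquet",  # GeoParquet (geopandas needs pyarrow for this)
    ".gpkg": "read_file",        # GeoPackage, the current geobr format
}


def reader_name(path):
    """Return the name of the geopandas function that can read `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return _READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file format: {suffix!r}") from None


def read_layer(path):
    """Read a local .parquet or .gpkg file into a GeoDataFrame."""
    # Imported lazily so that reader_name() works without geopandas installed.
    import geopandas as gpd

    return getattr(gpd, reader_name(path))(path)
```

With something like this in place, `read_layer("states.parquet")` and `read_layer("states.gpkg")` would both return a GeoDataFrame, and the rest of the package would not need to know which format is on disk.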

@rafapereirabr rafapereirabr self-assigned this May 20, 2022
@JoaoCarabetta
Collaborator

The python team supports this decision emphatically.

I just recommend planning the transition carefully, given that the GeoParquet spec is not stable yet. Their current documentation expects stability at version v1.0.0, but they are still at version v0.3.0 (see the roadmap quoted below).

Roadmap

Our aim is to get to a 1.0.0 within 'months', not years. The rough plan is:

  • 0.1 - Get the basics established, provide a target for implementations to start building against.
  • 0.2 / 0.3 - Feedback from implementations, 3D coordinates support, geometry types, crs optional.
  • 0.4 - Feedback from implementations, add spatial index.
  • 0.x - Several iterations based on feedback from implementations.
  • 1.0.0-RC.1 - Aim for this when there are at least 6 implementations that all work interoperably and all feel good about the spec.
  • 1.0.0 - Once there are 12(?) implementations in diverse languages we will lock in for 1.0

Our detailed roadmap is in the Milestones and we'll aim to keep it up to date.

@rafapereirabr
Member Author

For the record, GeoParquet v1.0.0 (stable) has now been released.

In order to implement GeoParquet in geobr, we still need to investigate the best approaches / packages for reading GeoParquet in R and Python. Because this is all very recent, it might take a few months before we have stable R and Python packages to do this.


2 participants