feature: universal get_dataframe() method #1969

Open
aleaf opened this issue Sep 29, 2023 · 1 comment


@aleaf
Contributor

aleaf commented Sep 29, 2023

Is your feature request related to a problem? Please describe.
Currently, users can export shapefile representations of the model grid and of package or variable objects. But the current export paradigm is clunky, sometimes slow, and gives users little control over the output. With geopandas and other Python packages, most geospatial operations that previously would have been done in a GUI or at the command line can be done in memory. To take advantage of this, Flopy users currently have to export a shapefile and read it back in, or write their own code to build a GeoDataFrame from the model grid information.

Describe the solution you'd like
This feature would add a universal get_dataframe() method that exports the modelgrid, package or variable contents to a tabular pandas.DataFrame. If geopandas is installed, the result would be a GeoDataFrame with a 'geometry' column containing a shapely polygon of the model grid cell represented by each row, and a .crs attribute taken from the model grid. This would give users an easy gateway to all of the geospatial functionality of geopandas.
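
A minimal sketch of the idea, not a proposed implementation. The function name `grid_to_dataframe` and its arguments are hypothetical, and it works from plain cell-edge arrays for a regular 2D grid rather than from any Flopy object. It returns a plain DataFrame when geopandas is not importable and a GeoDataFrame with one polygon per cell when it is.

```python
import numpy as np
import pandas as pd

try:  # geopandas and shapely are optional dependencies in this sketch
    import geopandas as gpd
    from shapely.geometry import box
except ImportError:
    gpd = None


def grid_to_dataframe(xedges, yedges, crs=None):
    """Hypothetical helper: one row per cell of a regular 2D grid.

    xedges: column edge x-coordinates, increasing (length ncol + 1)
    yedges: row edge y-coordinates, decreasing (length nrow + 1)
    crs: anything geopandas accepts as a CRS; ignored without geopandas
    """
    nrow, ncol = len(yedges) - 1, len(xedges) - 1
    i, j = np.indices((nrow, ncol))
    df = pd.DataFrame({"i": i.ravel(), "j": j.ravel()})
    if gpd is None:
        # No geopandas: return the tabular representation only
        return df
    # box(minx, miny, maxx, maxy) builds the rectangle for each cell
    geoms = [
        box(xedges[c], yedges[r + 1], xedges[c + 1], yedges[r])
        for r, c in zip(df["i"], df["j"])
    ]
    return gpd.GeoDataFrame(df, geometry=geoms, crs=crs)


cells = grid_to_dataframe(xedges=[0, 1, 2, 3], yedges=[2, 1, 0])
```

With geopandas available, `cells` could go straight into spatial joins, overlays or plotting without a shapefile round trip.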

A few more details:

  • this would be different from the pandas-based list and array-type objects that are in development (feat(pandas): Flopy pandas support #1955) in that it would return all of the contents of whatever object the method is called on in a single DataFrame (for example, multiple well package stress periods distinguished by a 'per' column). The returned object would be a plain DataFrame (or GeoDataFrame), not a custom flopy subclass of either.
  • this method should be able to handle large datasets (for example, a transient well package with 10,000 wells) quickly. Probably this should include caching the grid cell polygons for subsequent queries.
  • a 'squeeze' option should be included that only exports the relevant cells for the package or variable (for example, just the cells containing a boundary condition).
    • This option should maybe be True by default.
    • Maybe this applies to array-based data too (for example, model layers with only a small number of active cells)
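
To make the 'per' column and the 'squeeze' option concrete, here is a rough pandas-only sketch. Everything in it is an assumption for illustration: the function name, the `{per: [(k, i, j, q), ...]}` input layout and the `shape` argument are hypothetical and do not reflect Flopy's actual data structures. Polygon caching is left out.

```python
import pandas as pd

# Hypothetical stress period data: {per: [(k, i, j, q), ...]}
stress_period_data = {
    0: [(0, 1, 1, -100.0), (0, 2, 3, -50.0)],
    1: [(0, 1, 1, -120.0)],
}


def stress_periods_to_dataframe(spd, shape=None, squeeze=True):
    """Stack all stress periods into one DataFrame with a 'per' column.

    squeeze=True keeps only cells that hold a boundary condition.
    squeeze=False returns one row per (per, k, i, j) cell of a grid with
    the given (nlay, nrow, ncol) shape, with NaN where nothing is set.
    """
    frames = []
    for per, records in spd.items():
        df = pd.DataFrame(records, columns=["k", "i", "j", "q"])
        df.insert(0, "per", per)
        frames.append(df)
    out = pd.concat(frames, ignore_index=True)
    if squeeze:
        return out
    nlay, nrow, ncol = shape
    full = pd.MultiIndex.from_product(
        [list(spd), range(nlay), range(nrow), range(ncol)],
        names=["per", "k", "i", "j"],
    ).to_frame(index=False)
    # Left join so every cell appears, populated or not
    return full.merge(out, on=["per", "k", "i", "j"], how="left")


squeezed = stress_periods_to_dataframe(stress_period_data)
unsqueezed = stress_periods_to_dataframe(
    stress_period_data, shape=(1, 3, 4), squeeze=False
)
```

Here the squeezed result has 3 rows while the unsqueezed one has 24 (2 periods × 12 cells), which is why defaulting to squeeze=True looks attractive for list-based packages.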

Describe alternatives you've considered
Existing alternatives are described above. We could call the method get_geodataframe() instead, but this would be inconsistent with a regular pandas DataFrame being returned if geopandas weren't installed (I don't think we want the clutter of separate get_dataframe() and get_geodataframe() methods). Returning regular DataFrames without geopandas could still be advantageous in providing a tabular representation of the respective object that could then be summarized, etc.

I'm happy to work on this, but given the changes coming in #1955, I'm wondering if it wouldn't make sense to wait until that is merged.

@aleaf aleaf self-assigned this Sep 29, 2023
@langevin-usgs
Contributor

Hey @aleaf, agreed that it is probably best to wait until #1955 is completed, which should be soon. This would be a really nice addition. Not sure of the best way to toggle between gpd and non-gpd returned dataframes, but it does seem like that should be a user-controlled option. There are still complications in some situations for installing geopandas, so keeping that isolated somehow would be good. Excited to see what you come up with.
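
One possible shape for a user-controlled toggle, offered only as a sketch. The `geo` keyword and both function names are hypothetical. The idea is a three-state option (None = use geopandas if it is installed, True = require it, False = never use it), with geopandas imported lazily so that nothing else depends on it.

```python
import importlib.util

import pandas as pd


def resolve_geo(geo=None):
    """Decide whether to return a GeoDataFrame.

    geo=None: use geopandas only if it is installed
    geo=True: require geopandas, raise ImportError if it is missing
    geo=False: always return a plain DataFrame
    """
    has_gpd = importlib.util.find_spec("geopandas") is not None
    if geo is None:
        return has_gpd
    if geo and not has_gpd:
        raise ImportError(
            "geo=True requires geopandas; install it or pass geo=False"
        )
    return bool(geo)


def finalize(df, geometry=None, crs=None, geo=None):
    """Return df as-is, or wrap it in a GeoDataFrame when requested."""
    if not resolve_geo(geo) or geometry is None:
        return df
    import geopandas as gpd  # lazy import keeps geopandas isolated

    return gpd.GeoDataFrame(df, geometry=geometry, crs=crs)


plain = finalize(pd.DataFrame({"q": [-100.0]}), geo=False)
```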

@wpbonelli wpbonelli added this to the 3.7.0 milestone Mar 2, 2024
3 participants