Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in get_coastlines() function for northern Australia #1212

Open
eatanygoodbookslately opened this issue Mar 18, 2024 · 0 comments

Comments

@eatanygoodbookslately
Copy link

eatanygoodbookslately commented Mar 18, 2024

Dear DEA Notebooks team

I hope you all are well. I’m a university research fellow and a happy user of DEA Coastlines for more than a year. I think I’ve discovered a bug in the 2021 Australian coastline data, specifically with the application of a bounding box to the data prior to download of the GeoDataFrame of line geometries:

myVariableName = get_coastlines(bbox=(xWest,ySouth,xEast,yNorth),crs="EPSG:4283")

According to the well-known GDA94 coordinate reference system (described for EPSG:4283 units of lat/long degrees here), the virtual borders to this dataset should be as below (the variable names are just indicative, they’re not actual keywords for the get_coastlines ‘bbox’ argument)

-xEast = 93.41 #degrees latitude
-xWest = 173.34 #degrees latitude
-ySouth = -60.55 #degrees longitude
-yNorth = -8.47 #degrees longitude

The apparent bug is that when yNorth is set to its EPSG:4283 value (-8.47), the northern extents of Australia become cropped out (see below). Only when yNorth is set to an unrealistically positive-tending value (zero degrees, i.e. equatorial) does the ‘get_coastlines()’ function eliminate its erroneous cropping (see below):

  • yNorth = 8.78° result (Northern extents are clipped off)
    2024-03-17_DEAC_Bug_Aus-N-bbox-847
  • yNorth = 0° result (northern extents are kept intact)
    2024-03-17_DEAC_Bug_Aus-N-bbox-ZERO

I was wondering if someone on the DEA Notebooks team could comment on this. To reproduce this error in Python, you can run the below Python code from within a DEA Sandbox notebook (not sure if I can attach JupyterLab notebooks here so I converted it into a Python script below - this can be run by pasting it into a single cell or splitting it at the "# SECTION" points).

# SECTION 1 - # Import & test modules
#IMPORT PYTHON STANDARD LIBRARY MODULES
import os,sys,pickle

#IMPORT COMMON EXTERNAL PYTHON PACKAGES
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

#IMPORT GIS PYTHON PACKAGES
import geopandas as gpd
sys.path.insert(1,'../Tools/')
import dea_tools
from dea_tools.coastal import get_coastlines

print("IMPORT: matplotlib version =", mpl.__version__)
print("IMPORT: pandas version =", pd.__version__)
print("IMPORT: geopandas version =", gpd.__version__)
print("IMPORT: dea_tools version =", dea_tools.__version__)


# SECTION 2 - Coastline data import
#Specify bounding box using CRS EPSG:4283 (degrees, GDA94)
#Source of bbox limits below was [https://epsg.io/4283] as at 2024/03/17
#Metadata states "Australia including Lord Howe Island, Macquarie Island, Ashmore and Cartier Islands, Christmas Island, Cocos (Keeling) Islands, Norfolk Island. All onshore and offshore."
#Most recent computational runtime for this (slowest in the notebook) cell was up to 14 minutes on 17th March 2024

xEast = 93.41
xWest = 173.34
ySouth = -60.55
yNorth = -8.47 #Clips off the northern extents of Australia

#Import DEA Coastlines dataset within an Australia-wide bounding box
deacl_annualshorelines_gdf = get_coastlines(bbox=(xEast,ySouth,xWest,yNorth),crs="EPSG:4283")

#Print dataset metadata (EPSG:3577 is GDA94 in metres)
print(f"PROGRESS: Dtype of 'year' column is {deacl_annualshorelines_gdf['year'].dtype}")
print(f"PROGRESS: CRS of dataset is {deacl_annualshorelines_gdf.crs}")


# SECTION 3 - Data visualisation
#Plot the matching GDF rows
deacl_shorelines_gdf_year_needed.to_crs(epsg=4283).plot();

I was advised to try the Web Feature Service interface to the DEA Coastlines dataset and fortunately found this is working just fine

#Specify bounding box to download coastline for Australia (lat/long degrees in GDA94 are via EPSG:4283)
#Last runtime for this cell was ~5 minutes
ymax,xmin = -8.47,93.41
ymin,xmax = -60.55,173.34

#Set up WFS requests for annual shorelines (WGS84 in degrees is EPSG:4326)
#Info is at "https://knowledge.dea.ga.gov.au/data/product/dea-coastlines/?tab=access"
deacl_annualshorelines_wfs = f'https://geoserver.dea.ga.gov.au/geoserver/wfs?' \
                       f'service=WFS&version=1.1.0&request=GetFeature' \
                       f'&typeName=dea:shorelines_annual' \
                       f'&bbox={ymin},{xmin},{ymax},{xmax},' \
                       f'urn:ogc:def:crs:EPSG:4326'

#Load DEA Coastlines data from WFS
deacl_annualshorelines_gdf = gpd.read_file(deacl_annualshorelines_wfs)

#Ensure CRS is set correctly
deacl_annualshorelines_gdf.crs = 'EPSG:3577'

#Isolate a year and plot it
deacl_annualshorelines_2021_gdf = deacl_annualshorelines_gdf[deacl_annualshorelines_gdf['year']==2021.0]
deacl_annualshorelines_2021_gdf.plot();

Thank you for your time and my apologies if I've omitted any information (this is actually my first open-source bug report!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant