
Ongoing discussion: pros/cons of original vs. TADA lat/lon fields #439

cristinamullin opened this issue Apr 18, 2024 · 2 comments
cristinamullin commented Apr 18, 2024

  • Cristina: I noticed the new functions use the WQP lat/lon columns instead of the TADA versions of those lat/lon columns, which are the numeric versions. TADA creates these numeric versions of the lat/lon columns for use in TADA_FlagCoordinates(), which looks at how many digits are to the right of the decimal and flags a lat/lon as imprecise if there are fewer than three. That function also flags results outside of the U.S. Let's discuss when to use the original vs. TADA lat/lon columns - are both necessary? Keep in mind that the duplicates (originals) may be removed if there is a TADA equivalent and the user wants to reduce the number of columns at the end of Module 1.

  • Katie: Yes, this is something we discussed in the ROSS team, as well! My old justification for using lat/lon (instead of TADA-ified lat/lon) was so that users could run the functions on datasets that weren't created using autoclean = TRUE. But, I am inclined to change this now that I see the value of always having TADA.Longitude/TADA.Latitude. For example, I can immediately use those flagging functions you've developed to ensure the geospatial computations are occurring across valid coordinates. (SIDE NOTE: I think this may actually be at least part of the issue that @hillarymarler encountered when running fetchATTAINS across Indiana... there's a wonky observation off the coast of Africa, so the function is trying to pull ATTAINS data across the whole eastern US.)

  • Cristina: We could add some code to the geospatial functions that looks for the TADA versions of lat/lon and, if they don't exist, creates them there. That way it would still be seamless for users who don't run autoclean. As of now, we do require the TADA versions in TADA_OverviewMap() and TADA_FlagCoordinates(). It might be interesting to include those in the Module 2 vignette example workflow. If running TADA_FlagCoordinates() before fetchATTAINS across Indiana fixes the issue @hillarymarler encountered (good catch!), that might be a good example to show. It is common for WQP data to have issues like that (an incorrect sign on the lat or lon puts the point outside of the area).

  • Including Hillary's comment below with example code that didn't work for Indiana:
    For larger data sets across a larger area, sometimes one or more of the expected ATTAINS objects is returned with a NULL value, so they don't show up with TADA_ViewATTAINS().

There should be both lines and polygons, but only polygons are successfully included in the output.

Sometimes when I run the above example (or other similar ones) in fetchATTAINS, I will get an error message with a URL. At the URL, the final lines visible indicate: "exceededTransferLimit": true.

This is an example of a data set where I saw those issues:

test1 <- TADA_DataRetrieval(startDate = "2018-05-01",
                            endDate = "2018-09-30",
                            statecode = "IL",
                            applyautoclean = TRUE)
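
To illustrate Cristina's fallback idea above, here is a rough base-R sketch (not part of EPATADA; the helper name and the WQP column names `LatitudeMeasure`/`LongitudeMeasure` are assumptions) of creating the TADA lat/lon columns when autoclean wasn't run, plus a decimal-precision count like the one TADA_FlagCoordinates() is described as doing:

```r
# Sketch only: create TADA.Latitude/TADA.Longitude from the original WQP
# columns if they are missing. Column names are assumed; actual TADA
# internals may differ.
ensure_tada_coords <- function(.data) {
  if (!"TADA.Latitude" %in% names(.data) && "LatitudeMeasure" %in% names(.data)) {
    .data$TADA.Latitude <- as.numeric(.data$LatitudeMeasure)
  }
  if (!"TADA.Longitude" %in% names(.data) && "LongitudeMeasure" %in% names(.data)) {
    .data$TADA.Longitude <- as.numeric(.data$LongitudeMeasure)
  }
  .data
}

# Count digits to the right of the decimal; fewer than 3 would be flagged
# as imprecise per the description of TADA_FlagCoordinates() above.
n_decimals <- function(x) {
  chr <- vapply(x, function(v) format(v, scientific = FALSE, trim = TRUE),
                character(1))
  ifelse(grepl("\\.", chr), nchar(sub("^[^.]*\\.", "", chr)), 0L)
}

n_decimals(c(35.00305, -81.72, 40))  # 5, 2, and 0 decimal places
```

If something like this ran at the top of each geospatial function, datasets built without applyautoclean = TRUE would still work unchanged.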
@hillarymarler

I've been spending some time this morning looking at the IL example Cristina included above as well as some other examples. I am still seeing similar issues on some data sets.

Here is another example which does the same thing when I run TADA_GetATTAINS().

test <- TADA_DataRetrieval(startDate = "2018-05-01", endDate = "2018-09-30", statecode = "TN", applyautoclean = TRUE)

[1] "Your TADA data covers a large spatial range. The ATTAINS pull may take a while."
Warning message:
In readLines(con) :
incomplete final line found on 'https://gispub.epa.gov/arcgis/rest/services/OW/ATTAINS_Assessment/MapServer/3/query?&geometry=-90.1783%2c%2035.00305%2c%20-81.7235%2c%2036.623877&inSR=4326&resultRecordCount=2000&resultOffset=8000&spatialRel=esriSpatialRelIntersects&f=geojson&outFields=*&geometryType=esriGeometryEnvelope&returnGeometry=true&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=false&returnExtentOnly=false&featureEncoding=esriDefault'

If you follow the link and scroll to the bottom of the page, the final line indicates transfer limit was exceeded. I am still not sure why I am the only one experiencing this issue. I'm currently updating a bunch of packages (though none seem directly relevant to the TADA/ATTAINS functions).

Any suggestions of other things I should try? Or additional information I can provide that would be helpful?

@kathryn-willi

Thank you for the clarification, this was helpful! It is my understanding that exceededTransferLimit: true means that there are more features in a query than can be pulled at once. To get around this, we use repeat to pull across a moving window of 2,000 features (which I believe is the limit for the ATTAINS API). Therefore, I don't think this is a bug in the code... unless I'm misinterpreting something?

Here is the definition of "exceededTransferLimit" from the developer site:

The exceededTransferLimit property is now included in the JSON response when paging through a query result with the resultOffset and resultRecordCount parameters. When exceededTransferLimit is true, it indicates there are more query results and you can continue to page through the results. When exceededTransferLimit is false, it indicates that you have reached the end of the query results.
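
For what it's worth, the paging pattern described above can be sketched roughly like this (query parameters taken from the URL in Hillary's warning message; this is an illustration, not the actual fetchATTAINS() implementation):

```r
library(httr)
library(jsonlite)

# Page through an ArcGIS REST query 2,000 records at a time, using
# exceededTransferLimit to decide whether more pages remain.
query_url <- "https://gispub.epa.gov/arcgis/rest/services/OW/ATTAINS_Assessment/MapServer/3/query"
offset <- 0
pages <- list()

repeat {
  resp <- httr::GET(query_url, query = list(
    geometry          = "-90.1783, 35.00305, -81.7235, 36.623877",
    geometryType      = "esriGeometryEnvelope",
    inSR              = 4326,
    spatialRel        = "esriSpatialRelIntersects",
    outFields         = "*",
    returnGeometry    = "true",
    f                 = "json",
    resultRecordCount = 2000,
    resultOffset      = offset
  ))
  page <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  pages[[length(pages) + 1]] <- page$features

  # Per the developer docs quoted above: TRUE means more results remain;
  # FALSE (or absent) means we've reached the end of the query results.
  if (!isTRUE(page$exceededTransferLimit)) break
  offset <- offset + 2000
}
```

So seeing "exceededTransferLimit": true at the URL just means that particular page wasn't the last one, which is expected mid-loop.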

In the original PR, you also mentioned that sometimes a TADA_GetATTAINS() pull was not exporting all the ATTAINS data; instead it was only generating polygons, when it should be generating both lines and polygons. Have you been able to reproduce this issue again? We have yet to encounter this issue on our end, so it's still got us stumped!

This is the example you provided:
test1 <- TADA_DataRetrieval(startDate = "2018-05-01", endDate = "2018-09-30", statecode = "IL", applyautoclean = TRUE)
