
LocationColumn limitations; Geocoding via DataSync #131

Open
johnrager opened this issue Jun 16, 2016 · 2 comments

Comments

@johnrager

We're looking into enhancing our automated refresh process to use DataSync’s SDK instead of the SODA 2 library we currently rely on. We’ve run into an issue, related to how the DataSync SDK handles geocoding of address fields, that has pretty much stopped us in our tracks. The support issue thread is: https://support.socrata.com/hc/en-us/requests/14390.

From what we understand we have two options:

  1. Use the DataSync SDK synthetic location object to build a location using street address, city, state, zip. This won’t work for many of our refreshes because they don’t necessarily fit into the strict four-field format. For example, we have datasets where street number and street name are split into two fields, or the state column is not present in the dataset and is assumed to be “NY”.
  2. Programmatically build and append a location column to the refresh CSV before submitting it to DataSync. This is how our automated process currently does it: every dataset that requires geocoding has a format string assigned to it, and our process combines that format string with each row's data to produce either a lat/long pair or an address, which is then appended to the CSV.
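A minimal sketch of option 2, assuming the per-dataset format string uses Python `str.format` placeholders named after the CSV headers. The format string, the column name `location`, and the sample data are hypothetical; the issue does not show the actual format strings or the exact value format our process writes. The sketch does show how a split street number/street name and a hard-coded "NY" state can be combined, which the four-field synthetic location in option 1 cannot express.

```python
import csv
import io

# Hypothetical per-dataset format string (not the real one used by our process).
FORMAT = "{street_number} {street_name}, {borough}, NY {zip}"


def append_location_column(csv_text: str, fmt: str, column: str = "location") -> str:
    """Return csv_text with an extra column built by filling fmt from each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Substitute this row's field values into the dataset's format string.
        row[column] = fmt.format(**row)
        writer.writerow(row)
    return out.getvalue()
```

The same function covers the lat/long case with a format string such as `"({latitude}, {longitude})"`. Values containing commas are quoted automatically by the `csv` module.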

Because of the limitation we ran into with option 1, we’ve been pursuing option 2 but have run into a problem. It appears DataSync is much stricter with its geocoding and we’ve run into addresses that have actually caused the entire refresh process to fail. If we run the same data through either the web interface or through our existing SODA 2 refresh process, the entire refresh runs but some rows just don’t get geocoded. This is expected. If we run the file through DataSync, it fails completely as soon as it hits the first bad address.

We tried testing via DataSync with “Set aside errors” turned on, and the process completed, but the problem rows were excluded from the dataset. This isn’t workable from our perspective. We can’t have rows missing just because an address didn’t geocode, and with the number of datasets we have, we can’t distribute problem reports to data owners asking them to correct addresses and resubmit. We need DataSync to handle geocoding just as the web interface and SODA 2 do.

We’d really like to make DataSync more a part of our operation, but we don’t think we can unless we have a more workable way to handle geocoding. We’re pretty much dead in the water on this right now.

@johnrager
Author

Would like to add a thought that might give us and other customers just the flexibility we need: add another switch, "Ignore geocoding failures", to the GUI and SDK that governs whether the inability to geocode an address should be considered an "error". If it is set "on", just set the Location column for that row to null and continue. If it is set "off", treat the failure as an error and let "Set aside errors" govern what happens next.
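A minimal Python sketch of the decision logic proposed above, processing one row at a time. `ignore_geocoding_failures` is the proposed switch and does not exist in DataSync. `set_aside_errors` only imitates the behavior observed with "Set aside errors" (failing rows are excluded from the load), not DataSync's actual implementation. `geocode` is a hypothetical stand-in that returns a point, or `None` when the address cannot be geocoded.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


class RefreshFailed(Exception):
    """Raised when a geocoding failure aborts the whole refresh."""


@dataclass
class Options:
    ignore_geocoding_failures: bool  # proposed switch (hypothetical)
    set_aside_errors: bool           # models the existing "Set aside errors" option


def process_rows(rows, geocode: Callable[[str], Optional[Point]], opts: Options):
    """Return (loaded_rows, set_aside_rows) following the proposed rules."""
    loaded, set_aside = [], []
    for row in rows:
        point = geocode(row["location"])
        if point is None:
            if opts.ignore_geocoding_failures:
                # Switch "on": keep the row and leave its Location null.
                loaded.append({**row, "location": None})
                continue
            if opts.set_aside_errors:
                # Switch "off": treat as an error; "Set aside errors" excludes the row.
                set_aside.append(row)
                continue
            # Switch "off" and no set-aside: the first bad address fails the refresh.
            raise RefreshFailed(f"could not geocode {row['location']!r}")
        loaded.append({**row, "location": point})
    return loaded, set_aside
```

With the switch on, every row is loaded, matching what the web interface and SODA 2 do; with it off, the existing "Set aside errors" behavior is unchanged, which keeps current workflows backwards compatible.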

@levyj
Contributor

levyj commented Jul 29, 2016

We do not use Socrata geocoding much so I do not necessarily have too much of a stake in this but I like that suggestion.

Where I do have a stake is to ask that Socrata be careful about any new features breaking existing processes or workflows. Sometimes, when flags have changed before, it has been in ways that were not fully backwards compatible.
