Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support latitude/longitude matching #83

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

reidab
Copy link
Contributor

@reidab reidab commented Mar 11, 2018

I saw that #30 tried to add this, but relied too heavily on magic field names. This is a more in-depth stab at bringing nicer coordinate matching to csvdedupe. (This also includes the commit from #82, since nothing involving config files will run without it)

To achieve this, it's necessary to both transform the data we read from the CSV and to make an internal copy of the field definitions that we can bend to match what dedupe expects.

  • For one field containing both latitude and longitude, the JSON config now supports both LatLong or LongLat types. They expect a field with the coordinates separated by some (any) non-numeric characters, so formats like -122.23,46.42, -122.23, 46.42, and -122.23 46.42 will all work.

    Behind the scenes, we split the field to the tuple of floats that dedupe expects. For the LongLat type, we reverse it to LatLong order and swap the internal type to LatLong.

  • For latitude and longitude in separate fields, we add Latitude and Longitude convenience types. Internally, these are coalesced into a single LatLong field called __LatLong.

    One side effect of this change is that the training UI will now show this combined field instead of the original field names, but I think that's a fairly minor tradeoff.

Copy link

@batesmotel34 batesmotel34 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally left this. Please ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants