
Add unit numbers to addresses #618

Open
2 tasks
dianashk opened this issue Jul 21, 2017 · 16 comments · Fixed by pelias/model#81, pelias/schema#258 or pelias/query#77 · May be fixed by pelias/openaddresses#320 or pelias/openstreetmap#368

Comments

@dianashk
Contributor

Hey team!

I was using your awesome geocoding engine when I noticed something interesting.
Let me tell you more about it.


background

In Portland, as well as other US cities, there are planned communities that have a single house number assigned to the whole development and unit numbers assigned to each house lot in the development. So querying for an address with just the shared house number should result in a list of addresses in that development with unique unit numbers.

current state

Currently, Pelias imports OpenAddresses data without its unit numbers, even though the source has the appropriate unit number for each address. As a result, our data has numerous records that appear to share exactly the same address but have slightly different locations.

At query time, the API gets back a long list of these addresses and then deduplicates them down to a single record, because nothing distinguishes one from another.
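As a rough illustration of why the records collapse, the snippet below treats "house number + street" as a dedupe key, using three of the 50 NE VILLAGE SQUIRE AVE rows quoted later in this thread. This is an analogy with standard Unix tools, not the API's actual dedupe logic; the choice of key columns is an assumption.

```shell
#!/bin/sh
# Three rows copied from the portland_metro.csv grep output in this thread.
# Columns 3, 4 and 5 are NUMBER, STREET and UNIT.
rows='-122.404822,45.4982275,50,NE VILLAGE SQUIRE AVE,17,,,,97030,,96e580ae3ca840af
-122.4047135,45.4982497,50,NE VILLAGE SQUIRE AVE,16,,,,97030,,ed938477c2cc0919
-122.404545,45.4982479,50,NE VILLAGE SQUIRE AVE,15,,,,97030,,f484e126402e1c60'

# Dedupe key without the unit: NUMBER + STREET.
# $(( )) strips the padding some wc implementations add.
without_unit=$(( $(printf '%s\n' "$rows" | cut -d, -f3,4 | sort -u | wc -l) ))

# The same key with UNIT included.
with_unit=$(( $(printf '%s\n' "$rows" | cut -d, -f3-5 | sort -u | wc -l) ))

echo "distinct records without unit: $without_unit"  # 1
echo "distinct records with unit:    $with_unit"     # 3
```

Once the unit is part of what makes a record unique, the three rows stay distinct instead of collapsing to one.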

/v1/search?text=50 NE VILLAGE SQUIRE, PORTLAND

/v1/search?text=21000 NW Quatama Rd, Portland

desired behavior

  • When querying for the common house number of a development, it would be ideal to get back a list of addresses with unique unit numbers that belong to that development.
  • When querying for the specific address with the unit number, the user should get back the exact location of that address and the result should also contain unit information.
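The two desired behaviors can be sketched as a filter over the source rows: match on house number and street, and narrow to a single unit only when one is given. The `lookup` function is hypothetical and only mimics the expected result set; it is not how Pelias queries Elasticsearch.

```shell
#!/bin/sh
# Sample rows from the OpenAddresses source (columns 3-5: NUMBER, STREET, UNIT).
rows='-122.404822,45.4982275,50,NE VILLAGE SQUIRE AVE,17,,,,97030,,96e580ae3ca840af
-122.4047135,45.4982497,50,NE VILLAGE SQUIRE AVE,16,,,,97030,,ed938477c2cc0919
-122.404545,45.4982479,50,NE VILLAGE SQUIRE AVE,15,,,,97030,,f484e126402e1c60'

# Hypothetical lookup: $1 = house number, $2 = street, $3 = optional unit.
# Prints LON,LAT,UNIT for every matching row.
lookup() {
  printf '%s\n' "$rows" | awk -F, -v num="$1" -v street="$2" -v unit="${3:-}" '
    $3 == num && $4 == street && (unit == "" || $5 == unit) { print $1 "," $2 "," $5 }'
}

lookup 50 "NE VILLAGE SQUIRE AVE"      # shared house number: one line per unit
lookup 50 "NE VILLAGE SQUIRE AVE" 16   # specific unit: exact location of unit 16
```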
@dianashk
Contributor Author

One idea was to use the interpolation service for all address queries instead of checking with ES first. However, there seems to be a problem with finding these units in the interpolation engine as well: the common house numbers are present in the data, but the individual units are missing. It would be good to investigate why that's happening.

@missinglink
Member

Can you please provide some examples of the unit numbers you are expecting to find, with links to the source data?

@dianashk
Contributor Author

The OA source can be found here
If you grep for ,50,NE VILLAGE you should be able to see 17 matching addresses with different unit numbers.

$ grep ",50,NE VILLAGE" ./us/or/portland_metro.csv
-122.404822,45.4982275,50,NE VILLAGE SQUIRE AVE,17,,,,97030,,96e580ae3ca840af
-122.4047135,45.4982497,50,NE VILLAGE SQUIRE AVE,16,,,,97030,,ed938477c2cc0919
-122.404545,45.4982479,50,NE VILLAGE SQUIRE AVE,15,,,,97030,,f484e126402e1c60
-122.4044473,45.498247,50,NE VILLAGE SQUIRE AVE,14,,,,97030,,13a4f0a76e89e233
-122.4042684,45.4982459,50,NE VILLAGE SQUIRE AVE,13,,,,97030,,6fb303c178eb2d4d
-122.4041579,45.4982448,50,NE VILLAGE SQUIRE AVE,12,,,,97030,,5e779bf6aa4968c2
-122.4040506,45.4982453,50,NE VILLAGE SQUIRE AVE,11,,,,97030,,bbee091d599933db
-122.4038379,45.4982622,50,NE VILLAGE SQUIRE AVE,10,,,,97030,,488a5c92cda4a1de
-122.4037477,45.4982356,50,NE VILLAGE SQUIRE AVE,9,,,,97030,,59f45a36968b37e1
-122.4037464,45.4979097,50,NE VILLAGE SQUIRE AVE,8,,,,97030,,45c0b9388786681c
-122.4038362,45.4978826,50,NE VILLAGE SQUIRE AVE,7,,,,97030,,0444a51b841983da
-122.4040366,45.4979048,50,NE VILLAGE SQUIRE AVE,6,,,,97030,,f441a312426edac1
-122.4041496,45.4979059,50,NE VILLAGE SQUIRE AVE,5,,,,97030,,8dd7da5d8d425743
-122.4043476,45.4979073,50,NE VILLAGE SQUIRE AVE,4,,,,97030,,ea0fc8be8f797872
-122.4044461,45.4979082,50,NE VILLAGE SQUIRE AVE,3,,,,97030,,5a64fb3515a5a4d2
-122.4046453,45.4978743,50,NE VILLAGE SQUIRE AVE,2,,,,97030,,adbe53f124a09122
-122.404757,45.4978864,50,NE VILLAGE SQUIRE AVE,1,,,,97030,,b9d1dabd3d0e577b

Similarly, the same dataset contains 213 addresses at 21000 NW QUATAMA RD:

$ grep ",21000,NW QUATAMA" ./us/or/portland_metro.csv
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 91,,,,97006,,ded6db4d505ffbe6
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 100,,,,97006,,53b0af8ee4d2bbba
-122.8967017,45.5213388,21000,NW QUATAMA RD,,,,,97006,,3414cdafc2f4b992
-122.8978935,45.5208051,21000,NW QUATAMA RD,SPC 188,,,,97006,,6436a9d6677f5999
-122.8980784,45.5207527,21000,NW QUATAMA RD,SPC 189,,,,97006,,9eddaa43b4596a0b
-122.8982544,45.5207805,21000,NW QUATAMA RD,SPC 190,,,,97006,,a53cebd6d1013548
-122.8984162,45.5208571,21000,NW QUATAMA RD,SPC 191,,,,97006,,a43f79e326eb23f7
-122.8986217,45.5210072,21000,NW QUATAMA RD,SPC 192,,,,97006,,376a2cb7a0bc19b9
...
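The QUATAMA output mixes one row with an empty UNIT column (the shared house-number record) with the per-space rows. A quick way to separate them is sketched below with awk on three of the rows above. Note that SPC 91 and SPC 100 even share identical coordinates, so only the unit tells them apart.

```shell
#!/bin/sh
# Three rows from the 21000 NW QUATAMA RD grep output (column 5 is UNIT).
rows='-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 91,,,,97006,,ded6db4d505ffbe6
-122.8960946,45.5218934,21000,NW QUATAMA RD,SPC 100,,,,97006,,53b0af8ee4d2bbba
-122.8967017,45.5213388,21000,NW QUATAMA RD,,,,,97006,,3414cdafc2f4b992'

# Rows without a unit: the development's shared house-number record.
no_unit=$(( $(printf '%s\n' "$rows" | awk -F, '$5 == ""' | wc -l) ))

# Rows with a unit: individual spaces within the development.
with_unit=$(( $(printf '%s\n' "$rows" | awk -F, '$5 != ""' | wc -l) ))

# Distinct coordinates among the unit rows (SPC 91 and SPC 100 share one point).
distinct_points=$(( $(printf '%s\n' "$rows" | awk -F, '$5 != "" { print $1 "," $2 }' | sort -u | wc -l) ))

echo "without unit: $no_unit, with unit: $with_unit, distinct points: $distinct_points"
```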

@sweco-semhul

I'm seeing the same issue for Denmark, where the official unique addresses include not only a unit but also a level. See openaddresses/openaddresses#3511 for more details.

Searching Pelias for Nikolaj Plads 26 in Copenhagen, of course, returns only one address: http://pelias.github.io/compare/#/v1/search%3Ftext=Nikolaj%20Plads,%2026,%20Copenhagen,%20

To resolve this, both level and unit need to be added to Pelias as well.

@sweco-semhul

sweco-semhul commented Oct 18, 2017

Since I'm interested in using Pelias with address data for Denmark, I would like to help with this if possible, and I have started to investigate where changes are needed to resolve it. Hopefully this can be helpful for development.
Since this is set for Q4 2017, is there a plan for this development, and how can I help?

EDITED: a checked item means the change is working and running with tests in a forked branch.

in pelias/model

in pelias/schema

in pelias/openaddresses importer:

in pelias/openstreetmap importer:

in pelias/text-analyzer

in pelias/labels

in pelias/query

in pelias/api:

query

helper

middleware

sanitizer

service, configurations

@orangejulius
Member

Hey @sweco-semhul,
This is a great list. I think you've identified most of the areas that would need to be updated.

One more important one is the Elasticsearch schema which tells Elasticsearch what type of data to expect and how to store it. A unit number field should be able to fit in with the rest of our address components.

Right now we have custom Elasticsearch analyzers for each of the other fields (housenumber, street, postalcode), and we'll likely need another one for unit number, even if it doesn't end up being very complex. The custom analyzers for the other address components are in the same repo.
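For a sense of how simple a unit analyzer might be, the function below mimics a lowercase-and-whitespace-normalizing step, so that "SPC 91" and " spc  91 " compare equal. It is only an analogy: a real implementation would be an Elasticsearch analyzer defined in pelias/schema, and `normalize_unit` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical normalizer standing in for a simple unit analyzer:
# lowercase, squeeze repeated spaces, then trim the leading/trailing space.
normalize_unit() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' | sed 's/^ //; s/ $//'
}

normalize_unit 'SPC 91'       # spc 91
normalize_unit '  spc   91 '  # spc 91
```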

We'd be super happy to assist you in any way to get started on this. It will be touching lots of different areas of the code, and so will probably take some trial and error to get right, but we'll be here. Let us know if you have questions or get stuck on anything.

@sweco-semhul

sweet.

Thanks for pointing that out; I will add it to the list. I have probably missed some other things, and as you mentioned, this will need some trial and error and testing.
Will start working on some pull requests and see how far I'll get.

@sweco-semhul

@orangejulius
I have gone through each part of the Pelias project and updated my comment above. I made changes to add a unit attribute to each part, wrote tests for each part, and tested loading OpenAddresses and OpenStreetMap data.

A lot of parts have changed, but from what I can see it all runs together, and I'm able to load and search for Danish addresses with unit attributes.

I have created a pull request for each change and referenced this issue. I'm not certain how npm dependencies should be handled; right now they reference my forks, which include the changes needed for everything to run, e.g. https://github.com/pelias/api/pull/1052/files#diff-b9cfc7f2cdf78a7f4b91a753d10865a2
Should it be done another way?

Hopefully some of you are able to test this, and it will at least help on the way to getting the unit attribute into Pelias.

Let me know if there's anything I can do to help move this further.

@orangejulius
Member

Hey @sweco-semhul,
I'm going through and reviewing all your pull requests now (I was busy writing a talk for the last few days :) ), and overall it looks great so far. You've made changes all over our code which is awesome and quite difficult.

So far, I was successfully able to use your schema, OA importer, and API changes to import Portland, OR, USA addresses and query for them.

However, the queries have to take the form HOUSENUMBER UNIT STREET, [CITY, COUNTRY], because of this line where the housenumber, then unit, and then street name are put in the default name field in that order.

From your initial issue in the openaddresses repo, I gather that's not the correct format in either Denmark or the United States. I think we can change that line to HOUSENUMBER STREET UNIT and things will work correctly for the US and Denmark for now.
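The ordering change can be illustrated with a small helper that assembles a name.default string from the address parts in either order. `default_name` and its `unit_first` flag are hypothetical and not part of pelias/model; the helper only shows the two orderings side by side.

```shell
#!/bin/sh
# Hypothetical helper: $1 = housenumber, $2 = street, $3 = unit (may be empty),
# $4 = "unit_first" for HOUSENUMBER UNIT STREET, anything else for
# HOUSENUMBER STREET UNIT.
default_name() {
  if [ -z "$3" ]; then
    printf '%s %s\n' "$1" "$2"              # no unit: HOUSENUMBER STREET
  elif [ "$4" = unit_first ]; then
    printf '%s %s %s\n' "$1" "$3" "$2"      # HOUSENUMBER UNIT STREET
  else
    printf '%s %s %s\n' "$1" "$2" "$3"      # HOUSENUMBER STREET UNIT
  fi
}

default_name 50 "NE VILLAGE SQUIRE AVE" 17 unit_first  # 50 17 NE VILLAGE SQUIRE AVE
default_name 50 "NE VILLAGE SQUIRE AVE" 17             # 50 NE VILLAGE SQUIRE AVE 17
```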

We already have code that queries the individual components of an address to deal with differing address formats for the housenumber and street parts of an address (which does differ between Finland and the United States), so perhaps we'll have to extend that code to deal with unit numbers (@trescube what do you think?). But right now I think it's ok as long as we change the order in name.default.

Oh, and just as proof that we are getting close, here's one of the queries @dianashk linked to above:
[image: screenshot of the query results]

@orangejulius
Member

For how to deal with the modules, once we are ready, we can begin merging the pull requests starting from "the bottom" of the tree of dependencies. We have Greenkeeper and semantic-release which together will make those changes come through easily. Then we can either rebase or merge your other pull requests and it will all work out.
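Merging "from the bottom" of the dependency tree amounts to a topological sort. The sketch below uses the POSIX `tsort` utility; the dependency edges are assumptions made for illustration (each pair reads "left must be merged before right"), not the actual package.json contents of these repos.

```shell
#!/bin/sh
# Assumed edges: "A B" means A must be merged and published before B.
# "schema schema" only registers schema as a node with no ordering constraint.
order=$(printf '%s\n' \
  'model openaddresses' \
  'model openstreetmap' \
  'query api' \
  'labels api' \
  'text-analyzer api' \
  'schema schema' | tsort)

# One repo per line, in an order that respects every edge above.
printf '%s\n' "$order"
```

When several repos have no mutual constraint, `tsort` may list them in any order, so only the relative order along each edge is guaranteed.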

@sweco-semhul

@orangejulius sorry for the late response. Been busy doing other things.

That seems like a reasonable change. The hardest part for me when making these changes was getting the unit into the query parts, so maybe I have misunderstood something.
That said, when querying the API, a more logical search string for me would be "street housenumber unit", but I guess in the end that will be up to libpostal and addressit.

Another thing would be getting labels to look like "Nikolaj Plads 23 02, København K, Denmark" (street housenumber unit ...), but I guess that would be up to this method to fix: https://github.com/pelias/api/pull/1052/files#diff-35694d3f8946406d873df923bf730703R47

Am I understanding that correctly?

@orangejulius
Copy link
Member

Yes, I think we're in agreement and you're understanding perfectly.

Based on the link to address formats that you mentioned a while back, and your comments now, we want to be able to display and search for addresses in the American format (housenumber, street, unit) or the European format (street housenumber unit).

So I think the "standard" string (which corresponds to the American format) in that function you linked needs to be updated; otherwise I believe it's OK. We generally store things in American format, but our query logic is smart enough to handle either format. I'll put more concrete recommendations in the PRs themselves to help keep things clear :)
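The two display orders discussed here can be sketched as a hypothetical `format_label` function. The `us`/`eu` style switch is an assumption for illustration, not the pelias/labels API; it only shows where the unit lands in each format.

```shell
#!/bin/sh
# Hypothetical label builder.
# $1 = style (us | eu), $2 = housenumber, $3 = street, $4 = unit (may be empty),
# $5 = remaining label parts (locality, country, ...).
format_label() {
  case "$1" in
    eu) street_part="$3 $2${4:+ $4}" ;;   # street housenumber unit
    *)  street_part="$2 $3${4:+ $4}" ;;   # housenumber street unit
  esac
  printf '%s, %s\n' "$street_part" "$5"
}

format_label eu 23 "Nikolaj Plads" 02 "København K, Denmark"
# Nikolaj Plads 23 02, København K, Denmark
format_label us 21000 "NW Quatama Rd" "SPC 91" "OR, USA"
# 21000 NW Quatama Rd SPC 91, OR, USA
```

The `${4:+ $4}` expansion appends the unit (with a leading space) only when one is present, so addresses without a unit get no trailing space.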

@sweco-semhul

Cool, thanks for that.
We're in agreement, and I will change it to the "standard" format and get back to you.

@sweco-semhul

Sorry it took a while, but the changes to get street and unit in the right order should be there now, and from what I can see it all still holds together. :)

@orangejulius
Member

No worries about the delay. I think all these look good and we can start merging. I'll handle merging the dependencies in the right order so that greenkeeper can help us out. :)

@orangejulius reopened this Nov 20, 2017
@sweco-semhul

Thanks, happy to be able to give something back to a great project!
