Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding unit attribute to document stream to include addresses with th… #320

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

sweco-semhul
Copy link

…is attribute, see e.g. openaddresses/openaddresses#3511

To resolve pelias/pelias#618 more details found in comments to that issue

Diana Shkolnikov and others added 6 commits November 3, 2017 09:40
…/openaddresses into adding-address-unit

* 'adding-address-unit' of https://github.com/sweco-semhul/openaddresses:
  fix(package): update pelias-wof-admin-lookup to version 4.3.2
  disable package-lock in .npmrc
  fix(package): update pelias-wof-admin-lookup to version 4.3.1
  Upgrade to pelias-wof-admin-lookup 4.3.0
  Revert "chore(package): update dependencies"
  chore(package): update dependencies
  fix(package): update pelias-wof-admin-lookup to version 4.2.0
  fix(package): update pelias-dbclient to version 2.3.0
@sweco-semhul
Copy link
Author

@orangejulius updated with changed order for street and unit attribute

@missinglink
Copy link
Member

missinglink commented Oct 4, 2018

It's been a long time and I'd love to see this merged.

The reason it's not an easy merge is that it makes a change to the name field which effects name.default and phrase.default fields.

This has a (potentially) big impact on the system because:

  • We are not sure of the impact on autocomplete
  • Many acceptance tests will fail (possibly due to improvements, which is good)

The potential issue with autocomplete is due to how the prefixngrams are calculated for the name.* field.

I suspect that if a user asks for 1E Foo Street and we don't have it, then the system will return unit 1E results from different streets, for example 1E Bar Ave, 1E Baz Road, etc...

This is only a suspicion and it'd need to be tested, I just wanted to write down my concerns with merging this and what needs to be tested before we do.

This behaviour would also be compounded by the similarity scoring which would favour tokens with a lower Inverse-Document-Frequency.

I drafted another PR yesterday #402 which added additional address fields not currently present in the model, that PR doesn't change name.default, so we might consider merging it ahead of this since it won't cause any breaking changes.

@rowanwins
Copy link

Following up on a super old PR that I'd also love to see merged

The last comment that was holding this PR up was around this

The potential issue with autocomplete is due to how the prefixngrams are calculated for the name.* field.
I suspect that if a user asks for 1E Foo Street and we don't have it, then the system will return unit 1E results from different streets, for example 1E Bar Ave, 1E Baz Road, etc...
This is only a suspicion and it'd need to be tested, I just wanted to write down my concerns with merging this and what needs to be tested before we do.

I've got something very similar setup locally and I've tested out this scenario.

  • The /autocomplete endpoint is returning an empty result if it doesn't find the correct unit. For example if I search for 12/140 Foo Street and 12 doesn't exist then the results are empty,
    • It doesn't return a result for 12/77 Foo Street or 12/140 Baz Road, so that's a positive.
    • Although perhaps with an unmatched unit something ought to be returned...
  • The /search endpoint is returning an array of the other units at the address. For example if I search for 12/140 Foo Street and 12/140 doesn't exist then 10/140 Foo Street is returned as a candidate address.
    • It doesn't return a result for 12/77 Foo Street or 12/140 Baz Road, so that's a positive.

Note that the data I'm testing with has unit numbers rather than unit letters, I'm not sure if this would make a material difference.

Let me know if I can do anything else on this.

@missinglink
Copy link
Member

missinglink commented Jan 19, 2021

@rowanwins this is similar to #402 in some ways and different in others.

A few observations about this PR:

  • The package.json change links to a github fork, I had a look at it but wasn't able to figure out why 🤷
  • This PR changed the name field, this has a big potential to cause issues because numeric unit numbers such as Unit 52, 110 Main Street would likely match 52 Main Street too, which is undesirable.

I suspect that second point was enough to scare us off merging it and would need to be thoroughly tested before we could consider hitting merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add unit numbers to addresses
4 participants