Append unclassified tokens to the street #28

Open: wants to merge 2 commits into base: master

Joxit (Member) commented May 25, 2019

I created a solver that can fill in the blanks (only for StreetPrefixClassification).
Some street names are very long, and it is not easy to match all of them safely.
I thought the best way to handle this was to append unclassified tokens to the street (when the token comes right after the end of the street).
Maybe it can also be used for venues.
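A minimal sketch of that first approach, under assumed, simplified types: the input is an array of tokens flagged as classified or not, and the street is a half-open token range. `Token`, `StreetMatch`, `appendUnclassified` and `streetText` are hypothetical names for illustration, not the pelias/parser API. This version extends the street over every unclassified token that follows it, including the last token of the input.

```typescript
// Hypothetical, simplified model for illustration; not the pelias/parser API.
interface Token {
  text: string;
  classified: boolean; // true if some classifier already labelled this token
}

interface StreetMatch {
  start: number; // index of the first street token (inclusive)
  end: number; // index just past the last street token (exclusive)
}

// Extend the street over every unclassified token that directly follows it.
function appendUnclassified(tokens: Token[], street: StreetMatch): StreetMatch {
  let end = street.end;
  while (end < tokens.length && !tokens[end].classified) {
    end++;
  }
  return { start: street.start, end };
}

// Join the tokens covered by the street back into a string.
function streetText(tokens: Token[], street: StreetMatch): string {
  return tokens
    .slice(street.start, street.end)
    .map((t) => t.text)
    .join(" ");
}

// Usage: "Mai" is unclassified and sits right after the street "Rue du 8".
const tokens: Token[] = [
  { text: "Rue", classified: true },
  { text: "du", classified: true },
  { text: "8", classified: true },
  { text: "Mai", classified: false },
  { text: "Ermont", classified: true },
];
console.log(streetText(tokens, appendUnclassified(tokens, { start: 0, end: 3 })));
// prints "Rue du 8 Mai"
```

Because this version does not care whether the appended token is the last one, an input such as "Rue Saint-Germains Ermon", where the misspelled "Ermon" stays unclassified, would also have "Ermon" absorbed into the street.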

Paris is always used as a locality, so I removed it from regions.
I also added cité to street_types.

Joxit changed the title from "Add some cases for street_name" to "Append unclassified tokens to the street" on May 25, 2019
missinglink (Member) commented

One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input...

missinglink (Member) commented

I'm not sure about rewriting the span body, is this really required?

The combination of these tokens should already be present in the 'phrases' for that section.

It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.
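A rough sketch of that suggestion, assuming phrases are every contiguous run of tokens in the section, stored as half-open token ranges. Rather than rewriting the street span's body, it looks up the existing phrase that starts where the street starts and also covers the unclassified tokens that follow, so that phrase can be classified directly. `Phrase`, `buildPhrases` and `findStreetPhrase` are hypothetical names, not the pelias/parser API.

```typescript
// Hypothetical, simplified model for illustration; not the pelias/parser API.
interface Phrase {
  start: number; // index of the first token (inclusive)
  end: number; // index just past the last token (exclusive)
  body: string;
}

// Build every contiguous combination of tokens, i.e. the section's phrases.
function buildPhrases(tokens: string[]): Phrase[] {
  const phrases: Phrase[] = [];
  for (let start = 0; start < tokens.length; start++) {
    for (let end = start + 1; end <= tokens.length; end++) {
      phrases.push({ start, end, body: tokens.slice(start, end).join(" ") });
    }
  }
  return phrases;
}

// Find the existing phrase covering the street plus the unclassified tokens
// right after it; the street span itself is never modified.
function findStreetPhrase(
  phrases: Phrase[],
  tokenCount: number,
  street: { start: number; end: number },
  isClassified: (index: number) => boolean,
): Phrase | undefined {
  let end = street.end;
  while (end < tokenCount && !isClassified(end)) {
    end++;
  }
  if (end === street.end) {
    return undefined; // nothing to append
  }
  return phrases.find((p) => p.start === street.start && p.end === end);
}

const words = ["Rue", "du", "8", "Mai", "Ermont"];
const classified = new Set([0, 1, 2, 4]); // "Mai" is the only unclassified token
const phrase = findStreetPhrase(
  buildPhrases(words),
  words.length,
  { start: 0, end: 3 },
  (i) => classified.has(i),
);
console.log(phrase?.body); // prints "Rue du 8 Mai"
```

The lookup reuses a phrase that already exists, so the original street span stays untouched and can still be used by other solutions.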

Joxit (Member, Author) commented Jun 5, 2019

> One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input...

Hmm, you're totally right: the last token shouldn't be appended.
Rue Saint-Germains Ermon (the real locality is Ermont) should not return Boulevard Saint-Germains Ermon as a street. It's safer when something already follows the token, as in Rue du 8 Mai Ermont (Mai isn't in the solution).
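A minimal sketch of that rule, using hypothetical simplified types (not the pelias/parser API): unclassified tokens are appended only while at least one more token follows them, so the final token, which may still be half-typed in autocomplete input, is never absorbed into the street.

```typescript
// Hypothetical, simplified model for illustration; not the pelias/parser API.
interface Token {
  text: string;
  classified: boolean;
}

// Extend the street [start, end) over following unclassified tokens, but
// never over the last token of the input, which may be incomplete.
function appendUnclassifiedSafely(
  tokens: Token[],
  start: number,
  end: number,
): [number, number] {
  const lastIndex = tokens.length - 1;
  let newEnd = end;
  while (newEnd < lastIndex && !tokens[newEnd].classified) {
    newEnd++;
  }
  return [start, newEnd];
}

const tok = (text: string, classified: boolean): Token => ({ text, classified });

// "Ermon" is the last token, so it is left out of the street: returns [0, 2].
const partial = [tok("Rue", true), tok("Saint-Germains", true), tok("Ermon", false)];
console.log(appendUnclassifiedSafely(partial, 0, 2));

// "Mai" is followed by "Ermont", so it is appended: returns [0, 4].
const full = [tok("Rue", true), tok("du", true), tok("8", true), tok("Mai", false), tok("Ermont", true)];
console.log(appendUnclassifiedSafely(full, 0, 3));
```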

> I'm not sure about rewriting the span body, is this really required?
>
> The combination of these tokens should already be present in the 'phrases' for that section.
>
> It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.

I wanted your opinion on this PR. There is also something that bothers me in what I did...
I will try what you said. 😄

Joxit changed the title from "Append unclassified tokens to the street" to "[DO NOT MERGE] Append unclassified tokens to the street" on Jun 11, 2019
…the street

This will be used only when StreetPrefixClassification is used.

Remove Paris from regions and add cité to street_types.
Paris is always used as a locality
Now I replace the solution with the correct phrase
Joxit changed the title from "[DO NOT MERGE] Append unclassified tokens to the street" to "Append unclassified tokens to the street" on Jul 10, 2019
Joxit (Member, Author) commented Jul 15, 2019

I've updated this PR.

  • I now update the solution with an existing span instead of rewriting one
  • I no longer fill the solution with an end-token span (a span ending on the last token)
  • This only works with street prefix classification

Joxit requested a review from missinglink on July 15, 2019 09:22