Bad Parsing of Address #339

Open
bschollnick opened this issue Oct 19, 2022 · 0 comments

Using usaddress 0.5.10 under Python 3.10.1, calling `usaddress.tag`.

Case 1 -
```
usaddress.RepeatedLabelError:
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 9999 Walker LK Ontario Road,Hilton, NY 14468,US
PARSED TOKENS: [('9999', 'AddressNumber'), ('Walker', 'StreetName'), ('LK', 'StreetNamePostType'), ('Ontario', 'StreetName'), ('Road,', 'StreetNamePostType'), ('Hilton,', 'PlaceName'), ('NY', 'StateName'), ('14468,', 'ZipCode'), ('US', 'CountryName')]
UNCERTAIN LABEL: StreetName
```
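It appears that LK, as an abbreviation for LAKE, isn't being handled correctly: it is tagged StreetNamePostType, which splits the street name into two separate StreetName runs (Walker and Ontario).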
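
To make the failure mode concrete, here is a minimal sketch of the kind of check that appears to trigger the error: a label that shows up in two separate, non-adjacent runs of tokens. This is an illustration built from the PARSED TOKENS output above, not usaddress's actual implementation; `first_repeated_label` and `CASE_1` are hypothetical names.

```python
from itertools import groupby

# Parsed tokens copied from the Case 1 error output.
CASE_1 = [
    ("9999", "AddressNumber"), ("Walker", "StreetName"),
    ("LK", "StreetNamePostType"), ("Ontario", "StreetName"),
    ("Road,", "StreetNamePostType"), ("Hilton,", "PlaceName"),
    ("NY", "StateName"), ("14468,", "ZipCode"), ("US", "CountryName"),
]


def first_repeated_label(tokens):
    """Return the first label that starts a second, separate run, else None."""
    seen = set()
    # groupby yields one group per run of consecutive identical labels.
    for label, _run in groupby(tokens, key=lambda pair: pair[1]):
        if label in seen:
            return label
        seen.add(label)
    return None


print(first_repeated_label(CASE_1))  # StreetName
```

The result matches the UNCERTAIN LABEL line in the error: `Ontario` starts a second StreetName run after `LK` was labeled StreetNamePostType.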

Case 2 -
```
usaddress.RepeatedLabelError:
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: Beech Street Corp PO Box 999999,Richardson, TX 75085-3925,US
PARSED TOKENS: [('Beech', 'StreetName'), ('Street', 'StreetNamePostType'), ('Corp', 'PlaceName'), ('PO', 'USPSBoxType'), ('Box', 'USPSBoxType'), ('999999,', 'USPSBoxID'), ('Richardson,', 'PlaceName'), ('TX', 'StateName'), ('75085-3925,', 'ZipCode'), ('US', 'CountryName')]
UNCERTAIN LABEL: PlaceName
```

Case 3 -
```
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 99999 Bristol Blue St,Apex, NC 27502 4115,US
PARSED TOKENS: [('99999', 'AddressNumber'), ('Bristol', 'StreetName'), ('Blue', 'StreetName'), ('St,', 'StreetNamePostType'), ('Apex,', 'PlaceName'), ('NC', 'StateName'), ('27502', 'ZipCode'), ('4115,', 'ZipPlus4'), ('US', 'StateName')]
UNCERTAIN LABEL: StateName
```
Changing Case 2 to `Beech Street Corp, PO Box 999999,Richardson, TX 75085-3925,US` (adding a comma after `Corp`) does parse, but I'm having trouble devising logic to handle this reliably.
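
One possible way to handle this in code, sketched under assumptions: catch `usaddress.RepeatedLabelError` and fall back to the raw token list, keeping each run of a repeated label as a separate entry. `merge_runs` and `tag_or_merge` are hypothetical helpers, and the sketch assumes the exception exposes the token list as `parsed_string` (the data printed as PARSED TOKENS); verify that against the installed version.

```python
# Parsed tokens copied from the Case 2 error output.
CASE_2 = [
    ("Beech", "StreetName"), ("Street", "StreetNamePostType"),
    ("Corp", "PlaceName"), ("PO", "USPSBoxType"), ("Box", "USPSBoxType"),
    ("999999,", "USPSBoxID"), ("Richardson,", "PlaceName"),
    ("TX", "StateName"), ("75085-3925,", "ZipCode"), ("US", "CountryName"),
]


def merge_runs(parsed):
    """Collapse (token, label) pairs into {label: [one string per run]}."""
    merged = {}
    last_label = None
    for token, label in parsed:
        token = token.rstrip(",")  # drop the trailing comma kept in tokens
        if label == last_label:
            merged[label][-1] += " " + token   # same run: extend it
        else:
            merged.setdefault(label, []).append(token)  # new run
        last_label = label
    return merged


def tag_or_merge(address):
    """Try usaddress.tag; on a repeated label, fall back to merge_runs."""
    import usaddress  # imported lazily so merge_runs works without it

    try:
        tagged, _address_type = usaddress.tag(address)
        return {label: [value] for label, value in tagged.items()}
    except usaddress.RepeatedLabelError as err:
        # Assumption: parsed_string holds the (token, label) pairs.
        return merge_runs(err.parsed_string)


print(merge_runs(CASE_2)["PlaceName"])  # ['Corp', 'Richardson']
```

The caller still has to decide which run to trust (here, `Richardson` rather than `Corp`), so this only surfaces the ambiguity instead of resolving it.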

I also have cases with two address lines where the parse failed, but then succeeded once I removed the comma between the address lines.

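Case 3 seems to be unaware of North Carolina? (Looking at the tokens, NC is tagged StateName, but so is the trailing US, which is presumably what triggers the repeated label.)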
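
A quick check of the Case 3 tokens, copied from the error output, shows which tokens ended up with the StateName label. `CASE_3` and `state_tokens` are just illustrative names.

```python
# Parsed tokens copied from the Case 3 error output.
CASE_3 = [
    ("99999", "AddressNumber"), ("Bristol", "StreetName"),
    ("Blue", "StreetName"), ("St,", "StreetNamePostType"),
    ("Apex,", "PlaceName"), ("NC", "StateName"), ("27502", "ZipCode"),
    ("4115,", "ZipPlus4"), ("US", "StateName"),
]

# Collect every token that received the StateName label.
state_tokens = [token for token, label in CASE_3 if label == "StateName"]
print(state_tokens)  # ['NC', 'US']
```

So `NC` is recognized as a state; the conflict comes from `US` also being labeled StateName here, whereas it was labeled CountryName in Cases 1 and 2.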

Can you elaborate on the proper formatting of the input string? (e.g., should commas be included? Should line delimiters be left out?)

The reason I ask is that I am seeing trailing commas kept inside the parsed tokens, for example on StreetNamePostType (`Road,`, `St,`), PlaceName, and ZipCode.
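
Until the expected input format is confirmed, one thing to experiment with is normalizing the string before tagging. The sketch below is a hypothetical preprocessing step (`normalize_address` is not part of usaddress): it puts exactly one space after each comma, collapses whitespace, and optionally drops a trailing `US`/`USA`. Whether this changes the parse for the cases above has not been verified.

```python
import re


def normalize_address(raw, drop_country=True):
    """Tidy comma spacing and whitespace; optionally drop a trailing US/USA."""
    text = re.sub(r"\s*,\s*", ", ", raw)    # exactly one space after commas
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    if drop_country:
        # Remove a final ", US" or ", USA" (case-insensitive).
        text = re.sub(r",\s*(US|USA)$", "", text, flags=re.IGNORECASE)
    return text


print(normalize_address("99999 Bristol Blue St,Apex, NC 27502 4115,US"))
# 99999 Bristol Blue St, Apex, NC 27502 4115
```

Dropping the country suffix is a design choice aimed at Case 3, where `US` collided with the state label; pass `drop_country=False` to keep it.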
