nickname + last name #78

Open
rsimoes opened this issue Oct 26, 2018 · 6 comments


rsimoes commented Oct 26, 2018

Names such as

"Rick" Edmonds

are parsed in such a way that "Edmonds" is treated as the first name rather than the last name.

derek73 added the bug label Oct 26, 2018
derek73 added this to the v1.0.2 milestone Oct 26, 2018
Owner

derek73 commented Oct 26, 2018

So I think this is what you want?

$ python tests.py '"Rick" Edmonds'
<HumanName : [
	title: '' 
	first: '' 
	middle: '' 
	last: 'Edmonds' 
	suffix: ''
	nickname: 'Rick'
]>

What do you think about the first name field remaining blank in this case?

Right now the nickname handling happens as a preprocess without any awareness of where the nickname appears in the string. I had planned to refactor the nickname handling a bit in order to better support maiden names (#22) that happen after a last name.
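To illustrate the difference, here is a minimal sketch (not nameparser's actual code; the `QUOTED` pattern and both function names are hypothetical). The first function strips quoted text in a preprocess step and loses its position; the second records the word index where each nickname appeared, which is the information a position-aware parse would need.

```python
import re

# Hypothetical pattern: anything in double quotes is treated as a nickname.
QUOTED = re.compile(r'"([^"]*)"')


def strip_nicknames(full_name):
    """Position-unaware preprocess: return (remaining text, nicknames)."""
    nicknames = QUOTED.findall(full_name)
    # Remove the quoted pieces and collapse the leftover whitespace.
    remaining = " ".join(QUOTED.sub(" ", full_name).split())
    return remaining, nicknames


def find_nicknames(full_name):
    """Position-aware variant: return (nickname, word index) pairs."""
    found = []
    for match in QUOTED.finditer(full_name):
        # Count the whitespace-separated words that precede the match.
        index = len(full_name[:match.start()].split())
        found.append((match.group(1), index))
    return found
```

With the first approach, `'"Rick" Edmonds'` is reduced to `'Edmonds'` before parsing, so nothing tells the parser that the nickname stood where a first name would normally be. The second approach reports index 0 for that input and index 1 for `'Robert "Bob" Jones'`.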

Right now I could fix it in this single case where there are no titles or other name pieces, but it seems like it should also support things like 'Senator "Rich" Edmonds' too, and that will need to wait for a somewhat bigger refactor that moves that code into the parse logic and takes position in the string into account.

derek73 modified the milestones: v1.0.2, 1.1.0 Oct 26, 2018
Author

rsimoes commented Oct 26, 2018

Right, my expectation would be that the first name remains blank. A narrow fix for names in the form of a nickname followed by a last name (and nothing else) would be sufficient for my needs at the moment!

Owner

derek73 commented Oct 26, 2018

I just released v1.0.2, which has a narrow fix for that one example. I'm going to leave this open and close it when I update the nickname handling logic.

Author

rsimoes commented Oct 31, 2018

Sounds great, thanks! Just adding a follow-up issue for the coming update: "Rick" Edmonds now parses perfectly, but "Rick" Edmonds Jr. (or "Rick" Edmonds III, etc.) flies off the rails a bit:

In [3]: HumanName('"Rick" Edmonds Jr.')
Out[3]: 
<HumanName : [
	title: '' 
	first: 'Edmonds' 
	middle: '' 
	last: 'Jr.' 
	suffix: ''
	nickname: 'Rick'
]>

In [4]: HumanName('"Rick" Edmonds, Jr.')
Out[4]: 
<HumanName : [
	title: 'Jr.' 
	first: '' 
	middle: '' 
	last: 'Edmonds' 
	suffix: ''
	nickname: 'Rick'
]>

Owner

derek73 commented Oct 31, 2018

Yea, that makes sense. I didn't make any changes to the codepaths that handle the comma formats, so I'm guessing that behavior is unchanged from the previous version. I literally hardcoded it to only handle 2 name parts, which is why it fails when you add "jr" to the end.

Trying to think a bit about the final behavior we want. Is it true that nicknames only happen after first names? I feel like these are the cases I know about that we want to handle in some way:

  • Nickname - Robert "Bob" Jones
    • [title] "Bob" [middle] Jones [suffix], [suffix]
    • Jones [suffix], [title] "Bob" [middle]
  • Maiden Name - Roberta Jones (Smith)
    • [title] Roberta [middle] Jones (Smith), [suffix]
    • Jones (Smith) [suffix], [title] Roberta [middle], [suffix]
  • Junk - John Jones (Google Docs), Jr. (Unknown)

When I first implemented this it was just to handle the junk, so I haven't thought too much about the other cases. This is helping me understand how a nickname is different.

I think when there is a nickname at the beginning of the string, i.e. the name may have a title but is missing a first name, we basically want it to behave as if the first name slot has been filled by the nickname and then let the rest of the parse happen as normal. I don't think a maiden name will ever appear without a last name, so it won't need that kind of handling. And for the junk, it probably doesn't matter where it ends up as long as it's not filling up a name slot.
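A rough sketch of that behavior, as a toy parser rather than nameparser's implementation. Everything here is an assumption for illustration: the `TITLES` and `SUFFIXES` sets are small made-up subsets, nicknames are limited to a single quoted word, commas are simply treated as spaces (so the "Last, First" formats are not handled), and maiden names and junk in parentheses are ignored.

```python
TITLES = {"senator", "mr", "mrs", "dr"}      # illustrative subset only
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}   # illustrative subset only


def _key(word):
    """Normalize a word for lookup: drop trailing punctuation, lowercase."""
    return word.strip(".,").lower()


def _is_quoted(word):
    return len(word) >= 2 and word.startswith('"') and word.endswith('"')


def parse_name(full_name):
    words = full_name.replace(",", " ").split()

    # Peel titles off the front and suffixes off the back.
    titles, suffixes = [], []
    while words and _key(words[0]) in TITLES:
        titles.append(words.pop(0))
    while len(words) > 1 and _key(words[-1]) in SUFFIXES:
        suffixes.insert(0, words.pop())

    # A quoted word in the leading position occupies the first-name slot.
    nickname_leads = bool(words) and _is_quoted(words[0])
    nicknames = [w.strip('"') for w in words if _is_quoted(w)]
    names = [w for w in words if not _is_quoted(w)]

    # If the nickname took the first slot, the first name stays blank.
    first = "" if nickname_leads or not names else names.pop(0)
    last = names.pop() if names else ""
    return {
        "title": " ".join(titles),
        "first": first,
        "middle": " ".join(names),
        "last": last,
        "suffix": " ".join(suffixes),
        "nickname": " ".join(nicknames),
    }
```

Under these assumptions, `'"Rick" Edmonds Jr.'` and `'"Rick" Edmonds, Jr.'` both come out with a blank first name, last name `Edmonds`, suffix `Jr.`, and nickname `Rick`, while `'Robert "Bob" Jones'` still gets `Robert` as the first name.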

Let me know if you think that sounds right or if you have any examples of things in quotes or parentheses that are not one of those 3 types of things.

Author

rsimoes commented Nov 1, 2018

Here is where I was encountering the names in the form of [nickname last]. From comparing the list there, it appears they all conform to one of the three.
