Consider names followed by a period as titles or suffixes #109

Open
tgb417 opened this issue Jun 2, 2020 · 3 comments

Comments

@tgb417

tgb417 commented Jun 2, 2020

There are a few somewhat obscure titles in the Titanic data set that Name Parser does not get right by itself:

  • Dona.
  • Jonkheer.
  • Don.
  • Major.

They occur in these names:

  • Uruchurtu, Don. Manuel E
  • Peuchen, Major. Arthur Godfrey
  • Butt, Major. Archibald Willingham
  • Reuchlin, Jonkheer. John George
  • Oliva y Ocana, Dona. Fermina

Love the work that NameParser does. Passing along the issue based on your request for feedback.

@tgb417 tgb417 changed the title from "Problems from the Titanic Data Set" to "Problems from names in the Titanic Data Set" Jun 2, 2020
@tgb417
Author

tgb417 commented Jun 2, 2020

I'm working with version nameparser==1.0.6

@derek73
Owner

derek73 commented Aug 9, 2020

I think the problem with these is that they are all titles that could also be first names, possibly with the exception of "Major". We do have the abbreviation "maj" but not the full title.

Ok, just did a quick search and it seems there are about 50-500 people born per year named "Major". That's surprising. Seems it became more popular in 2016. And there are women named "Major" too? wow. Ok, I guess. :)

The parser avoids including titles that could also be first names in its set of titles. The title check happens first, so adding a name like "Dean" to that set, although it is a common title, would mean it is always parsed as a title, and the parser would fail in the much more common case of Dean as a first name.
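
To illustrate the ordering problem, here is a toy sketch. It is not nameparser's actual code: `TITLES` and `toy_parse` are hypothetical names, and the only thing the sketch shares with the real parser is the assumption that known titles are consumed before anything else is assigned.

```python
# Toy model only, not nameparser's implementation.
TITLES = {"dr", "mr", "mrs", "maj"}  # hypothetical, deliberately tiny title set


def toy_parse(full_name, titles):
    """Parse 'Title First Last' style input, checking for titles first."""
    parts = full_name.split()
    title = []
    # Consume leading tokens that match a known title,
    # but always leave at least one token for the last name.
    while len(parts) > 1 and parts[0].rstrip(".").lower() in titles:
        title.append(parts.pop(0))
    return {
        "title": " ".join(title),
        "first": parts[0] if len(parts) > 1 else "",
        "last": parts[-1],
    }


# With the default set, "Dean" is treated as a first name.
without_dean = toy_parse("Dean Martin", TITLES)

# Once "dean" is added as a title, the same input loses its first name.
with_dean = toy_parse("Dean Martin", TITLES | {"dean"})
```

In this sketch, adding "dean" to the set moves "Dean" into the title slot for every input, which is the trade-off described above.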

So I think these are the kind of edge cases where there's no way to always be right with a simple rule-based approach. It's probably more upsetting for people named "Major" to always have the parser get their name wrong than for those with the title Major to have it accidentally think that's their first name? At least the latter error is a more understandable one for a computer to make? And there are other, less ambiguous ways of formatting "Major" as a title where the parser would interpret it correctly.

It is possible to adjust the parser so that all of these names would always be counted as titles. As long as Dona isn't upset by that. :)

I was going to close this as won't fix, but in all of these examples the names are followed by a period. We could take that as a clue and count them as titles. I wonder if we could do that as a rule: any name part that is followed by a period must be some kind of title or suffix, as long as it's longer than 1 character?
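
A rough sketch of what that rule could look like on the "last, title. first" format from this issue. This is only an illustration of the idea, not a proposed patch: `split_by_trailing_period` and `toy_parse_last_first` are hypothetical helpers, and the sketch does not try to tell titles and suffixes apart.

```python
def split_by_trailing_period(name_part):
    """Separate tokens that end in '.' and are longer than one character."""
    titles_or_suffixes, others = [], []
    for token in name_part.split():
        stem = token.rstrip(".")
        # "Major." qualifies; a middle initial such as "E." does not.
        if token.endswith(".") and len(stem) > 1:
            titles_or_suffixes.append(token)
        else:
            others.append(token)
    return titles_or_suffixes, others


def toy_parse_last_first(full_name):
    """Handle 'Last, Title. First Middle' input using the period rule."""
    last, _, rest = full_name.partition(",")
    titles, given = split_by_trailing_period(rest)
    return {
        "last": last.strip(),
        "title": " ".join(titles),
        "given": " ".join(given),
    }


parsed = [
    toy_parse_last_first(name)
    for name in (
        "Uruchurtu, Don. Manuel E",
        "Peuchen, Major. Arthur Godfrey",
        "Oliva y Ocana, Dona. Fermina",
    )
]
```

One caveat with a rule like this: abbreviated given names written with a period (for example "Wm.") would also be classified as titles, so it would probably need an exception list or an opt-in setting.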

@derek73 derek73 changed the title from "Problems from names in the Titanic Data Set" to "Consider names followed by a period as titles or suffixes" Aug 9, 2020
@tgb417
Author

tgb417 commented Aug 10, 2020

Thanks for reviewing these cases.

Your insight about the period is interesting, particularly for the format:

last_name, title. first_name middle_initial_or_name

This is the style of the list I'm looking at.

I decided to see whether any style guides include the "." (period) after Major. In a brief look, I did not find any that suggest using "Major." with the period. Most academic / literary citation formats seem to prefer leaving out titles. Modern US Army style guides for things like wedding invitations do not seem to put a period after the word when it is fully spelled out. It is not clear how last-name-first "guest lists" should work.

For my little project, I can just change Major. to Maj. and things will work with this library.
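
A minimal sketch of that preprocessing step, run on the raw strings before they are handed to the parser. The `REPLACEMENTS` mapping and `normalize_titles` are names made up for this example, and only the "Major." to "Maj." entry comes from this thread.

```python
import re

# Hypothetical mapping; only "Major." -> "Maj." is taken from this thread.
REPLACEMENTS = {"Major.": "Maj."}


def normalize_titles(raw_name, replacements=REPLACEMENTS):
    """Rewrite spelled-out titles to abbreviations the parser already knows."""
    for spelled_out, abbreviated in replacements.items():
        # Match only a whole whitespace-delimited token, so a last name
        # like "Major," at the start of the string is left alone.
        pattern = r"(?<!\S)" + re.escape(spelled_out) + r"(?!\S)"
        raw_name = re.sub(pattern, abbreviated, raw_name)
    return raw_name


cleaned = normalize_titles("Peuchen, Major. Arthur Godfrey")
```

The cleaned string can then be passed to the parser as usual.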

Thank you for a super helpful library.
