Check for known suffixes when processing nicknames #111

Open
aikimark opened this issue Jul 22, 2020 · 2 comments

@aikimark

In my names, I have 'complicated' text. There are situations where there might be both a delimited nickname and a delimited suffix. The suffix text is being added to the nickname.

Since you have a list of known suffixes, I would hope it would be easy to check the parsed nickname against that list and move any match into the suffixes. I haven't looked closely at this part of the code yet.

Another possible logic addition might be to check the location of the delimited nickname and add any additional (beyond the first) items to the suffix.

I would rather discuss this idea with you before looking at the code. You might know, in advance, that I'll break some important functionality.

@derek73
Owner

derek73 commented Aug 9, 2020

Currently the way the parser identifies nicknames is to strip out anything that is inside of parentheses or double quotes and stick it in the nickname bucket before parsing the name string. If I understand correctly, you have instances where a suffix is included in parentheses or double quotes? Just curious, could you provide an example or two?

It does not seem like it would be too hard to check whether the text inside parentheses or double quotes is in the suffix list and, if it is, add it to the suffix bucket instead of the nickname bucket.
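The check described above can be sketched in isolation. This is only an illustration under stated assumptions, not the parser's actual code: `KNOWN_SUFFIXES`, `split_delimited`, and `normalize` are hypothetical names, and the suffix set is a short made-up sample rather than the library's real list.

```python
import re

# Hypothetical, abbreviated suffix sample for illustration only.
# It is NOT the library's real suffix list.
KNOWN_SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "ret"}

# Matches text inside parentheses or inside double quotes.
DELIMITED = re.compile(r'\(([^()]*)\)|"([^"]*)"')


def normalize(token):
    """Lowercase and drop periods so 'Jr.' and 'JR' compare equal."""
    return token.replace(".", "").strip().lower()


def split_delimited(name):
    """Return (remaining_name, nicknames, suffixes).

    Delimited text that matches a known suffix goes to the suffix
    bucket; everything else goes to the nickname bucket.
    """
    nicknames, suffixes = [], []

    def route(match):
        text = (match.group(1) or match.group(2) or "").strip()
        if text:
            if normalize(text) in KNOWN_SUFFIXES:
                suffixes.append(text)
            else:
                nicknames.append(text)
        return " "  # remove the delimited span from the name

    remaining = " ".join(DELIMITED.sub(route, name).split())
    return remaining, nicknames, suffixes
```

With this sample list, `split_delimited("Lon (Jr.) Williams")` puts `Jr.` in the suffixes and leaves the nicknames empty. Ambiguous initials such as `JD` are deliberately left out of the sample set, since the same text can be a nickname or a degree.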

@aikimark
Author

aikimark commented Aug 9, 2020

Here are two names where the parenthesized text is part or all of the suffix rather than a nickname:

Andrew Perkins, Jr., Col. (Ret)
Lon (Jr.) Williams

Here are two names where the delimited text is most likely a genuine nickname:

JEFFREY (JD) BRICKEN
JEFFREY D 'JD' KEISTER

Here are two names where multiple delimiters are used. The double apostrophe was probably used in place of a quote character. Since this data was imported from CSV, I assume that the CSV format output process, or some process/software upstream of that, prevents quote characters inside fields.

WILLIAM A (''DREW'') MARSH III
S.E. ''ED'' WHITE
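One way to handle the doubled apostrophes is to normalize them before parsing. The sketch below is a hypothetical preprocessing step, not something the library is known to do; both function names are made up for illustration, and it assumes that two consecutive apostrophes never occur legitimately in the data.

```python
def normalize_doubled_apostrophes(name):
    """Replace each pair of consecutive apostrophes with one double quote.

    Assumption: '' only ever stands in for a quote character that an
    upstream CSV export could not emit.
    """
    return name.replace("''", '"')


def strip_nested_delimiters(text):
    """Remove quotes or parentheses wrapping already extracted text,
    e.g. the inner quotes left over from ("DREW")."""
    return text.strip(" \"'()")
```

Single apostrophes, as in `JEFFREY D 'JD' KEISTER`, are left untouched by the first function.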

I also have a mixture of "MD", "M.D.", "M.D", "J.D.", "J.D", and "JD" titles that should parse correctly. I haven't checked whether your patterns cover these variants. I thought, incorrectly, that some of these were delimited in such a way as to be interpreted as nicknames.
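For the punctuation variants listed above, a comparison key that ignores periods would make them equivalent. This is a minimal sketch assuming the variants differ only in periods, case, and stray commas; `suffix_key` is a hypothetical helper, not part of the library.

```python
def suffix_key(token):
    """Build a comparison key: drop periods, trim commas/spaces, uppercase."""
    return token.replace(".", "").strip(" ,").upper()


variants = ["MD", "M.D.", "M.D", "J.D.", "J.D", "JD"]

# Group the raw variants by their normalized key.
grouped = {}
for variant in variants:
    grouped.setdefault(suffix_key(variant), []).append(variant)
```

Here `grouped` ends up with two keys, `MD` and `JD`, each holding its three spellings. Whether `JD` should then be treated as a degree or as a nickname still depends on context.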
