Tussenvoegsels / family name prefixes support #132

patvdleer · 2022-02-03T08:00:52Z

No description provided.

derek73

I took a closer look and made a few comments. Generally I think it makes sense to provide what you're after, but I think it's kinda the same thing as prefixes, so we should combine your new constants with prefixes and then provide a way to get the name with prefixes (existing), the name without prefixes, and just the prefixes without the last name, each as its own separate attribute.

derek73 · 2022-02-04T02:27:27Z

nameparser/config/affixes.py

+    'da',
+    'das',
+    'de',
+    'degli',


Many of these affixes are already defined in prefixes.py.

It seems like what you mainly want is the parts that come before a last name and currently get added to it (whether we call them prefixes or affixes or tussenvoegsels), available separately from the last name.

derek73 · 2022-02-04T02:34:44Z

tests.py

+        hn = HumanName("Vincent van der Gogh")
+        self.m(hn.family, "van der Gogh", hn)
+        self.assertEqual(hn.family_list, [
+            [["van", "der"], ["Gogh"]],


I think it makes sense to have these pieces in a separate attribute, but I'd lean more towards having separate attributes for those 2 nested lists. Maybe "affixes" and "family name", which combine together to give "last name", (which, btw, combines with middle names to give "surnames")?

We are kind of running out of names for things. And Wikipedia thinks that Surname, Last name, and family name are all the same thing, so there's that. 😄

Also, Google translate says the English translation of tussenvoegsels is "infixes". Really? There's yet another one? I did not know that was a word. (It's a good day when I can learn a new English word, so thanks!) But I guess if we're desperate, we could use that.

Kinda makes me wonder if we could do something more useful with slices, maybe something like each bucket gets an index value and you can slice off the parts you want/don't want?

[0:titles, 1:first, 2:middle, 3:prefixes, 4:last, 5:suffixes]

Actually, that is already how slice works. Been so long since I implemented it that I forgot. It returns the concatenated string of whichever members you slice.

So, if we added prefixes in there you could get just the last name with a slice, e.g. hn[4:5] (the end index of a slice is exclusive).

_members = ['title', 'first', 'middle', 'prefix', 'last', 'suffix', 'nickname']

Mostly what we need is just to have the prefixes in their own bucket, then we can be more flexible when we display things. Probably means changing the parse tree to add prefix in there.
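To make the slicing idea concrete, here is a minimal sketch. `ParsedName` and `MEMBERS` are hypothetical stand-ins, not nameparser's actual classes, and joining the sliced members into one string is an assumption made for illustration. It relies on standard Python slice semantics, where the end index is exclusive.

```python
from dataclasses import dataclass

# Hypothetical member order, including the proposed 'prefix' bucket.
MEMBERS = ['title', 'first', 'middle', 'prefix', 'last', 'suffix', 'nickname']


@dataclass
class ParsedName:
    """Toy container with one string per bucket (not the real HumanName)."""
    title: str = ''
    first: str = ''
    middle: str = ''
    prefix: str = ''
    last: str = ''
    suffix: str = ''
    nickname: str = ''

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Collect the buckets selected by the slice and join the non-empty ones.
            values = (getattr(self, member) for member in MEMBERS[key])
            return ' '.join(v for v in values if v)
        # A plain string key returns that single bucket.
        return getattr(self, key)


name = ParsedName(first='Vincent', prefix='van', last='Gogh')
print(name[3:5])  # van Gogh
print(name[4:5])  # Gogh
```

With this layout, `[3:5]` selects the prefix plus the last name and `[4:5]` selects the last name alone, while `[4:4]` selects nothing.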

The more I dive into the surname/last name/family name, the more confused I get... There doesn't seem to be a clear definition of any of these things.

Some entries I need are missing from the prefixes, which is why I made a separate list, so as not to interfere with your work. I was looking into possible locales, but that seems to be a road I really don't want to go down...

> Mostly what we need is just to have the prefixes in their own bucket, then we can be more flexible when we display things. Probably means changing the parse tree to add prefix in there.

I tried "Vincent van Gogh van Beethoven", which gave me "van Gogh van" as the middle name. I dropped that attempt for now, but it is the reason I created a nested list of pairs (prefixes, family name). Otherwise it would simply give me "van" twice and (possibly) format the name as "van van Gogh Beethoven".
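The pairing idea can be sketched roughly as follows. This is only an illustration: `PREFIXES`, `pair_prefixes`, and `format_family` are hypothetical names, the prefix set is a tiny subset, and the input is assumed to be the family-name tokens only (first and middle names already removed).

```python
# Illustrative subset of prefixes; the real lists are much longer.
PREFIXES = {'van', 'der', 'de', 'den'}


def pair_prefixes(tokens):
    """Group tokens into (prefixes, name) pairs, keeping each prefix with its name."""
    pairs = []
    pending = []
    for token in tokens:
        if token.lower() in PREFIXES:
            pending.append(token)
        else:
            pairs.append((pending, token))
            pending = []
    if pending:
        # Trailing prefixes with no following name are kept with an empty name.
        pairs.append((pending, ''))
    return pairs


def format_family(pairs):
    """Rebuild the family name in the original order from the pairs."""
    return ' '.join(' '.join(prefixes + [name]).strip() for prefixes, name in pairs)


pairs = pair_prefixes(['van', 'Gogh', 'van', 'Beethoven'])
print(pairs)                 # [(['van'], 'Gogh'), (['van'], 'Beethoven')]
print(format_family(pairs))  # van Gogh van Beethoven
```

Keeping each prefix attached to the name it precedes is what avoids the "van van Gogh Beethoven" ordering that two flat lists could produce.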

There used to be similar conversations re: "title", and "suffix" because people got tripped up on the semantic meaning of a title vs a suffix, ex "Dr" and "MD" are kinda the same but they appear in different places in the name. I think the best strategy for the parser is to focus on the position of words (name parts) in the string, and special words that join with words before/after/around them in certain conditions/parts of the name. In the future I should probably choose names that focus on that positional information and avoid semantic meaning.

We can add some of those things from your affixes constants to prefixes. The only danger is ones that could also be first names, like "fitz", "mala" and "ned". I think the current handling for prefixes skips the first name though, so it could be fine to include those.
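A minimal sketch of the "skip the first piece" idea, under the assumption that position alone decides whether an ambiguous word counts as a prefix. `PREFIXES` and `prefix_positions` are hypothetical names and the set is an illustrative subset; the real parser applies additional conditions.

```python
# Illustrative subset, including words that can also be first names.
PREFIXES = {'van', 'de', 'fitz', 'mala', 'ned'}


def prefix_positions(pieces):
    """Return the indexes of pieces treated as prefixes, never the first piece."""
    return [
        i for i, piece in enumerate(pieces)
        if i > 0 and piece.lower() in PREFIXES
    ]


print(prefix_positions(['Ned', 'van', 'Buren']))  # [1]
print(prefix_positions(['Ned', 'Ned', 'Smith']))  # [1]
```

The second call shows the remaining ambiguity: a word like "Ned" is only protected in the first position, so it would still be treated as a prefix anywhere else.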

> I tried Vincent van Gogh van Beethoven which gave me van Gogh van as a middle name

👎 That's annoying. Doesn't seem like that was my intention:

python-nameparser/nameparser/parser.py

Lines 879 to 903 in 3efe171

    # join everything after the prefix until the next prefix or suffix

    try:
        if i == 0 and total_length >= 1:
            # If it's the first piece and there are more than 1 rootnames, assume it's a first name
            continue
        next_prefix = next(iter(filter(self.is_prefix, pieces[i + 1:])))
        j = pieces.index(next_prefix)
        if j == i + 1:
            # if there are two prefixes in sequence, join to the following piece
            j += 1
        new_piece = ' '.join(pieces[i:j])
        pieces = pieces[:i] + [new_piece] + pieces[j:]
    except StopIteration:
        try:
            # if there are no more prefixes, look for a suffix to stop at
            stop_at = next(iter(filter(self.is_suffix, pieces[i + 1:])))
            j = pieces.index(stop_at)
            new_piece = ' '.join(pieces[i:j])
            pieces = pieces[:i] + [new_piece] + pieces[j:]
        except StopIteration:
            # if there were no suffixes, nothing to stop at so join all
            # remaining pieces
            new_piece = ' '.join(pieces[i:])
            pieces = pieces[:i] + [new_piece]
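One detail in this excerpt that may be relevant to the "van Gogh van" result: `pieces.index(next_prefix)` returns the first occurrence of the value, so when the same prefix appears twice it can point back at the current piece. The sketch below isolates just that lookup. The `PREFIXES` set and the fixed `i = 1` are assumptions for illustration, and passing a start index to `list.index` is one possible adjustment, not a tested fix for the parser.

```python
PREFIXES = {'van'}  # illustrative; the parser uses self.is_prefix instead
pieces = ['Vincent', 'van', 'Gogh', 'van', 'Beethoven']
i = 1  # index of the first 'van'

# Same lookup shape as the excerpt: find the next prefix after position i.
next_prefix = next(p for p in pieces[i + 1:] if p.lower() in PREFIXES)

j_first = pieces.index(next_prefix)         # searches from the start of the list
j_after = pieces.index(next_prefix, i + 1)  # searches only after position i

print(j_first)                       # 1, the same position as i
print(repr(' '.join(pieces[i:j_first])))  # '' because the slice is empty
print(j_after)                       # 3
print(' '.join(pieces[i:j_after]))   # van Gogh
```

With the unrestricted lookup the joined piece is empty, whereas restricting the search to positions after `i` joins the prefix to the name that follows it.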

patvdleer added 3 commits January 31, 2022 11:45

Refs derek73#130 - tussenvoegsels

8a98c16

Refs derek73#130 - tussenvoegsels/affix rather than family prefix

80d4e55

Merge branch 'derek73:master' into master

d942442

derek73 reviewed Feb 4, 2022


derek73 added the enhancement label Feb 4, 2022

derek73 changed the title ~~refs #130~~ Tussenvoegsels / family name prefixes support Feb 4, 2022

patvdleer mentioned this pull request Feb 21, 2022

Tussenvoegsels / family name prefixes #130

Open

Added 'den' to family affixes

5761556
