Tussenvoegsels / family name prefixes #130

Open
patvdleer opened this issue Jan 31, 2022 · 12 comments

Comments

@patvdleer

Ticket to discuss expanding the recognition, based on #121 (comment).

The suggestion is to expose the tussenvoegsels (family name affixes/prefixes) separately from the rest of the family name. To avoid introducing breaking changes, I would suggest adding these as additional properties rather than splitting up the existing last name.

Wiki info: https://en.wikipedia.org/wiki/Tussenvoegsel

@derek73
Owner

derek73 commented Jan 31, 2022

I just want to clarify the problem with the current handling. Do you propose that tussenvoegsels are not conceptually part of the last name, like if someone filled out a form they would not include the tussenvoegsels in the last name field? Could you give a few specific examples of when they would or would not be considered part of the last name? We don't have them so much in English so I'm not very familiar with how they are used.

I'm guessing the answer would be similar for family name affixes in any language that has them.

https://en.wikipedia.org/wiki/List_of_family_name_affixes

patvdleer added a commit to patvdleer/python-nameparser that referenced this issue Jan 31, 2022
@patvdleer
Author

I made a quick and crude example here, although I realised just now that this approach won't work with double/multiple last names that each have a prefix.

patvdleer added a commit to patvdleer/python-nameparser that referenced this issue Jan 31, 2022
@patvdleer
Author

@derek73 please review my latest brain fart, I would appreciate your view

@derek73
Owner

derek73 commented Feb 3, 2022

@patvdleer at first glance this looks like it could maybe work. I guess I wonder if we need all of it because it seems like it functions the same as prefix, just taking prefixes and separating them out from the name portion. Maybe we should do that with all prefixes and we don't need a new place for the constants to live?

Can you open a pull request? It will make it easier to see all your changes together and review them.

@patvdleer
Author

I'm sure it can be vastly improved, this was just a somewhat quick and dirty fix since I needed to be able to generate names formatted for sorting.

https://github.com/derek73/python-nameparser/pull/132/files

@derek73
Owner

derek73 commented Feb 7, 2022

I was able to chat with one of my Dutch friends about this a bit more. She said that she considers her last name to include the tussenvoegsels and never wants to see them omitted, because it's not her name if it doesn't include the tussenvoegsels. (She was unable to come up with a word for the part of the "achternaam"/last name that does not include the "tussenvoegsels", in Dutch or English.)

One important insight, however, was that she expects her last name to be sorted without consideration of the prefix/tussenvoegsels. So, for example, "de Jong" should be sorted as though it starts with a "J", not a "d". The parser cannot currently support that, and it seems a valid reason why we need an attribute that just has the root part of the last name minus any prefixes/tussenvoegsels.
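A minimal sketch of that sorting idea, written as standalone Python rather than against nameparser itself. The `PREFIXES` set, `split_last_name`, and `sort_key` are hypothetical names made up for illustration; they are not constants or functions the library provides.

```python
# Hypothetical prefix list for illustration; not the constants nameparser ships.
PREFIXES = {"de", "den", "der", "het", "te", "ten", "ter", "van", "von"}


def split_last_name(last):
    """Return (prefix, root) for a last name such as 'van der Leer'."""
    words = last.split()
    i = 0
    # Consume leading prefix words, but always leave at least one word as the root.
    while i < len(words) - 1 and words[i].lower() in PREFIXES:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])


def sort_key(last):
    """Sort on the root first; the prefix only breaks ties between equal roots."""
    prefix, root = split_last_name(last)
    return (root.casefold(), prefix.casefold())


names = ["de Jong", "Jansen", "van der Leer", "Bakker", "Koning"]
sorted_names = sorted(names, key=sort_key)
# "de Jong" now lands between "Jansen" and "Koning", i.e. under "J".
```

The full last name is left intact in the result; only the key used for ordering ignores the prefix.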

Just to make things more complicated, it appears that French prefixes work a bit differently, and there's a decent amount of variation between languages.

MLA List of Works Cited: Order of Entries.

When de occurs with French names of one syllable, alphabetize under d: De Jean, Denise. Otherwise, alphabetize by last name: Maupassant, Guy de.

I wonder what we should call this new part of the last name that is what you should sort by? And definitely the parser's capitalization doesn't take any of that into account.

While I'm thinking of it, if we implement this it would be nice to do it in such a way so that someone could assign a string like "de Jong" to the lastname attribute and it would still parse out the prefix and allow you to sort by the correct part of the last name. Sometimes people enter their last name into a UI, and it would be nice if the parser could do something useful even when you pass it the separate name parts.
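One way this could look, sketched with a small stand-in class. `ParsedName`, `last_prefix`, and `last_root` are hypothetical names chosen for this example and are not part of nameparser's `HumanName`; the prefix set is likewise an assumption.

```python
# Hypothetical prefix list, for illustration only.
PREFIXES = {"de", "der", "van", "von"}


class ParsedName:
    """Stand-in class for illustration; not nameparser's HumanName."""

    def __init__(self, first="", last=""):
        self.first = first
        self.last = last  # routed through the property setter below

    @property
    def last(self):
        # The full last name always includes the prefix.
        return " ".join(p for p in (self.last_prefix, self.last_root) if p)

    @last.setter
    def last(self, value):
        words = value.split()
        i = 0
        # Peel off leading prefix words, keeping at least one word as the root.
        while i < len(words) - 1 and words[i].lower() in PREFIXES:
            i += 1
        self.last_prefix = " ".join(words[:i])
        self.last_root = " ".join(words[i:])


name = ParsedName(first="Anna")
name.last = "de Jong"  # assigned directly, e.g. from a separate form field
```

After the assignment, `name.last` still reads "de Jong", while `name.last_root` holds "Jong" for sorting.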

@patvdleer
Author

One important insight, however, was that she expects her last name to be sorted without consideration of the prefix/tussenvoegsels. So, for example, "de Jong" should be sorted as though it starts with a "J", not a "d". The parser cannot currently support that, and it seems a valid reason why we need an attribute that just has the root part of the last name minus any prefixes/tussenvoegsels.

This is why I want this so bad, I'm trying to sort a large amount of names.

The issue I still haven't figured out is which format to use: "Leer, van der, Patrick" or "Leer, Patrick van der". The latter is how Calibre handles it.
I feel I would go with the former, since it creates more of a distinction between "Leer, Patrick" and "Leer, Patrick van der". It might also work better with multiple last names, though I have no clue how that would turn out.

So with multiple last names, something like Vincent van Gogh von Beethoven -> Gogh, van, Beethoven, von, Vincent?
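A rough sketch of how the multiple-surname case might be handled, where each root is followed by its own prefix. This is only one possible reading of the question; `split_surnames`, `sortable`, and the prefix set are hypothetical, and the sketch assumes every non-prefix word is a separate surname root.

```python
# Hypothetical prefix list, for illustration only.
PREFIXES = {"de", "der", "van", "von"}


def split_surnames(last):
    """Split a compound last name into (prefix, root) pairs."""
    pairs, pending = [], []
    for word in last.split():
        if word.lower() in PREFIXES:
            pending.append(word)
        else:
            # A non-prefix word closes the current pair.
            pairs.append((" ".join(pending), word))
            pending = []
    if pending:
        # Trailing prefix words with no root: keep them rather than drop them.
        pairs.append(("", " ".join(pending)))
    return pairs


def sortable(first, last):
    """Build a 'Root, prefix, Root, prefix, First' style string."""
    parts = []
    for prefix, root in split_surnames(last):
        parts.append(root)
        if prefix:
            parts.append(prefix)
    if first:
        parts.append(first)
    return ", ".join(parts)


result = sortable("Vincent", "van Gogh von Beethoven")
```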

As for the French, I would say localization could determine whether or not to apply the above logic?

@derek73
Owner

derek73 commented Feb 12, 2022

re: Leer, van der, Patrick OR Leer, Patrick van der

Both of those options seem not ideal, but I guess it could depend on the context too. If you're looking at a bunch of names in alphabetical order, Leer, Patrick van der makes it more clear what part is being sorted by and doesn't isolate "van der" into its own new orphaned thing. Probably the goal of this parser should be to provide the tools to do either, because someone might need to match Calibre or similar too.
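Once the prefix and root are available separately, both layouts are a matter of formatting. A small sketch, assuming the three parts have already been split out; the function names are made up for this example.

```python
def root_prefix_first(first, prefix, root):
    """'Leer, van der, Patrick': the prefix becomes its own comma-separated part."""
    return ", ".join(p for p in (root, prefix, first) if p)


def root_first_prefix(first, prefix, root):
    """'Leer, Patrick van der': the prefix trails the first name (the Calibre-style layout)."""
    given = " ".join(p for p in (first, prefix) if p)
    return ", ".join(p for p in (root, given) if p)


a = root_prefix_first("Patrick", "van der", "Leer")
b = root_first_prefix("Patrick", "van der", "Leer")
```

Names without a prefix come out the same in both layouts, for example "Leer, Aaron".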

I would think the ideal way to sort, or at least what might be most respectful of how people consider their own names but still make the sorted element clear, would be at least to enable a way to sort the names without regard to "van der" but highlight the last name, ex:

Leer, Aaron
van der Leer, Patrick
Leer, Thomas

If the parser provided a way to do that, which it doesn't today because "van der Leer" is all one name, then it seems it would be possible to also do something with the string format or other methods to get the representation you need.
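The listing above could be produced roughly like this: order by the root of the last name, then by first name, but print the last name in full. The `root_of` helper and prefix set are assumptions for the sketch, not existing parser features.

```python
# Hypothetical prefix list, for illustration only.
PREFIXES = {"de", "der", "van", "von"}


def root_of(last):
    """Drop leading prefix words, keeping at least one word."""
    words = last.split()
    i = 0
    while i < len(words) - 1 and words[i].lower() in PREFIXES:
        i += 1
    return " ".join(words[i:])


people = [("Thomas", "Leer"), ("Patrick", "van der Leer"), ("Aaron", "Leer")]

# Sort on (root, first name); the displayed last name keeps its prefix.
ordered = sorted(people, key=lambda p: (root_of(p[1]).casefold(), p[0].casefold()))
lines = [f"{last}, {first}" for first, last in ordered]
```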

We can't really use system localization because whether or not it's French depends on the name, not your system. It would be nice if you could specify that it's a French name and smart things would happen, but I have no idea how to structure something like that. It would have to be done in a way that lets other people contribute those rules.

@patvdleer
Author

patvdleer commented Feb 21, 2022

sorry for the really late reply, personal life got in the way...

So to summarize: we don't want to get into localisation, and the "tussenvoegsels" should already work, or at least that was the intent of the code you wrote (referring to #132 (review)).
In other words, we have a bug on our hands? If so, do you want to fix it or should I attempt to?

@oakla

oakla commented Jan 17, 2023

Before I say anything else, I believe the English term for Tussenvoegsels is 'Nobiliary Particle'
...whether you want to use that or the Dutch term doesn't bother me, but I'm adding it here in case anyone else tries searching for the English term

@oakla

oakla commented Jan 17, 2023

I may be interested in this feature.

As has been pointed out above, a tussenvoegsel is usually considered to be part of a surname. However, the project I am working on requires that double-barrelled names be treated differently from names with tussenvoegsels.

@patvdleer
Author

Before I say anything else, I believe the English term for Tussenvoegsels is 'Nobiliary Particle' ...whether you want to use that or the Dutch term doesn't bother me, but I'm adding it here in case anyone else tries searching for the English term

How did you figure that out?

I may be interested in this feature.

As has been pointed out above, a tussenvoegsel is usually considered to be part of a surname. However, the project I am working on requires that double-barrelled names be treated differently from names with tussenvoegsels.

I have a fork of an older version which kinda does the job. Keep in mind, it's been a while.

Changes: patvdleer@80d4e55#diff-03c2f987f4d661012a7843bb797f32bcbbc5126415024e3dfc7255766a95d3fe

Files:
https://github.com/patvdleer/python-nameparser/blob/80d4e5550b859a3b258e30db8f52d347ad2484c5/nameparser/parser.py#L414
https://github.com/patvdleer/python-nameparser/blob/80d4e5550b859a3b258e30db8f52d347ad2484c5/nameparser/parser.py#L733
