Author edge cases #12

Open
3 of 5 tasks
djdembeck opened this issue Oct 4, 2021 · 7 comments

Comments

seanap commented Oct 6, 2021

I have a few authors that have come into Plex with honorific prefix titles. The most recent cases I came across that did not match:

  • Professor and Prof.

    • ex. "Professor Bryan Sykes" in Plex, but "Bryan Sykes" on the main Audible profile.
    • ex. "Professor Deborah Tannen" in Plex, but "Deborah Tannen" on the main Audible profile.
  • Author listed on the book under a different name than the main profile

    • "Robert A. Heinlein" listed on the book, "Robert Heinlein" on the main profile
  • Multiple authors separated by a comma do not match, but if I delete the comma and the second author, it matches the first author. I don't know how to handle these other than blindly relying on Audible to put the most prominent author first and always taking the first author (I've seen some cases where this isn't desirable).

  • Non-person authors are sorted as [Last, First]

    • ex. A book by "New Science" is matched but sorted as "Science, New"
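The prefix, multiple-author and sorting cases above could be handled by a normalization step that runs before searching. This is a minimal Python sketch under stated assumptions, not the agent's actual code: `HONORIFICS`, `NON_PERSON_AUTHORS`, `normalize_author` and `sort_name` are hypothetical names, the honorific list only contains titles mentioned in this thread, the non-person list is hardcoded, and it assumes the first comma-separated author is the primary one (which, as noted above, is not always right).

```python
import re

# Hypothetical hardcoded list; only the titles mentioned in this thread.
HONORIFICS = ("professor", "prof", "dr")

# One leading honorific, an optional period, then whitespace.
_HONORIFIC_RE = re.compile(
    r"^(?:" + "|".join(HONORIFICS) + r")\.?\s+", re.IGNORECASE
)

# Hypothetical hardcoded list of non-person authors whose names must not be flipped.
NON_PERSON_AUTHORS = {"New Science"}


def normalize_author(raw: str) -> str:
    """Return a search-friendly form of a tagged author string."""
    # Assumption: the first comma-separated author is the primary one.
    first = raw.split(",")[0].strip()
    # Drop a single leading honorific such as "Professor" or "Dr.".
    return _HONORIFIC_RE.sub("", first, count=1).strip()


def sort_name(name: str) -> str:
    """Build a "Last, First" sort name, leaving non-person authors untouched."""
    parts = name.split()
    if name in NON_PERSON_AUTHORS or len(parts) < 2:
        return name
    *given, last = parts
    return f"{last}, {' '.join(given)}"
```

Because the honorific must be followed by a period or whitespace, a name like "Drew Smith" is left intact even though it starts with "Dr".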

djdembeck transferred this issue from laxamentumtech/audnexus Oct 6, 2021
djdembeck (owner) commented

  • The honorific-title issue is an interesting one. For titles, we can use a regex for 'Dr.', 'Prof.' and so on, but full-word titles may have to be hardcoded. For my own future reference, this regex catches prefixed titles:

"^.*?(?=\b\w+ )|,[\s\w]*$"

  • I tried several FlexSearch configurations to get it to return Robert Heinlein as a match, but none did, which is very odd because I've seen other spelling variants still match.

  • Multiple authors are already handled: e5d70dd

  • There's really no good way to handle non-person names. Those could perhaps be hardcoded.
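To see what the quoted regex does, here is a small Python sketch (the language and the helper name `strip_titles` are assumptions). The pattern has two alternatives: the left one, anchored at the start, lazily consumes everything before the first word that is followed by a space, which removes a dotted prefix such as "Dr. "; the right one removes a comma followed only by spaces and word characters up to the end, such as ", PhD" or a second author. The sketch applies each alternative once with `count=1`, because an unbounded `re.sub` over the whole alternation can, in Python 3.7+, find a further non-empty prefix match after the initial empty one and strip the first word of a name like "Ursula K. Le Guin".

```python
import re

# The two alternatives of the quoted pattern, compiled separately.
PREFIX_RE = re.compile(r"^.*?(?=\b\w+ )")  # text before the first "word + space"
SUFFIX_RE = re.compile(r",[\s\w]*$")       # a trailing ", ..." segment


def strip_titles(name: str) -> str:
    """Remove a dotted prefix title and a trailing comma segment, once each."""
    name = PREFIX_RE.sub("", name, count=1)
    name = SUFFIX_RE.sub("", name, count=1)
    return name.strip()
```

With this, "Dr. Bryan Sykes" and "Deborah Tannen, PhD" both reduce to the bare name, while "Professor Bryan Sykes" comes through unchanged, which is consistent with full-word titles needing a hardcoded list.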

seanap commented Oct 13, 2021

Author listed on the book under a different name than the main profile:

  • I have an author tagged as "K.A. Finn" who is being incorrectly matched to "Katie Finn", when it should be "KA Finn".

djdembeck (owner) commented

@seanap I resolved that case in my latest commit, as well as Prof./Professor/Dr.:
5869c7e

I'm curious: have you ever seen a profile that uses a middle name, such as Robert A Heinlein from before? I wonder whether it would be better to always remove a single middle initial and let the search engine's score absorb the one-letter difference. I'm thinking the less specific the search, the better?

seanap commented Oct 13, 2021

Sweet!

I have seen cases where two different people share the same first and last name, so they use a middle initial to differentiate. I think removing the middle initial risks a false match, which wouldn't be immediately obvious.

djdembeck (owner) commented

I'm wondering whether it would be better to always remove the middle initial to RUN the search, and then let the full name decide things during scoring. The reason is that Audible seems to use all of these formats:

  • First B. Last
  • A B Last
  • A.B. Last
  • A. B. Last
  • AB Last

And it looks like search will always perform better when the letters are separated.
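One way to act on that observation is to rewrite all of those forms into space-separated initials before searching. This is a minimal Python sketch; `separate_initials` is a hypothetical helper, and treating any all-caps token of two or three letters as run-together initials is a heuristic assumption that will misfire on genuinely short all-caps names.

```python
def separate_initials(name: str) -> str:
    """Rewrite initials so each letter stands alone: "A.B. Last" -> "A B Last"."""
    # Periods become spaces, so "A.B." and "A. B." both split into single letters.
    tokens = name.replace(".", " ").split()
    out = []
    for token in tokens:
        # Heuristic: a 2-3 letter all-caps token such as "AB" is run-together initials.
        if 2 <= len(token) <= 3 and token.isalpha() and token.isupper():
            out.extend(token)  # "AB" -> "A", "B"
        else:
            out.append(token)
    return " ".join(out)
```

All five formats listed above then collapse to one shape, e.g. "First B Last" or "A B Last".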

seanap commented Oct 14, 2021

So, for example, with "A. B. Last" the agent would search for "A Last" to get a list of results, then compare and score those results against "A. B. Last"? That sounds like the best way to do it.
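The two-step approach described here (search with the middle initials dropped, then score the results against the full tagged name) could look roughly like this Python sketch. `search_term`, `score` and `best_match` are hypothetical names; the search call itself is left out, so `candidates` stands in for whatever names the search returns; and `difflib.SequenceMatcher` is an assumed stand-in for the agent's real scoring, not what it actually uses.

```python
from difflib import SequenceMatcher


def _tokens(name: str) -> list:
    """Split a name into words, treating periods as separators."""
    return name.replace(".", " ").split()


def search_term(name: str) -> str:
    """Drop single-letter middle initials: "A. B. Last" -> "A Last"."""
    tokens = _tokens(name)
    if len(tokens) <= 2:
        return " ".join(tokens)
    first, *middle, last = tokens
    kept = [t for t in middle if len(t) > 1]  # keep real middle words like "Le"
    return " ".join([first, *kept, last])


def score(tagged: str, candidate: str) -> float:
    """Similarity in [0, 1] between the full tagged name and a search result."""
    a = " ".join(_tokens(tagged)).lower()
    b = " ".join(_tokens(candidate)).lower()
    return SequenceMatcher(None, a, b).ratio()


def best_match(tagged: str, candidates: list) -> str:
    """Pick the candidate closest to the full tagged name, initials included."""
    return max(candidates, key=lambda c: score(tagged, c))
```

Because the middle initial only matters during scoring, two authors who share a first and last name but differ in middle initial should still rank correctly, which speaks to the false-match concern about removing the initial.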
