Change "translation_of" to a work ID instead of a string title #412

Closed
LeadSongDog opened this issue Feb 10, 2017 · 40 comments
Labels
Affects: Data, Affects: Librarians, Affects: UI, metadata, Module: Merging, Needs: Triage, Type: Feature Request

Comments

@LeadSongDog

This may be a tough one, but it should be worthwhile in the long run. There are a tremendous number of redundant work records. Many of these are translations that have been recorded incorrectly as independent works, sometimes even under variant author spellings that need to be merged. When an edition record indicates "translation of", it should be subordinated to the original work from which it is translated. Consider all the variant "works" for the Iliad found under https://openlibrary.org/authors/OL6848355A/Homer

Here's my WAG as to what needs to happen:
The first step is to name that source work. That has been done, albeit inconsistently because it hides in the "librarian" functionality.
The second is to find if the corresponding source work identifier exists. Should it exist, that source-work record should be linked. If no source-work record exists, then the existing translation-work record should be retitled to match the "translation of" string. Ultimately, all translations of one source work should be subordinated to that one source work record. Then the translation work records should either be deleted or somehow converted to capture the translator name. Translation is after all a creative effort the product of which is independently subject to copyright.
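
A minimal sketch of the lookup step described above, assuming a hypothetical in-memory index from work titles to work IDs. The function names, the index shape, and the sample ID are invented for illustration and are not Open Library's actual API.

```python
from typing import Dict, Optional


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(title.lower().split())


def resolve_translation_of(translation_of: str, title_index: Dict[str, str]) -> Optional[str]:
    """Return the work ID whose title matches the free-text string, or None.

    `title_index` is a hypothetical mapping of normalized work title -> work ID.
    """
    return title_index.get(normalize_title(translation_of))


# Hypothetical sample data; the ID is made up for illustration.
title_index = {normalize_title("Iliad"): "OL1W"}

linked = resolve_translation_of("  iliad ", title_index)   # a source work exists
unlinked = resolve_translation_of("Odyssey", title_index)  # no source work yet
```

When the lookup returns None, the second branch of the proposal applies: the existing translation-work record would be retitled rather than linked.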

@LeadSongDog LeadSongDog changed the title Change translation_of to a work ID instead of a string title Change "translation_of" to a work ID instead of a string title Feb 10, 2017
@LeadSongDog
Author

LeadSongDog commented Feb 10, 2017

#382 and #367 are related.

@LeadSongDog LeadSongDog reopened this Feb 10, 2017
@LeadSongDog
Author

LeadSongDog commented Apr 12, 2017

Related discussion thread is at http://www.mail-archive.com/ol-discuss@archive.org/msg00965.html

@mekarpeles Care to comment?

@mekarpeles
Member

@LeadSongDog thank you for the tag! I'll try to respond to this as soon as I'm done wrapping up an ImportBot improvement (which hopefully will bring ~100k readable items onto Open Library). Likely this weekend. cc'ing @hornc as well in case he wants to weigh in.

@mekarpeles
Member

Thanks again @LeadSongDog. There's a lot going on in this thread.

In response to the mailing list:

  1. @bfalling and I are pushing towards unifying Work and Edition pages somewhat. The URL structure will appear the same, but from the user's experience, they will always be on a Work page with the ability to change which Edition (and its data) is selected/active.
  2. I think keeping the Work titles in English is a practical idea. Any thoughts against? @hornc, @tfmorris, @dvanduzer?
  3. I'm partial to keeping the Edition titles in their native language / encoding

@dvanduzer
Contributor

dvanduzer commented Apr 18, 2017

(edit: What is the hidden "librarian" functionality that sometimes has the link to the original translation?)

I don't think there can be a one-size-fits-all rule for translations. The closest thing to a "right" answer is probably whichever version was published first. Literature from antiquity is an important exception to this rule. Most alternate translations of Homer or Plato or The Bible include enough original scholarship that you wouldn't want to glob all of them into one work.

Strictly speaking, it isn't necessary to store any title information on a Work, because a Work is already a kludge. There will always be a canonical edition, whether we are talking about Cien Años de Soledad by García Márquez, or whether we are talking about the fourth (and most recent) edition of Intro to Calculus. Having a Work object is a convenience for developing the UI. (There's no point in trying to capture the metaphysics of whether the hardback or the paperback, or the galley copy or the author's typewriter, or which fragments of papyrus contain the essence of the work.) Another way to put it is that a Work doesn't do anything other than get us from "a page" to "one page" -- reifying a book or ebook into a Book.

This gets more complicated because it's not always clear when relations like "translation of" and "revision of" are commutative & transitive. Things like "audiobook of" and "transcription of" have still more complicated one-way relationships.

So for 2), it is certainly practical to keep the convenience of storing a canonical title under Work for now. A wishlist for the future might be to always display the title based on the user's locale, and always include the original language title, when that is different than the user's locale.

@LeadSongDog
Author

Certainly derivative works often are substantially better than the original, but "author" is, I think, distinctly for the original. Translators, editors, and other contributors should really not be conflated. I suppose that identifying the first-published edition is a simple rule that could be readily automated. I note that for non-English entries PubMed stores both the English title (and other metadata) and the original-language vernacular title. It's a robust approach though subject to variations in anglicisation.

@hornc
Collaborator

hornc commented Apr 26, 2017

Trying to condense down my understanding of discussions on how translations should be handled:

  1. Translated editions should be editions of the original work
  2. Any translated work record should be merged with the original work (i.e. made into a redirect and have all editions moved to the original work)

Therefore the translation_of field is not needed? Just a correctly set translated_from (Librarian Mode), and ideally the Translator role.

Here's an example I set up showing Translator under Contributors, and that the edition is written in English, translated from Ancient Greek. https://openlibrary.org/books/OL18836004M/The_Iliad

As for work titles being in English, I think that is an ok default, and I am happy with, say, classical Greek and Latin works using the common English titles. In fact I'd argue that English titles would be more generally useful than any attempt at an original Greek title in Greek alphabet. However, I don't think anyone should necessarily change an original French, Russian, Thai, or Arabic work title to English as a matter of policy. I think the rule should be that a work title is whatever is a commonly internationally recognized title for that work. If there are multiple candidates, maybe we need an "Alternative Names" section as for authors?

I propose we remove / ignore the translation_of field, and endeavor to populate the is_translation fields correctly.

If anyone wants to populate translation_of with work OLIDs temporarily to assist with performing merges, that might actually work as is using the free text field, and be helpful. Though it may just be easier to do the merges, once we make the work merge interface.
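
The merge described in point 2 of this comment (redirect the translated work and move its editions) could be sketched roughly as follows. The record layout, the `type`/`work`/`location` keys, and the IDs are assumptions made for illustration, not Open Library's real schema or merge code.

```python
from typing import Dict


def merge_into_original(records: Dict[str, dict], translated_work: str, original_work: str) -> int:
    """Move every edition of `translated_work` to `original_work`, then redirect.

    Returns the number of editions that were moved.
    """
    moved = 0
    for record in records.values():
        if record.get("type") == "edition" and record.get("work") == translated_work:
            record["work"] = original_work
            moved += 1
    # Replace the translated work record with a redirect to the original.
    records[translated_work] = {"type": "redirect", "location": original_work}
    return moved


# Hypothetical records; the IDs are invented for illustration.
records = {
    "OL1W": {"type": "work", "title": "Iliad"},
    "OL2W": {"type": "work", "title": "The Iliad of Homer"},
    "OL1M": {"type": "edition", "work": "OL2W", "translated_from": "grc"},
    "OL2M": {"type": "edition", "work": "OL1W"},
}
moved = merge_into_original(records, "OL2W", "OL1W")
```

After the call, the edition keeps its translated_from value, so the language information survives the merge even though the separate work record does not.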

@tfmorris
Contributor

I disagree that a Work is just a UI kludge. It has a bunch of useful properties. I'd hate to see OpenLibrary start down the slippery slope of thinking that works are expendable.

I don't think English-only is appropriate for Work titles. They should be localized. Works have different common names in different languages and OpenLibrary should accommodate that. Each localized title should be tagged with its language and then the rendering code can choose appropriate fallbacks on a per-user basis e.g. English, then French, then the rest of the Romance languages, then whatever you've got.
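
As a rough illustration of the per-user fallback idea, here is a minimal sketch assuming each work carries a mapping of language code to localized title. The function name, the mapping shape, and the language codes are assumptions for the example only.

```python
from typing import Dict, List, Optional


def pick_title(titles: Dict[str, str], preferred: List[str]) -> Optional[str]:
    """Return the first title available in the user's preference order.

    Falls back to any remaining title (lowest language code first, so the
    result is deterministic), or None when the work has no titles at all.
    """
    for lang in preferred:
        if lang in titles:
            return titles[lang]
    for lang in sorted(titles):
        return titles[lang]
    return None


# Language-tagged titles for one work; the codes are illustrative.
titles = {
    "fre": "La Communauté de l'Anneau",
    "eng": "The Fellowship of the Ring",
}

for_french_reader = pick_title(titles, ["spa", "fre"])  # Spanish missing, French found
for_other_reader = pick_title(titles, ["ger"])          # nothing preferred is available
```

The preference list is what makes the chain "English, then French, then the rest" configurable per user rather than hard-coded.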

I don't understand the distinction between translation_of and translated_from. They sound like synonyms to me (or is one a Work property and the other an Edition property?).

Editions in all languages connected to the same Work is the correct way to go.

Freebase actually used a separate object to capture translation information. In addition to the Work-Edition link, there was also a Work-Translation-Edition path where the Translation object contained source language, target language, translator, and date of translation. You can infer some of this from Work/Edition pairs, but people often talk of a translation as a specific thing and it's often published in multiple editions. I'm not saying that this is necessarily the right way to go, but throwing it out there as another modeling alternative.
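
To make that modeling alternative concrete, a Translation object sitting between a Work and its Editions might look something like the sketch below. The class, its field names, and the IDs are hypothetical; this is not Freebase's or Open Library's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Translation:
    """Hypothetical intermediate object between a Work and its translated Editions."""

    work_id: str                 # the original work being translated
    source_language: str
    target_language: str
    translator: str
    year: Optional[int] = None   # date of translation, when known
    edition_ids: List[str] = field(default_factory=list)


# One translation published in several editions; IDs are invented.
pope_iliad = Translation(
    work_id="OL1W",
    source_language="grc",
    target_language="eng",
    translator="Alexander Pope",
)
pope_iliad.edition_ids.extend(["OL1M", "OL2M"])
```

The point of the extra object is that the translator and language pair are stored once, however many editions of that translation exist.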

From a pragmatic data wrangling point of view, I'd have thought normalizing titles to NFC/NFKC first would make subsequent comparisons and processing much easier. http://unicode.org/reports/tr15/#Norm_Forms
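
A short sketch of what that normalization buys, using Python's standard `unicodedata` module. The helper name `normalize_title` is just an illustrative wrapper.

```python
import unicodedata


def normalize_title(title: str, form: str = "NFC") -> str:
    """Return the title in the given Unicode normalization form."""
    return unicodedata.normalize(form, title)


composed = "Cien A\u00f1os de Soledad"     # precomposed n-with-tilde
decomposed = "Cien An\u0303os de Soledad"  # "n" followed by a combining tilde

# The two strings render identically but differ code point by code point.
raw_equal = composed == decomposed
nfc_equal = normalize_title(composed) == normalize_title(decomposed)

# NFKC additionally folds compatibility characters, such as the "fi" ligature.
ligature_folded = normalize_title("\ufb01n", "NFKC") == "fin"
```

NFC is the gentler choice for stored titles; NFKC is more aggressive and is probably better suited to building comparison keys than to display text.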

@hornc
Collaborator

hornc commented Apr 26, 2017

Sorry, the terminology isn't very clear, but
translation_of is an original work's title
translated_from is a language code

translation_of: The Iliad
translated_from: grc (Ancient Greek (to 1453))

@hornc hornc added the Module: Merging Record merging label May 8, 2017
@LeadSongDog
Author

After bashing away at Charles Perrault for quite a while it's become apparent that the editing of "translation of" and "original language" is an entirely wasted effort. What should happen in a well-conceived system is that the editor picks one from a list of works by the same author. In this example we have hundreds of editions and dozens of translations, but few original works. Where multiple work records reflect a single work, the list should show either the one with the most editions or the first-published one (ignoring undated records).
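
The two selection rules mentioned here (most editions, or first published while ignoring undated records) can be sketched as below. The `WorkRecord` type, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkRecord:
    work_id: str
    edition_count: int
    first_publish_year: Optional[int]  # None when the record is undated


def most_editions(records: List[WorkRecord]) -> WorkRecord:
    """Pick the record with the largest number of editions."""
    return max(records, key=lambda r: r.edition_count)


def first_published(records: List[WorkRecord]) -> Optional[WorkRecord]:
    """Pick the earliest dated record, ignoring undated ones."""
    dated = [r for r in records if r.first_publish_year is not None]
    return min(dated, key=lambda r: r.first_publish_year) if dated else None


# Duplicate records for one underlying work; all values are invented.
records = [
    WorkRecord("OL1W", edition_count=40, first_publish_year=None),
    WorkRecord("OL2W", edition_count=12, first_publish_year=1697),
    WorkRecord("OL3W", edition_count=3, first_publish_year=1729),
]
by_size = most_editions(records)
by_date = first_published(records)
```

Note that the two rules can disagree, as they do in this sample, so the picker would need to commit to one of them or show both.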
As an aside, it has also exemplified the need for better treatment of collected works: editions will select different subsets or even assemble various authors' stories into collections. Some or selected may be first publications, as for the tales by Mme d'Aubrey lumped in with his Contes des fées in some editions.

@LeadSongDog
Author

LeadSongDog commented Dec 3, 2018

@LeadSongDog
Author

@cdrini That last was your work wasn't it? Care to take this bit on too?

@LeadSongDog LeadSongDog reopened this Dec 13, 2018
@seabelis seabelis added the Affects: Librarians Issues related to features that librarians particularly need. [managed] label Sep 5, 2019
@seabelis seabelis added this to Needs triage in Librarian Issues via automation Sep 5, 2019
@xayhewalo xayhewalo added this to Un-Triaged in Triage Oct 20, 2019
@LeadSongDog
Author

LeadSongDog commented Oct 24, 2019

Perhaps the solution to this is to make this behave like the "What work is this an edition of" field: allow entry of either the work ID or the work's vernacular title, then look up the work ID's associated title. It should store, index, and display both the ID and the vernacular title.
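
A minimal sketch of such an input handler, assuming work IDs have the form "OL", digits, then "W", and that a hypothetical `titles_by_id` mapping stands in for the real lookup. None of the names below come from the Open Library codebase.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed shape of a work ID: "OL", one or more digits, then "W".
WORK_ID = re.compile(r"^OL\d+W$")


def resolve_work(user_input: str, titles_by_id: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Accept either a work ID or a title and return (work_id, title).

    Returns None for an unknown ID, or when a title matches zero or several works.
    """
    text = user_input.strip()
    if WORK_ID.match(text):
        title = titles_by_id.get(text)
        return (text, title) if title is not None else None
    matches = [
        (work_id, title)
        for work_id, title in titles_by_id.items()
        if title.casefold() == text.casefold()
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical lookup table; the IDs are invented.
titles_by_id = {"OL1W": "Contes des fées", "OL2W": "Iliad"}

from_id = resolve_work("OL2W", titles_by_id)
from_title = resolve_work("contes des fées", titles_by_id)
```

Returning both the ID and the title lets the caller store, index, and display the pair, as suggested.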
--Revision: or just eliminate the field and use the work as intended--

@xayhewalo xayhewalo added Affects: Data Issues that affect book/author metadata or user/account data. [managed] Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] State: Backlogged labels Oct 28, 2019
@tfmorris
Contributor

original_language on the work would be a useful addition if we don't already have it. This could implicitly encode the information that @hornc proposed translated_from for.

@hornc hornc self-assigned this Nov 10, 2019
@cdrini
Collaborator

cdrini commented Nov 11, 2019

@LeadSongDog (or someone else) would you mind giving a summary of what the current proposal is here? I'm having trouble following what the desired change is after all the discussion.

Perhaps the solution to this is to make this behave like the "What work is this an edition of" field: allow entry of either the work ID or the work's vernacular title, then look up the work ID's associated title. It should store, index, and display both the ID and the vernacular title.

Except for the fact that it doesn't display the title, this is how the "What work is this an edition of" field functions; does that mean this issue is resolved?

@xayhewalo
Collaborator

xayhewalo commented Nov 12, 2019

@cdrini My summary of the thread (anyone else feel free to edit this comment if I missed anything).

Preamble

As I typed this I realized that this issue might need to be broken up into several issues or made into an Epic.

Original Issue

Many translations are their own works when really they should be editions of the original work they are translated from.

Currently there is a field translation_of that accepts the free-text title of a work, and a field translated_from that accepts a language code (e.g. grc for Ancient Greek (to 1453)).

@LeadSongDog proposed making translation_of accept a work's OLID to force translations to be associated with a work.

Subsequent Issues

  1. Sometimes the translation is considered the "canonical work", or several different translations are considered the "canonical work" in different locales. For example, very few patrons would want to read The Odyssey in its original Ancient Greek.

  2. Sometimes a novel will change title when published in a different country. Example: the first Harry Potter novel is titled Harry Potter and the Philosopher's Stone in the UK, but is titled Harry Potter and the Sorcerer's Stone in the US. Is this a translation?

  3. Sometimes translations themselves will have multiple editions. Eek!

Proposed Solutions

  • Merge all translated works into the original work
    • Subsequently make translation_of require a work OLID; in the edit UI, display works matching the free text and allow the user to pick the relevant one.
  • Work titles should be localized, so that searching "Communauté de l'Anneau" produces a localized version of the "The Fellowship of the Ring" work page.
  • All works should have original_language fields
  • Introduce a new Work-Translation-Object relationship similar to Freebase's implementation

@seabelis
Collaborator

Sometimes a novel will change title when published in a different country. Example: the first Harry Potter novel is titled Harry Potter and the Philosopher's Stone in the UK, but is titled Harry Potter and the Sorcerer's Stone in the US. Is this a translation?

No, not a translation. Just a marketing decision.

@LeadSongDog
Author

Once editions are correctly linked to the original work, the "translation_of" entry would seem to be redundant. Many edition records are at present linked to no work record, or to the wrong one, so in the interim the data there has value. That said, I believe my earlier suggestion was ill-considered. The comment by @seabelis makes sense: if we show it at all, it should autopopulate from the work record, but why would we need to say the same thing twice?

@cdrini
Collaborator

cdrini commented Nov 13, 2019

Should we close this as won't fix? I agree that storing the work record twice seems redundant.

@cdrini
Collaborator

cdrini commented Nov 13, 2019

(There are other issues open for creating a merge flow (#805), and for making sure all editions have a work (#2629))

@seabelis
Collaborator

If this will ultimately be solved indirectly through another issue, I think closing it is fine.

@cdrini
Collaborator

cdrini commented Nov 14, 2019

I think it'll be indirectly solved by the two issues above.

@cdrini cdrini closed this as completed Nov 14, 2019
Triage automation moved this from Needs: Assessment to Closed Nov 14, 2019
Librarian Issues automation moved this from Needs triage to Closed Nov 14, 2019
@xayhewalo
Collaborator

xayhewalo commented Nov 14, 2019

@cdrini #805 and #2629 don't address translations having multiple editions.

For example, The Odyssey by Homer has multiple works in different languages. I don't think merging all these editions into a single work is the best way to handle this.

I think the most robust way of handling this situation is having a Translated Work object that can have its own editions but must be linked to an Original Work. I'm not sure how much work it'd take to implement this though.

Perhaps the easiest fix is to allow editions to be searched and filtered through a Works page.

@LeadSongDog
Author

@guyjeangilles If we can open the door for a new object in the schema, it should be useful for more than just translations. I would consider a more general answer such as "based_on", which could take a list of work identifiers. That way it could capture anthologies, compendia, collected works, adaptations, etc, not just translations.

@xayhewalo
Collaborator

@LeadSongDog If based_on is easy-ish to implement, it would also allow filtering associated media such as companion guides, SparkNotes, and box sets out of search results. Example: https://openlibrary.org/search?q=the+hunger+games&mode=everything
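
A rough sketch of how a based_on list of work identifiers might support that kind of filtering. The record shape, the function names, and every ID and title below are invented for illustration; based_on is only a proposal in this thread, not an existing field.

```python
from typing import Dict, List

# Hypothetical work records; based_on lists the works each one derives from.
works: Dict[str, dict] = {
    "OL1W": {"title": "Original novel", "based_on": []},
    "OL2W": {"title": "Companion guide", "based_on": ["OL1W"]},
    "OL3W": {"title": "Box set", "based_on": ["OL1W", "OL4W"]},
    "OL4W": {"title": "Sequel", "based_on": []},
}


def primary_works(works: Dict[str, dict]) -> List[str]:
    """IDs of works that are not derived from any other work."""
    return [work_id for work_id, work in works.items() if not work["based_on"]]


def derived_from(works: Dict[str, dict], source_id: str) -> List[str]:
    """IDs of works that list `source_id` in their based_on field."""
    return [work_id for work_id, work in works.items() if source_id in work["based_on"]]


search_results = primary_works(works)      # guides and box sets filtered out
related_media = derived_from(works, "OL1W")
```

Because based_on holds a list rather than a single ID, the box set can point at both novels it collects.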

@seabelis
Collaborator

If we follow these cataloging standards (which I have been trying to do), all translations would belong to the original work in ways adaptations, companions, etc. do not. https://www.isko.org/cyclo/work1.jpg

I'm not sure I'm fully understanding what is meant by editions of translations. The OL Work-Edition model doesn't really distinguish between the expression and manifestation; this is all captured together as 'edition'. This applies to 'editions' in the original language as well.

That said, I do see benefits to having some separation. This is tricky and not just about translation (as was discussed at length last week) or setting a user's language preference for the site. For example, I will frequently link to the Wikipedia article for a given work. I usually link to the article in the work's original language. If the work has been translated into twenty languages, there are potentially 20 links to Wikipedia alone. I do not add them because this would make for a very long list. I also don't add the links to individual edition pages because a Wikipedia article is generally not edition specific and I don't want to add this to multiple editions. This applies to other types of links as well, i.e. book reviews.

If there were a way to filter by language at the work level, I think this would be useful for the user. So 1) display only editions in a given language and 2) display work-level content in that language. This should be independent of any site-language preferences, as plenty of people are multi-lingual and may have their site content preference set to their native language but be literate in multiple languages (though some default behaviour would have to be established - maybe the user can identify their own preference, i.e. a setting to 'auto-filter works to my language setting' / 'show works in their native language').

If having a Translated Work object as @guyjeangilles suggests would achieve this or something like this, I'm all for it.
Related to #1808?

@xayhewalo
Collaborator

@cdrini bumping this as I don't think that either #805 or #2629 will address translations with multiple editions. Perhaps we should make a separate issue?

@hornc
Collaborator

hornc commented Nov 20, 2019

I may have lost track of some of this, since the discussion has got quite long, but I thought the final decision was to deprecate translation_of and use the existing work relationship (which is how the current system works, and there have been past efforts at scale to move translations under the correct work). This does leave translated works with multiple editions un-linked, which I thought was a great point to note, but it seems like this is a standard issue that is just accepted elsewhere too?

I think the current system can also support separating translations into their own works (A. Pope's Iliad is my example where I think that would be completely justified), but it seems we could end up with edit wars that end up merging works, then trying to split them out because it is hard to draw a clear line that will be agreed upon by everyone. "Translations all belong under the one work" is easier to work with.

We already mix different editions by different publishers under the same work, this really isn't so different. The practical problems affecting users would probably be solved if we had a decent way of filtering or locating editions by language.
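
Filtering or locating editions by language under a single work could be as simple as the sketch below, assuming each edition record carries a list of language codes. The `languages` key, the helper names, and the sample records are assumptions for the example.

```python
from typing import Dict, List


def editions_in_language(editions: List[dict], language: str) -> List[dict]:
    """Keep only the editions that list `language` among their languages."""
    return [e for e in editions if language in e.get("languages", [])]


def count_by_language(editions: List[dict]) -> Dict[str, int]:
    """Count editions per language code, e.g. to build a filter menu."""
    counts: Dict[str, int] = {}
    for edition in editions:
        for language in edition.get("languages", []):
            counts[language] = counts.get(language, 0) + 1
    return counts


# All editions of one work, translations included; the records are invented.
editions = [
    {"id": "OL1M", "languages": ["eng"]},
    {"id": "OL2M", "languages": ["fre"]},
    {"id": "OL3M", "languages": ["eng"]},
    {"id": "OL4M"},  # no language recorded
]
english_ids = [e["id"] for e in editions_in_language(editions, "eng")]
counts = count_by_language(editions)
```

Editions with no recorded language drop out of every filtered view, which suggests the filter is only as good as the language data behind it.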

@seabelis
Collaborator

Will existing data be preserved for works that have not been consolidated?

@BrittanyBunk
Contributor

#2601 relates here.

@LeadSongDog
Author

@cdrini Please either reopen this issue or respond to the discussion above since you closed it. I for one don’t grok the closure.

@seabelis
Collaborator

seabelis commented Jan 28, 2021

I have run into situations where the 'translated from' title and language differ from the original work. In some cases works are translated from the English (or some other) translation rather than the original language. I guess the question in these cases is whether we use the 'translation of' or the 'translated from' information. The latter is usually what is specified on the verso page.

I would be happy to see the field auto-populate, in some way, from the work, but with the option to over-ride. Most of the time populating from the work will be accurate.
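
The auto-populate-with-override behaviour could be sketched like this, assuming the edition stores a value only when a librarian has overridden the default. The function name and record shapes are hypothetical.

```python
from typing import Optional


def displayed_source_title(edition: dict, work: dict) -> Optional[str]:
    """Return the edition's own translation_of value when set, else the work title."""
    override = edition.get("translation_of")
    return override if override else work.get("title")


# Invented records: one edition follows the work, one was translated
# from an intermediate translation and records that explicitly.
work = {"title": "Original title"}
plain_edition = {}
relay_edition = {"translation_of": "Intermediate English title"}

default_value = displayed_source_title(plain_edition, work)
overridden_value = displayed_source_title(relay_edition, work)
```

Storing only the exceptions keeps the common case free of duplicated data while still covering translations made from an intermediate language.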

@BrittanyBunk
Contributor

BrittanyBunk commented Jan 29, 2021

@seabelis Maybe I'm misinterpreting, but I would say both. I mean, if say something starts in Russian, but is translated to English, that's a 'translation of', but if it goes from the English translation to Spanish, it's a 'translation from'.

I'm thinking to keep it straight would be 'translation from' if we're referring to editions, as we could say it got translated from Russian's writing to English and from English to Spanish. As long as we say which book it's translated from, we'll be good.

'Translation of' works if you're translating from other works. However, since a translation would count as an edition, that'll never happen, so we don't need it.

@LeadSongDog
Author

LeadSongDog commented Jan 29, 2021

@BrittanyBunk Very few translations specify the precise edition from whence they are translated. In those few cases it could be shown in the edition notes field, but we should not depend on information that can rarely be had. In any case few readers care about the chain of translation, save for a few very special works.

@BrittanyBunk
Contributor

@LeadSongDog it's not hard to look at each book and see and sometimes the author says which - I opt for accuracy - what do you prefer?

@LeadSongDog
Author

@BrittanyBunk “sometimes” is the problem: we can’t build a system that depends on those exceptional cases.

@BrittanyBunk
Contributor

BrittanyBunk commented Jan 30, 2021

@LeadSongDog yes, you can - you have different options for different circumstances. It's like sometimes I have an ISBN 10 - so what are you talking about?

@LeadSongDog
Author

LeadSongDog commented Mar 23, 2021

@BrittanyBunk Sorry, I misspoke when I said “we can’t build...”: I should have said “we would be foolish to build...”. The system should be designed to work for all cases, not just for special cases. It is wonderful to capture that information in the few cases it is available, but when the information is not available, the system should still function sensibly. Putting the information in edition notes serves this purpose.

@cdrini Bump

9 participants