Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mappings RFC #709

Open
NuckChorris opened this issue Feb 4, 2016 · 0 comments
Open

Mappings RFC #709

NuckChorris opened this issue Feb 4, 2016 · 0 comments

Comments

@NuckChorris
Copy link
Member

Okay, so I've long been thinking of how to best model mappings between Hummingbird and external website's IDs. The following is pretty much a brain-dump of what I've reasoned out so far, because I want to get it in text and in front of other people.

Background

A lot of clients need or want to map data between MyAnimeList and Hummingbird. For the most part, our database is lined up with MAL's, but this has a number of downsides, not least of which being that we really can't control our own database. What if we wanted to merge/split shows differently from MAL?

The goal of this RFC is to decouple our database from external influences, but still make it easy to remap data from their DB back onto ours (or vice versa). For most cases, an automatic direct mapping by matching title will suffice, but in some cases we need to be explicit because our titles differ, especially for sites without MAL heritage (older listing sites as well as streaming sites).

Mapping Schema

  • external_site (String, required) "myanimelist/manga" The identifier for the external site. In sites which have namespaces for their IDs, this should contain the namespace as well.
  • external_id (String, required) "3412" The external identifier or key.
  • external_units (Range, optional) 1..6 The episodes/chapters on the external site which we wish to explicitly map.
  • media (Media, required) "anime", 2348 - a polymorphic association with Media to map to
  • internal_units (Range, optional) 7..12 The episodes/chapters on Hummingbird which we wish to explicitly map to

Where external_units or internal_units is nil, it should be treated as representing the entire media's length.

We would want to validate uniqueness of site+namespace+id+units, where units don't overlap. This can be achieved in Postgres with a GIST index using the btree_gist extension. We would also want to make certain that external_units.length == internal_units.length, otherwise the mappings would get really zany.

An easy way to understand this is with the following:

(internal_type, internal_id, internal_units) <=> (external_site, external_id, external_units)

That is, the Mappings table represents a bidirectional mapping of two ranges.

Model Interface

Mapping.lookup(site: 'myanimelist/manga', id: '12345', units: 1..27)
# => [{:media => #<Manga ...>, :units => 1..6}, {:media => #<Manga ...>, :units => 7..12}, ...]`
Mapping.lookup(site: 'myanimelist/manga', id: '54321', units: 1..12)
# => [{:media => #<Manga ...>, :units => 13..25}]
Mapping.reverse_lookup(site: 'myanimelist/manga', media: #<Manga ...>, units: 13..25)
# => [{:site => 'myanimelist/manga', :id => '54321', units: 1..12}]

Basically, you pass in the foreign keys and a range (which will often be 1..something), and it returns an array of media, units tuples representing what was watched and how much of it was watched.

Examples

To understand how this would work, an example is helpful. For starters, let's imagine that we decided to merge Durarara!! x2's multiple series into one multi-season series. The mappings would look something like this:

For brevity, we omit {external_site: 'myanimelist', external_namespace: 'anime'}

[
  {external_id: '3412', external_units: Infinity..Infinity, media: #<Anime 'Durarara!! x2'>, target_units: 1..12},
  {external_id: '4412', external_units: Infinity..Infinity, media: #<Anime 'Durarara!! x2'>, target_units: 13..24},
  {external_id: '5412', external_units: Infinity..Infinity, media: #<Anime 'Durarara!! x2'>, target_units: 25..36}
]

Now let's imagine that we decided to split something. I'm gonna go with 5cm/s for this example:

[
  {external_id: '2205', external_units: 1..1, media: #<Anime '5cm/s: Part 1'>, target_units: 1..1},
  {external_id: '2205', external_units: 2..2, media: #<Anime '5cm/s: Part 2', target_units: 1..1},
  {external_id: '2205', external_units: 3..3, media: #<Anime '5cm/s: Part 3', target_units: 1..1}
]

Mapping.lookup Algorithm

Mappings.lookup would take an external range and turn it into an array of N internal ranges. This sounds difficult, but is really quite simple. First, intersect all the ranges. Then, map them onto the target range.

class Mappings
  def self.lookup(site:, id:, units:, namespace: nil)
    # Get all mappings which overlap with the external key on the listed units
    rows = where(external_site: site, external_namespace: namespace, external_id: id)
          .where('external_range && ?', [units]).order('lower(external_range) ASC').all
    # Intersect them and turn them into a more manageable hash format
    mappings = rows.map do |row|
      {media: row.media, units: row.external_range & units, mapping: row}
    end
    # Map to the target range
    mappings.map do |m|
      {media: m[:media], units: m[:units].remap(m[:mapping].external_range => m[:mapping].internal_range)}
    end
  end
end

Mapping.reverse_lookup would do the same but in the other direction. The easiest way to reason about this is that it's a direct 1:1 mapping between sets of episodes/chapters on other sites and Hummingbird.

On conversion to contiguous libraries

We can't handle watching non-contiguous ranges of episodes in libraries! That's a problem for MAL imports, isn't it? Luckily we can just do ranges.map { |r| r.length }.reduce(:+) and magically we know how many they've seen.

To convert back from a contiguous library entry to a noncontiguous set of episodes, we have to iterate through each mapping.


All of this is a longer-term thought, and in the short term I'll probably create a simpler Mappings table with only direct 1:1 mapping, just so we can remove myanimelist_id, ann_id, tvdb_series_id, and tvdb_season_id from our anime table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant