Better Data Formatting #544

Open
irfan-dahir opened this issue Feb 1, 2024 · 1 comment
Labels
discussion enhancement refactoring/speeding up schema
Milestone

Comments

@irfan-dahir
Collaborator

This is a collection of points based on the feedback received over time and my general thoughts. Better formatting of returned data can help improve developer experience.

Most of these would be breaking, non-backward-compatible changes to the response schema, so we'd have to target major parser and REST API versions.

Enumification of known values

Some properties are currently returned from MAL as-is, even though they only ever take a limited set of known values. It would be a good idea to create constants for them at the parser level. This would make validation easier to handle in both the parser and the REST API.

Some properties that come to mind from Anime/Manga type resources are:

  1. type
  2. source
  3. status
  4. rating

Note: These can be nullable.
Note 2: These could have their own listing endpoints, like Anime Genres, for users who don't want to hard-code these values and would rather keep them dynamic client-side.
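
As a rough illustration of what "constants plus validation" could look like, here is a minimal sketch. `AnimeType` and `toEnum` are hypothetical names, and the value list is only illustrative; it is not meant to be an exhaustive copy of what MAL returns.

```javascript
// Hypothetical constants; the value list is illustrative, not exhaustive.
const AnimeType = Object.freeze({
  TV: "TV",
  MOVIE: "Movie",
  OVA: "OVA",
  ONA: "ONA",
  SPECIAL: "Special",
  MUSIC: "Music",
});

// Map a raw scraped string onto a known constant.
// Returns null when the value is missing or not recognised (the props are nullable).
function toEnum(enumObject, rawValue) {
  if (rawValue == null) return null;
  const needle = String(rawValue).trim().toLowerCase();
  const match = Object.values(enumObject).find(
    (value) => value.toLowerCase() === needle
  );
  return match ?? null;
}

console.log(toEnum(AnimeType, " tv "));    // a known value, normalised
console.log(toEnum(AnimeType, "Unknown")); // not in the list
```

Mapping unrecognised values to null is just one option; logging them instead would make it easier to notice when MAL introduces a new value.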

Duration to Seconds

Currently, duration is returned only as a string. We should also return it in seconds so devs can format it however they like.
Feedback received here: https://discord.com/channels/460491088004907029/462992340718583814/1102217962577997964

Proposed Schema:

"duration": {
    "seconds": 5400,
    "string": "1h 30m"
}
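
A minimal sketch of the conversion, assuming the duration string is made of number/unit pairs. `durationToSeconds` is a hypothetical helper, and the accepted unit spellings ("h"/"hr", "m"/"min", "s"/"sec") are an assumption rather than a confirmed list of what MAL emits.

```javascript
// Seconds per unit; both short and long spellings are assumed here.
const UNIT_SECONDS = { h: 3600, hr: 3600, m: 60, min: 60, s: 1, sec: 1 };

// Sum every "<number><unit>" pair found in the text.
// Returns null when no pair is found (e.g. "Unknown").
function durationToSeconds(text) {
  if (typeof text !== "string") return null;
  const pattern = /(\d+)\s*(hr|h|min|m|sec|s)\b/gi;
  let total = 0;
  let matched = false;
  for (const [, amount, unit] of text.matchAll(pattern)) {
    total += Number(amount) * UNIT_SECONDS[unit.toLowerCase()];
    matched = true;
  }
  return matched ? total : null;
}

const raw = "1h 30m";
console.log({ seconds: durationToSeconds(raw), string: raw });
```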

Date Props to not estimate

Related issue: #486
Currently, if the date range parser receives a partial date such as "2024", it assumes the starting date is "1 January, 2024". It would be better to keep the unknown prop values (day and month in this case) as null.

Current Schema:

    "aired": {
      "from": "2024-01-01T00:00:00+00:00",
      "to": null,
      "prop": {
        "from": {
          "day": 1,
          "month": 1,
          "year": 2024
        },
        "to": {
          "day": null,
          "month": null,
          "year": null
        }
      },
      "string": "2024 to ?"
    },

Proposed Schema:

    "aired": {
      "from": "2024-01-01T00:00:00+00:00", // we "estimate" here
      "to": null,
      "prop": {
        "from": {
          "day": null, // returns null here
          "month": null, // returns null here
          "year": 2024
        },
        "to": {
          "day": null,
          "month": null,
          "year": null
        }
      },
      "string": "2024 to ?"
    },
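
To make the intended `prop` behaviour concrete, here is a hedged sketch of a partial-date parser. `parsePartialDate` is a hypothetical helper, and the three input shapes it accepts ("Apr 3, 1998", "Apr 1998", "2024") are assumptions about what the date strings look like.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Parse a possibly partial date without inventing the missing parts.
// Anything that does not match the assumed shapes yields all nulls.
function parsePartialDate(text) {
  const empty = { day: null, month: null, year: null };
  if (typeof text !== "string") return empty;
  const match = text
    .trim()
    .match(/^(?:([A-Za-z]{3})[a-z]*\.?\s+(?:(\d{1,2}),\s*)?)?(\d{4})$/);
  if (!match) return empty;
  const [, monthName, day, year] = match;
  const monthIndex = monthName ? MONTHS.indexOf(monthName.toLowerCase()) : -1;
  if (monthName && monthIndex === -1) return empty; // unrecognised month name
  return {
    day: day ? Number(day) : null,
    month: monthIndex === -1 ? null : monthIndex + 1,
    year: Number(year),
  };
}

console.log(parsePartialDate("2024"));        // only the year is known
console.log(parsePartialDate("Apr 3, 1998")); // fully specified
```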

Opening/Ending Themes

Currently, we just return an array of strings without any further parsing. But there is metadata in those strings that we could parse and return separately, such as the episode range each OP/ED was played in.
Related issue: #534

Current Schema:

      "openings": [
        "1: \"We Are! (ウィーアー!)\" by Hiroshi Kitadani (きただにひろし) (eps 1-47,1000, 1089-)"
      ]

Proposed Schema:

      "openings": [
        {
          "titles": [
            {
              "type": "English",
              "title": "We Are!"
            },
            {
              "type": "Japanese",
              "title": "ウィーアー!"
            }
          ],
          "author": {
            "name": [
              {
                "type": "English",
                "title": "Hiroshi Kitadani"
              },
              {
                "type": "Japanese",
                "title": "きただにひろし"
              }
            ]
          },
          "episodes": ["1-47", "1000", "1089-"]
        }
      ]

The episodes data is provided in 3 different types:

  1. Ranges (e.g. 1-47, "from episode 1 through 47")
  2. Specific episode mentions (e.g. 1000)
  3. Ongoing ranges (e.g. 1089-, "episode 1089 and onwards")
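
The three cases above could be told apart with something like the sketch below. `parseEpisodes` is a hypothetical helper, and the `{ from, to }` output shape is only one possible alternative to the plain strings in the proposed schema; `to: null` marks an ongoing range.

```javascript
// Split an episode list such as "eps 1-47,1000, 1089-" into structured entries.
function parseEpisodes(text) {
  return text
    .replace(/^\s*eps?\s+/i, "") // drop an optional leading "ep"/"eps"
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const match = part.match(/^(\d+)(?:\s*(-)\s*(\d+)?)?$/);
      if (!match) return { raw: part }; // keep anything unrecognised as-is
      const [, from, dash, to] = match;
      const start = Number(from);
      if (!dash) return { from: start, to: start };        // specific episode
      return { from: start, to: to ? Number(to) : null };  // closed or ongoing range
    });
}

console.log(parseEpisodes("eps 1-47,1000, 1089-"));
```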

Furthermore, once each theme is an object, we can attach additional data to it, such as preview URLs for these OP/ED themes (#534) and any attached music videos.

Returning null on placeholder URLs

Related issue: #488


cc: @pushrbx

What else is there? If anyone else has any suggestions, let's discuss it below.

@irfan-dahir irfan-dahir added enhancement refactoring/speeding up discussion schema labels Feb 1, 2024
@irfan-dahir irfan-dahir added this to the 5.0.0 milestone Feb 1, 2024
@irfan-dahir irfan-dahir pinned this issue Feb 1, 2024
@rizzzigit

I'd like to raise the idea of representing all array data as annotated arrays, like this:

{
  "props": [ "mal_id", "title", "score", "episodes", "year" ],
  "data": [
    [ 1, "Cowboy Bebop", 8.75, 26, 1998 ],
    [ 5, "Cowboy Bebop: Tengoku no Tobira", 8.38, 1, null ],
    [ 6, "Trigun", 8.22, 26, 1998 ],
    [ 7, "Witch Hunter Robin", 7.24, 26, 2002 ],
    [ 8, "Bouken Ou Beet", 6.93, 52, 2004 ]
  ]
}

The benefit I can think of is reduced bandwidth usage: since all entries are homogeneous, property names do not need to be repeated for every entry. I haven't researched how the cost of generating this output compares with the current schema, but accessing it could theoretically be faster than key-value pairs.
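
A quick way to sanity-check the size claim is to serialise the same rows both ways and compare byte counts. This is only a sketch on a tiny sample: `toAnnotated` is a hypothetical helper, and the comparison ignores HTTP compression such as gzip, which typically shrinks repeated keys and would narrow the gap.

```javascript
const rows = [
  { mal_id: 1, title: "Cowboy Bebop", score: 8.75, episodes: 26, year: 1998 },
  { mal_id: 6, title: "Trigun", score: 8.22, episodes: 26, year: 1998 },
  { mal_id: 7, title: "Witch Hunter Robin", score: 7.24, episodes: 26, year: 2002 },
];

// Convert an array of homogeneous objects into the annotated-array form.
function toAnnotated(records) {
  const props = records.length > 0 ? Object.keys(records[0]) : [];
  return {
    props,
    data: records.map((record) => props.map((name) => record[name] ?? null)),
  };
}

// UTF-8 byte length of the JSON serialisation.
const byteLength = (value) =>
  new TextEncoder().encode(JSON.stringify(value)).length;

const objectBytes = byteLength(rows);
const annotatedBytes = byteLength(toAnnotated(rows));
console.log({ objectBytes, annotatedBytes });
```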

In terms of accessing data, users can look up the index of each property once before iterating through the data. Assuming the spelling is correct and the property is defined in the documentation, the lookup should not fail or return -1.

const result = await fetch("http://api.jikan.moe/anime").then((response) => response.json())

// Look up each column's position once, before iterating.
const titleIndex = result.props.indexOf('title')
const idIndex = result.props.indexOf('mal_id')

// Rows live under `data`.
for (let i = 0; i < result.data.length; i++) {
    console.log(`MAL ID: ${result.data[i][idIndex]}`)
    console.log(`Title: ${result.data[i][titleIndex]}`)
}
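
For clients that would rather not juggle indices, a small adapter could turn the annotated form back into plain objects. This is a sketch under the schema proposed above; `indexOfProp` and `toObjects` are hypothetical helper names.

```javascript
// Fail loudly instead of silently reading column -1.
function indexOfProp(props, name) {
  const index = props.indexOf(name);
  if (index === -1) throw new Error(`Property "${name}" is not in the response`);
  return index;
}

// Rebuild one object per row from the shared `props` header.
function toObjects({ props, data }) {
  return data.map((row) =>
    Object.fromEntries(props.map((name, i) => [name, row[i]]))
  );
}

const sample = {
  props: ["mal_id", "title", "year"],
  data: [
    [1, "Cowboy Bebop", 1998],
    [5, "Cowboy Bebop: Tengoku no Tobira", null],
  ],
};

for (const anime of toObjects(sample)) {
  console.log(`MAL ID: ${anime.mal_id}, Title: ${anime.title}`);
}
```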

If this schema is paired with the ability to request only some properties, in a specified order, users should also have no trouble accessing the data by position.
For instance, if the user requests /anime?props=title,title_jp,mal_id the API should return something like this:

{
  "props": [ "title",  "title_jp", "mal_id" ],
  "data": [
    [ "Cowboy Bebop: Tengoku no Tobira", "カウボーイビバップ 天国の扉", 5 ]
  ]
}
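
On the server side, honouring the `props` parameter could look roughly like this. It is a sketch only: `project` and `ALLOWED_PROPS` are hypothetical, and rejecting unknown names (rather than ignoring them) is an assumption about the desired behaviour.

```javascript
// Hypothetical whitelist of requestable properties.
const ALLOWED_PROPS = ["mal_id", "title", "title_jp", "score", "episodes", "year"];

// Build the annotated-array response for a comma-separated `props` value,
// keeping the columns in the order the client asked for.
function project(records, propsParam, allowed = ALLOWED_PROPS) {
  const props = propsParam
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  const unknown = props.filter((name) => !allowed.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown props: ${unknown.join(", ")}`);
  }
  return {
    props,
    data: records.map((record) => props.map((name) => record[name] ?? null)),
  };
}

const records = [
  { mal_id: 5, title: "Cowboy Bebop: Tengoku no Tobira", title_jp: "カウボーイビバップ 天国の扉", score: 8.38 },
];
console.log(JSON.stringify(project(records, "title,title_jp,mal_id")));
```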

Users can assume the positions of each property based on their request, just like this:

const result = await fetch("http://api.jikan.moe/anime?props=title,title_jp,mal_id").then((response) => response.json())

for (let i = 0; i < result.data.length; i++) {
    console.log(`MAL ID: ${result.data[i][2]}`)
    console.log(`Title: ${result.data[i][0]}`)
    console.log(`Title (JP): ${result.data[i][1]}`)
}

That's all I have for now. Any feedback on this is very much appreciated.
