
Renditions List proposal #11

Open · wants to merge 2 commits into `main`
Conversation

@gkatsev (Member) commented Apr 14, 2023

Closes #1

@gkatsev (Member, Author) commented Apr 14, 2023

Just realized this is missing recommendations on how it should get populated from HLS and DASH content, like the other proposals do.

readonly attribute unsigned long height;
readonly attribute unsigned long bitrate;
readonly attribute unsigned long frameRate;
readonly attribute unsigned long codec;
@gkatsev (Member, Author):

Should this be only the video codec? How do we handle differing audio tracks for otherwise similar renditions?

@gkatsev (Member, Author):

Also, this should be a `DOMString` or something more relevant rather than a long.

@cjpillsbury (Collaborator):

The easier path forward would be to simply omit any audio codec. The more robust would be to identify whether or not the audio track (in the container sense) is in-mux with the video track (in the container sense). If yes, we list both. If no (e.g. with AUDIO groups for HLS or separate audio AdaptationSets for MPEG-DASH), we don't include the audio codec. I'm cool with either approach here.

Related: maybe this should be the MIME codec instead? That would make it consistent with https://developer.mozilla.org/en-US/docs/Web/API/MediaSource/addSourceBuffer#parameters and https://developer.mozilla.org/en-US/docs/Web/HTML/Element/source#attributes.
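For reference, the MIME-with-codecs form referenced by those links is the same string MSE already validates and consumes (the codec values here are illustrative):

```js
// The same "type with codecs" string is used across the platform:
const type = 'video/mp4; codecs="avc1.64001f, mp4a.40.2"';

MediaSource.isTypeSupported(type); // true where these codecs are supported
// and it's what addSourceBuffer expects (once the MediaSource is open):
// mediaSource.addSourceBuffer(type);
```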

@gkatsev (Member, Author):

Yeah, maybe it should be `type` and match the source element's `type` attribute and the `addSourceBuffer` param. Consistency is a plus; though it does make it a bit more annoying, since users would then need to parse the codecs out of the type.
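If `type` carried the full MIME string, consumers wanting only the codecs would need a small helper along these lines (hypothetical, just to show the extra work):

```js
// Pull the codec list out of a MIME type string, e.g.
// 'video/mp4; codecs="avc1.64001f, mp4a.40.2"' -> ['avc1.64001f', 'mp4a.40.2']
function parseCodecs(type) {
  const match = type.match(/codecs\s*=\s*"([^"]*)"/i);
  return match ? match[1].split(',').map((codec) => codec.trim()) : [];
}
```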

@cjpillsbury (Collaborator) left a comment:

Still planning on a more thorough review of this. TL;DR: broad strokes, this is 👌, and everything else is getting into the weeds for final approval (copy edits, clear definitions, implementation recommendations for HLS/DASH, etc.).

@@ -0,0 +1,172 @@
- Feature Name: renditions_list
@cjpillsbury (Collaborator):

nitpick(blocking): We've been using "human readable" names here. Consider:

Suggested change: `- Feature Name: renditions_list` → `- Feature Name: Renditions List`


@luwes commented May 23, 2023

Sorry for the late review, I'd scanned it over before, but this all looks great to me! I don't see any issues that would block starting development on this.

readonly attribute unsigned long bitrate;
readonly attribute unsigned long frameRate;
readonly attribute unsigned long codec;
attribute boolean enabled;
@luwes commented Jun 6, 2023:

This makes more sense to be in line with the track API and be named `selected`.

`enabled` is used on `AudioTrack`, where multiple audio tracks can be enabled. `selected` is used where only one track can be selected at a time, as with `VideoTrack`.

When a rendition is chosen for a video track, the old rendition is deselected, hence the name `selected`.

Scratch that, `enabled` makes sense now. If a rendition is manually selected, all others will be disabled, so by default most will be enabled in the typical use case.

But we'll want to know which one the ABR algorithm has activated if all renditions are enabled. We might need both `enabled` and `selected`, or some property that represents the active state.
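A sketch of the distinction being discussed, assuming the rendition list is exposed on the track as `videoTrack.renditions` (and with `active` as a hypothetical property name, not part of the proposal):

```js
// Typical case: every rendition is enabled, so the ABR algorithm is free
// to switch among all of them.
const renditions = Array.from(videoTrack.renditions);
renditions.every((r) => r.enabled); // true by default

// `enabled` alone can't say which rendition is playing right now;
// that would need a separate flag like the hypothetical `active`.
const activeRendition = renditions.find((r) => r.active);
```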

@heff (Member) left a comment:

Great v1 of this @gkatsev. 👍

I'm interested to debate the need for enabling/disabling video tracks in this API. See related comment.

After seeing @luwes's implementation of this, attaching renditions to tracks leads to some pretty complex stacking of event listeners: first listening for track changes, then rendition changes on top of that. I want to explore surfacing `videoEl.videoRenditions` more. I see the issue of potentially creating two paths to the same concept, but I worry that the complexity of the current version might actually keep it from getting adopted more widely.

I sketched this up after a conversation with Wes, as what would feel like an easier world. Essentially `videoRenditions` would only list the renditions of the currently selected video track, and would not be configurable.

Building a quality selector menu

```js
const renditionList = video.videoRenditions;

renditionList.addEventListener('addrendition', () => {
  // ...empty <option> list to rebuild it?
  // Reordering will probably be needed
  // when adding/removing renditions...

  // ...add "auto" <option>...
  if (renditionList.selectedIndex == -1) {
    // ...display "auto" <option> as selected...
  }

  const options = Array.prototype.map.call(renditionList, (rendition, index) => {
    // ...build <option>...
    const option = document.createElement('option');
    option.renditionHeight = rendition.height;
    option.value = index;
    if (index == renditionList.selectedIndex) {
      option.selected = true;
    }
    return option;
  });

  const sortedOptions = options.sort(
    (a, b) => a.renditionHeight - b.renditionHeight
  );

  // ...add options to DOM...
});
```

Selecting a specific rendition

```js
video.videoRenditions.selectedIndex = option.value;
```


The main addition in this proposal is a `VideoRenditionList`, which augments the [`VideoTrack`](https://html.spec.whatwg.org/multipage/media.html#videotrack) object API. This `VideoRenditionList` is a list of `VideoRendition`s, which include the minimum set of information that's required by HAS formats. In addition, it includes a way to turn each individual rendition on and off.

For example, if I want to limit a video to only play back SD content, I can loop through the `VideoRenditionList` and enable only the renditions that have a height less than 720.
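A sketch of that example, again assuming the list is reachable from the video track as `videoTrack.renditions`:

```js
// Limit playback to SD: only renditions shorter than 720px remain
// eligible for the adaptive algorithm to choose from.
for (const rendition of videoTrack.renditions) {
  rendition.enabled = rendition.height < 720;
}
```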
@heff (Member):

Is this solving a specific problem related to building the UI? It adds some complexity to the API, both on the UI side and for player developers building support for this. This API would essentially dictate how player developers expose configurability of adaptivity, and I'm not sure we should do that. For example, `<hls-video>` could alternatively do `<hls-video max-resolution-height="720">`, which I'd expect to limit the rendition set before it shows up in the rendition list. That may not be exactly how it's done, but it makes me want to keep this API more "read only" when it comes to what renditions are available.

Reading further into your notes below, it sounds like HLS.js wouldn't support this out of the box?

> It is only possible to set a level cap
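(For context, HLS.js's closest knob is capping automatic level selection rather than toggling individual levels; roughly:)

```js
import Hls from 'hls.js';

const hls = new Hls();
hls.loadSource('https://example.com/stream.m3u8'); // URL is illustrative

// Cap the highest level ABR may auto-select; there's no API to disable
// arbitrary individual levels.
hls.autoLevelCapping = 2;
```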

@heff (Member) commented Jul 13, 2023

I've had a few conversations IRL on this, including philosophical ones about this project's purpose in general. Starting with the tl;dr:

- Let's raise the prop level to `videoEl.videoRenditions`, instead of building on `videoTracks`, and have that list represent only the current `videoTrack`'s available renditions.
- Let's not introduce any changes to `videoTracks`, at least not yet.
- Let's not introduce an API for enabling/disabling tracks as options in the adaptive algorithm, at least not yet.
- Let's not introduce an API for the "active" rendition (currently playing vs. selected by user), at least not yet.
- Let's not introduce an API for externally adding/removing tracks. It should be a read-only list, at least to start.
- Everything else stays the same (`selectedIndex`, `onaddrendition`, etc.) - @luwes tell me if I missed anything.

The result would then be a simple `videoEl.videoRenditions` list, where `selectedIndex` allows you to control which rendition is selected (or -1 for auto), and events let you know when the renditions have changed so you can update the select menu appropriately.
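A sketch of that simplified surface (the `rebuildQualityMenu` helper is a placeholder):

```js
const renditions = videoEl.videoRenditions;

// Rebuild the quality menu whenever the available renditions change.
renditions.addEventListener('addrendition', rebuildQualityMenu);

// The viewer picks a specific rendition from the menu...
renditions.selectedIndex = 2;

// ...or hands control back to the adaptive algorithm.
renditions.selectedIndex = -1;
```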


On to the broader conversations...

There's one related topic about the sometimes competing goals of "solve APIs only for UI purposes (intentionally avoiding the deeper complexities of video)" and "create APIs that the W3C might some day adopt", because sometimes the video element APIs support more than what the UI might require.

In this case, building on `videoTracks` is intuitive because of the natural data structure of video tracks and renditions; however, quality selection UIs don't need that level of complexity. The UI doesn't need to know the available renditions of a non-active video track, or even what track they belong to. It just needs to know what renditions the viewer should be able to choose from, and what's currently selected. If it made no difference to the design, great! But the stacking of events/listeners on `videoTracks` and `videoRenditions` makes the existing shape complicated to build on and implement, as the sketch below illustrates.
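With renditions attached to tracks, a UI ends up listening at two levels (the `renditions` property and the rendition event name are assumptions for this sketch):

```js
// First level: react to the selected video track changing.
video.videoTracks.addEventListener('change', () => {
  const track = Array.from(video.videoTracks).find((t) => t.selected);

  // Second level: re-attach rendition listeners on the newly selected track.
  track.renditions.addEventListener('addrendition', rebuildQualityMenu);
});
```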

After this conversation I think the W3C adoption of these APIs still needs to be a consideration, but only a secondary concern. Otherwise it can distract from the UI problems we're specifically trying to solve for, and open the APIs back up to the video technology complexity we're trying to avoid. That ultimately means these specs take a lot longer to ship and, I think, puts them at risk of not actually being adopted by other video player projects.

The other related topic was about where in the stack this API lives. Is the list a direct representation of the renditions available in the manifest, or does the list only represent what renditions are available after configuration decisions have been applied? I.e., is this API seeing what HLS.js sees and in a place to tell HLS.js what renditions it should choose from, or is HLS.js telling this API what to list after it's imposed a resolution cap (for example)? This is related to the feature of enabling/disabling renditions within the list, changing whether a rendition can be chosen.

This is another situation where the UI doesn't need the feature of enabling/disabling tracks, so it's going deeper than the first goal and only there for the second goal. The main issue I see with this feature is that it imposes a configuration interface on players that they may not be able to support easily. For that reason alone I think we shouldn't include the feature in the first version of this spec, at least. But I also think configuration of which renditions are available should just be up to players to provide.

Feel free to push back on any of that. But the general conclusions I'm coming to are:

- We really need to focus on UI problems specifically, and sometimes that will mean abandoning our goal of W3C adoption. If we don't, I think shipping these specs will [continue to] take forever, and there will be real risk that no other players actually adopt them because of complexity.
- We could do a better job of documenting the UI problems we're specifically solving for in these specs, to constrain the design and keep conversations focused.

@mihar-22 commented Jul 14, 2023

EDIT: Apologies, I know this isn't the place for this conversation. Happy to delete and move where appropriate.

> Steve: We really need to focus on UI problems specifically, and sometimes that will mean abandoning our goal of W3C adoption. If we don't, I think shipping these specs will [continue to] take forever, and there will be real risk that no other players actually adopt them because of complexity.

I'm completely with you on this one Steve. W3C adoption is terribly slow (with good reason), but it gets in the way of innovation and exploration. I also think it's hard to standardize APIs we haven't explored practically; there are things we just won't know without real-world experimentation and user feedback.

A simple process in my mind would be to come up with a list of media player UI/UX requirements which we can have conversations around, design APIs solely around that with a backing spec covering the intricate/important video details, and then ship solutions to these problems in some package that we can all install and start using (e.g., npm i media-extensions).

```js
import { VideoRenditionList, ... } from 'media-extensions'
```

Note: ideally the API should speak for itself, and I think this will dramatically help with adoption. Preferably, any API shipped is based on contracts and doesn't require the `HTMLMediaElement` itself.

You're also on point about the goal difference. At the UI layer we just want what gets the job done and leads to the best UX; we don't care about intricate details at that level. It's extremely likely we won't expose the same primitives to our users. Browser APIs are generally shaped in the form of authoring, and as library authors we adapt them for consuming. There's always a gap. If we can just figure out what library authors need to build great UI/UX first and ship the code to start experimenting, we can then work backwards to primitives that might make sense internally. I think W3C should be the last step in the proposal process/stages.

> Steve: Otherwise it can distract from the UI problems we're specifically trying to solve for, and open the APIs back up to video technology complexity we're trying to avoid. Ultimately meaning these specs take a lot longer to ship, and [I think] puts them at risk of not actually being adopted by other video player projects.

Agreed. My personal feedback is that technical video conversations/specs are amazing and clearly required, as they form the basis for why certain higher-level API decisions were made, but ultimately the main deliverable should be simplified and answer: "What player UI/UX challenge is this solving, and how?" Preferably with code, docs, and examples. The priority is getting something we can start experimenting with out the door fast. I'm personally okay with mistakes; semver and proposal stages can iron those out.

> Steve: We could do a better job of documenting the UI problems we're specifically solving for in these specs, to constrain the design and keep conversations focused.

Agreed. Top-level conversations should be centered around UI/UX. Easier for people to chime in and easier/faster adoption.

@heff (Member) commented Jul 18, 2023

Thanks for weighing in @mihar-22! And good reframing of the situation. I like the idea of W3C adoption being the last step, with real world implementation coming first, rather than projecting ahead.

We have a PR in media chrome exploring an implementation of this. Want to try it out in Vidstack and see if it works in that context?
