As explained in #4902, the way streams are returned by the Plugin API and the way streams are offered to the user on the CLI for selection is far from ideal. This API and UI design has reached its limits, with an urgent need of change, so let's finally fix these flaws and re-implement everything in a clean way.
The following design draft will change major parts of Streamlink's APIs, as well as the stream selection/filtering and listing on the CLI.
⚠️ Early work-in-progress ideas, nothing concrete yet ⚠️
Going to lock this thread for now, to avoid comments which will be outdated while these design ideas are shaping up. If you have feedback to share, then please comment on the Gitter/Matrix channel. Thanks. 👍
Issues
Let's talk about the problems in detail first.
Stream selection and filtering
It's currently impossible to select or filter streams using more than one (or two) metadata attributes.
For example, selecting only streams above or below a certain frame-rate is impossible due to how the stream-weighting is implemented. It currently takes the video height and sums it up with the frame-rate number, making comparisons between different video resolutions impossible.
Selecting streams using a certain video- or audio-codec is also impossible, because this metadata is completely ignored and can't be made use of in the current implementation.
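To illustrate the frame-rate problem, here is a simplified sketch of name-based weighting that sums the video height and the frame rate. It is not Streamlink's actual implementation: `naive_weight`, its regex and the "750p30" stream name are made up for this example.

```python
import re

# Hypothetical, simplified name-based weighting: the height and the
# frame rate are summed into a single number, as described above.
_NAME_RE = re.compile(r"^(\d+)p(\d+)?$")


def naive_weight(name: str) -> int:
    match = _NAME_RE.match(name)
    if not match:
        # Names that don't follow the expected format carry no weight.
        return 0
    height = int(match.group(1))
    fps = int(match.group(2) or 0)
    return height + fps


# A 720p stream at 60 fps and a (made-up) 750p stream at 30 fps end up
# with the same weight, so the frame rate can't be recovered afterwards.
print(naive_weight("720p60"), naive_weight("750p30"))
```

Once both attributes are collapsed into one number, a filter like "at least 60 fps" has nothing left to compare against.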
Stream return values
The main issue is how streams are returned by Plugin.streams() and what plugins return in their Plugin._get_streams() method (internally called by the public method). Most of the time, this data is just the forwarded result of HLSStream.parse_variant_playlist() or DASHStream.parse_manifest() though, so the issue doesn't just affect the Plugin class implementation.
Instead of returning all available streams and all possible sub-stream combinations for potential stream muxing, a dict[str, Stream] | list[tuple[str, Stream]] is returned (or Plugin._get_streams() acts as a Generator[tuple[str, Stream], None, None] which is then collected by Plugin.streams()), with str being a pre-determined stream name which references a specific Stream object.
Stream names
Those specific stream "names" are calculated beforehand, without any additional metadata that could describe the stream afterwards. Most of the time, streams are named after the vertical video resolution (sometimes with an optional audio stream bitrate), or just the overall stream bitrate.
This set of stream names is then processed by a stream-weighting algorithm which expects a specific name format and which only uses the "name data" for weighting, with an optional override logic in the plugin implementation.
After weighting the streams, stream name aliases like best, worst, best-unfiltered and worst-unfiltered are added as duplicates to the returned streams dictionary.
Lack of proper metadata
Since the stream names and their format are limited to only one or two stream metadata properties (video resolution or stream bitrate), this limits the way streams can be weighted.
For example, video frame-rates have never been weighted correctly, and filtering streams by video codec is still impossible, which is the main motivation of these changes.
Other metadata like the audio language for example is handled completely differently using special session/CLI options which the Stream implementations read before building the streams dictionary. The same issue applies to the stream type/protocol selection.
In addition to that, known metadata should always be made available to the user. Currently, only the stream name that's used for the streams dictionary key is shown to the user.
Stream muxing
Potential stream combinations for muxing are already pre-determined by the plugin or respective Stream implementation. This makes selecting certain combinations impossible.
While it kind of makes sense muxing only the video and audio streams on the same "quality levels" (best video and best audio, or worst video and worst audio), a plugin or Stream implementation should not have those decisions built in, and it should be up to the user to decide.
Stream duplicates
Plugins and Stream implementations also decide beforehand which streams are considered duplicates. Duplicates are then added as new items on the streams dictionary with the added _altN name suffix, with N being either "" or "2". If there are more streams which are considered duplicates, then those are simply discarded.
This is bad for multiple reasons:
It's not always clear which stream is a duplicate and which one is not, because not all metadata is being taken into consideration. Duplicates are only detected because a specific metadata attribute like the stream's NAME attribute for example collided with another one. This doesn't necessarily mean that a stream is indeed a duplicate.
The naming scheme doesn't communicate any useful information about the duplicate stream. The _altN suffix is meaningless.
Discarding additional streams limits the user's choice. For example, legitimate duplicates are often hosted on multiple different CDNs.
Just as a note, duplicates are bitstream duplicates ("quality duplicates"), not "content duplicates". Plugins should never provide different content streams for the same input URL (all stream results are therefore "content duplicates").
Solution
Stream metadata
Implement the StreamMetadataVideo | StreamMetadataAudio dataclasses that inherit from a StreamMetadata base class, with attributes for various stream metadata of each kind. Subtitle metadata is left out on purpose for now, but can be added later.
Optionally let StreamMetadataVideoAudio inherit from both the video and audio dataclasses, as a shorthand dataclass for streams with one video and one audio substream each (which don't require muxing).
Plugins should also be able to set custom stream metadata. Maybe another dataclass could be used for that, e.g. StreamMetadataCustom. CDN data could be annotated this way (with "human-readable" CDN names instead of DNS names derived from the stream URL).
Capture all available stream metadata and store this data on a new Stream.metadata attribute using these new dataclass objects.
Let Stream.metadata be of the type list[StreamMetadata], to support single streams which contain multiple video/audio substreams (pre-muxing), e.g. one video stream, two audio streams, as well as custom metadata in addition to regular stream metadata.
Which metadata we're going to add to the respective dataclasses depends on the metadata that is available in the supported streaming protocols. This still needs to be decided, but at least a name attribute should be required, so streams without known metadata can always be distinguished, e.g. simple HTTPStream objects or direct HLS media playlists with custom names. This name attribute could be derived from the video resolution by default if unset.
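A minimal sketch of how these dataclasses could look. Only the class names and the `name` attribute come from this draft. All other attributes (`width`, `height`, `fps`, `codec`, `language`, `bitrate`, `data`) are assumptions, and the `Stream` class here is just a stand-in for Streamlink's real `Stream` class.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StreamMetadata:
    # The only attribute the draft asks for; the rest is undecided.
    name: str | None = None


@dataclass
class StreamMetadataVideo(StreamMetadata):
    # Assumed attributes, for illustration only.
    width: int | None = None
    height: int | None = None
    fps: float | None = None
    codec: str | None = None

    def __post_init__(self) -> None:
        # Derive the name from the video resolution if it is unset.
        if self.name is None and self.height is not None:
            self.name = f"{self.height}p"


@dataclass
class StreamMetadataAudio(StreamMetadata):
    # Assumed attributes, for illustration only.
    language: str | None = None
    codec: str | None = None
    bitrate: int | None = None


@dataclass
class StreamMetadataCustom(StreamMetadata):
    # Free-form plugin data, e.g. a human-readable CDN name.
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class Stream:
    # Stand-in for Streamlink's Stream class.
    url: str
    metadata: list[StreamMetadata] = field(default_factory=list)


# One stream containing a video substream, an audio substream and
# custom metadata (the URL is a placeholder).
stream = Stream(
    url="https://example.invalid/playlist.m3u8",
    metadata=[
        StreamMetadataVideo(width=1920, height=1080, fps=60, codec="avc1"),
        StreamMetadataAudio(name="english", language="en", codec="mp4a"),
        StreamMetadataCustom(name="cdn", data={"cdn": "Example CDN"}),
    ],
)
```

Both the video and audio dataclasses define a `codec` attribute in this sketch, which is one of the details a combined StreamMetadataVideoAudio class would have to resolve.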
New stream muxing logic
Implement a StreamCollection(list[Stream]) class, to indicate which streams can be muxed into a single output. Any subset of this stream collection can be turned into a MuxedStream output.
A muxed stream must always be the result of the user's stream selection from all available streams that could potentially be muxed, and not when building the streams dictionary in a plugin implementation or in the HLSStream/DASHStream class methods.
There are still lots of questions open here. For example where the selection takes place. This obviously can't be part of the CLI module.
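As a rough sketch of the idea, a collection could expose the muxable substreams and let the caller pick a subset by name. The `select()` method, the `kind` attribute and the `Stream` stand-in are hypothetical; the draft leaves open where the selection actually takes place. Subclassing `list[Stream]` requires Python 3.9 or newer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    # Stand-in for Streamlink's Stream class.
    name: str
    kind: str  # "video" or "audio" (assumed attribute)


class StreamCollection(list[Stream]):
    """Streams that could be muxed together into a single output."""

    def select(self, *names: str) -> StreamCollection:
        # Hypothetical helper: pick a subset of substreams by name.
        wanted = set(names)
        missing = wanted - {stream.name for stream in self}
        if missing:
            raise KeyError(f"unknown substreams: {sorted(missing)}")
        return StreamCollection(s for s in self if s.name in wanted)


collection = StreamCollection([
    Stream("1080p", "video"),
    Stream("en", "audio"),
    Stream("de", "audio"),
])

# The user decides which substreams get muxed, not the plugin.
subset = collection.select("1080p", "de")
```

The resulting subset is what would then be handed to a MuxedStream.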
Update HLSStream and DASHStream
Previously, both HLSStream.parse_variant_playlist() and DASHStream.parse_manifest() class methods returned a dict[str, Stream], filtered out substreams according to the language settings, and renamed/removed stream duplicates.
These class methods now must return a list[Stream | StreamCollection] with metadata set accordingly on each Stream object. It is not up to these methods to filter substreams or remove duplicates.
Update Plugin API
Previously, Plugin._get_streams() returned a dict[str, Stream] | list[tuple[str, Stream]], or the method acted as a Generator[tuple[str, Stream], None, None].
It now must return list[Stream | StreamCollection] or must be a Generator[Stream | StreamCollection, None, None].
Backward compatibility in Plugin._get_streams() could potentially be kept, with the old stream name being used as a fallback if the stream's metadata is missing or its name attribute is unset.
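The fallback could work roughly like this. `normalize_streams()` is a hypothetical helper and `Stream` is a stand-in with a single `name` attribute instead of the full metadata list.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class Stream:
    # Stand-in for Streamlink's Stream class.
    url: str
    name: str | None = None


def normalize_streams(result) -> list[Stream]:
    """Accept the old and the new return shapes of _get_streams()."""
    if result is None:
        return []
    # Old shape 1: dict[str, Stream]
    items = result.items() if isinstance(result, Mapping) else result
    streams = []
    for item in items:
        if isinstance(item, tuple):
            # Old shape 2: (name, stream) pairs from a list or generator.
            old_name, stream = item
            if stream.name is None:
                # Use the old stream name as a fallback only.
                stream.name = old_name
        else:
            # New shape: plain stream objects.
            stream = item
        streams.append(stream)
    return streams


old_style = {"720p": Stream("https://example.invalid/720.m3u8")}
new_style = [Stream("https://example.invalid/1080.m3u8", name="1080p60")]
print(normalize_streams(old_style), normalize_streams(new_style))
```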
New stream weighting
With the addition of proper stream metadata, the weighting can now be much more sophisticated, with lots of weighting criteria, e.g. video resolution (width and/or height), video frame-rate and audio sampling-rate, stream/substream bitrate/bandwidth, codecs, language preference, etc.
How this will be done still needs to be figured out exactly. The main problem here is that we're not focusing on one metadata attribute anymore, like the video resolution for example, so this could become a bit challenging for a generic weighting algorithm that should work in all cases. It probably will have to be configurable because of this.
The weighting and filtering will be moved out of the Plugin.streams() method and will become a separate implementation. Input and output data formats are still unknown/undecided. Plugin.streams() could maybe return its data in a wrapper class which implements methods for weighting and filtering.
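One way a configurable weighting could be approached is to compare attributes in a user-defined order of priority instead of summing them up. This is only a sketch under that assumption: `rank()`, `VideoInfo` and its attributes are hypothetical, and missing metadata is not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VideoInfo:
    # Stand-in for a stream with video metadata (assumed attributes).
    name: str
    height: int = 0
    fps: float = 0.0
    bandwidth: int = 0


def rank(streams, criteria=("height", "fps", "bandwidth")):
    """Sort best-first by comparing the criteria in order of priority."""
    return sorted(
        streams,
        key=lambda s: tuple(getattr(s, attr) for attr in criteria),
        reverse=True,
    )


streams = [
    VideoInfo("720p60", height=720, fps=60, bandwidth=4_000_000),
    VideoInfo("1080p30", height=1080, fps=30, bandwidth=5_000_000),
    VideoInfo("1080p60", height=1080, fps=60, bandwidth=8_000_000),
]

by_resolution = [s.name for s in rank(streams)]
by_frame_rate = [s.name for s in rank(streams, criteria=("fps", "height"))]
```

Because each attribute is compared on its own, preferring the frame rate over the resolution is just a different order of criteria.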
This further means that the {best,worst}{,-unfiltered} stream name aliases that were added to the streams dictionary after weighting will be gone (these will be part of the new filtering logic, see below).
CLI changes
Since the stream selection now doesn't operate on a set of stream names anymore, we can add a new user interface for selecting/filtering streams.
I'm just focusing on the CLI side here, but those filters will be passed to a new API, so the Plugin.streams() result can be filtered by Streamlink implementors.
Removal of old CLI arguments and session options
For the sake of having a new filtering interface, the following CLI arguments and respective session options will be removed:
--stream-types/--stream-priority
--stream-sorting-excludes
--hls-audio-select
New stream positional argument(s)
Previously, stream was a comma-separated nargs=? argument (an optional argument) which selected the stream/quality by name.
This will be turned into an nargs=* argument which defines an optional list of stream metadata filters.
If unset, then just like before, Streamlink will simply list the available streams. This listing will also see a complete overhaul. More about that later.
If set, then all available streams will be filtered according to the stream filters. Multiple filters act as a boolean-and operation. Commas in a single filter can still be interpreted as a boolean-or. More boolean logic and operators could be added later. I don't think this should be too complex though.
If no streams satisfy the filters, then Streamlink will print an error message. If one or more streams satisfy the filters, then Streamlink will pick the best one according to its new weighting algorithm.
The best filter is now an actual filter keyword and not a stream name alias. If it's set, then no filtering will be done, so the old behavior can be kept. The worst filter could be used to reverse the list of weighted streams.
A change of the stream argument also requires a change of the --default-stream argument. Previously, it was just an alias for stream, to be used in config files. Now, it's a repeatable argument, so multiple filters can be set in config files, similar to how stream accepts multiple arguments (parsed tokens on the CLI).
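In plain argparse terms, the change could look like the following sketch. This is not Streamlink's actual argument parser setup; the `prog` name and the fallback from `stream` to `--default-stream` are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(prog="streamlink-sketch")
parser.add_argument("url")
# Zero or more filter tokens; an empty list means "list the streams".
parser.add_argument("stream", nargs="*")
# Repeatable option, so config files can define multiple filters.
parser.add_argument("--default-stream", action="append")

args = parser.parse_args(["URL", "h=1080", "fps=60"])
# Assumed precedence: positional filters win over the defaults.
filters = args.stream or args.default_stream or []
```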
Stream filter examples
# old "best" alias (a no-op now)
streamlink "URL" best
# implicit name=... (backward compatibility)
streamlink "URL" 1080p60
# explicit name=... (name is just one of the metadata attributes)
streamlink "URL" name=1080p60
# filter by video resolution (using different shorthand aliases)
streamlink "URL" res=1920x1080
streamlink "URL" height=1080
streamlink "URL" h=1080
# at least one condition must be true
streamlink "URL" source,1080p60
# all filters must be true
streamlink "URL" h=1080 fps=60
# filter operators (certain metadata attributes should support different units)
streamlink "URL" "bandwidth>4096k" "bw<=8M"
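The filter syntax from the examples above could be evaluated roughly like this. It's a minimal sketch, not a proposal for the final API: streams are plain dicts here, the function names are hypothetical, only the `h` and `bw` aliases are handled, and the `k`/`M` suffixes are assumed to be decimal multipliers.

```python
from __future__ import annotations

import operator
import re

ALIASES = {"h": "height", "bw": "bandwidth"}
UNITS = {"k": 1_000, "M": 1_000_000}  # assumption: decimal multipliers
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}
_CONDITION_RE = re.compile(r"^(\w+)(<=|>=|<|>|=)(.+)$")
_NUMBER_RE = re.compile(r"(\d+(?:\.\d+)?)([kM])?")


def parse_value(text: str):
    match = _NUMBER_RE.fullmatch(text)
    if not match:
        return text  # not numeric, compare as a string
    return float(match.group(1)) * UNITS.get(match.group(2), 1)


def parse_condition(text: str):
    match = _CONDITION_RE.match(text)
    if not match:
        # Bare words are treated as an implicit name=...
        return "name", operator.eq, text
    key, op, value = match.groups()
    return ALIASES.get(key, key), OPERATORS[op], parse_value(value)


def check(stream: dict, key: str, op, value) -> bool:
    if key not in stream:
        return False
    try:
        return bool(op(stream[key], value))
    except TypeError:
        # E.g. ordering a number against a string.
        return False


def matches(stream: dict, filters: list[str]) -> bool:
    for token in filters:  # all filters must be true
        if token == "best":
            continue  # keyword, doesn't filter anything
        conditions = [parse_condition(part) for part in token.split(",")]
        # Within one filter, at least one condition must be true.
        if not any(check(stream, *cond) for cond in conditions):
            return False
    return True


streams = [
    {"name": "1080p60", "height": 1080, "fps": 60, "bandwidth": 6_000_000},
    {"name": "720p30", "height": 720, "fps": 30, "bandwidth": 3_000_000},
]
selected = [s["name"] for s in streams if matches(s, ["h=1080", "fps=60"])]
```

Picking the best stream among the remaining ones would then be the job of the weighting step.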
Stream list examples
Needs to be decided.
Requirements:
at least one row for each entry
numeric stream indices (in addition to a shorthand "name" attribute) - could even use sub-indices
stream metadata (verbosity could be configured using a CLI arg)
substreams for muxing (shouldn't include redundant information)
duplicates should be grouped together and show the varying attributes (CDNs, etc)
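For the last requirement, duplicates could be grouped by comparing all metadata except the attributes that are allowed to vary. The sketch below assumes dict-based metadata with hashable values and a hypothetical `cdn` attribute. The printed row format is just a placeholder, since the list layout is still undecided.

```python
from collections import defaultdict


def group_duplicates(streams, varying=("cdn",)):
    """Group streams whose metadata only differs in the `varying` keys."""
    groups = defaultdict(list)
    for stream in streams:
        # Everything except the varying attributes identifies a group.
        key = tuple(sorted(
            (k, v) for k, v in stream.items() if k not in varying
        ))
        groups[key].append(stream)
    return list(groups.values())


streams = [
    {"name": "1080p60", "height": 1080, "fps": 60, "cdn": "CDN A"},
    {"name": "1080p60", "height": 1080, "fps": 60, "cdn": "CDN B"},
    {"name": "720p30", "height": 720, "fps": 30, "cdn": "CDN A"},
]

groups = group_duplicates(streams)
for index, group in enumerate(groups):
    cdns = ", ".join(stream["cdn"] for stream in group)
    print(f"[{index}] {group[0]['name']} (cdn: {cdns})")
```

Unlike the _altN scheme, no duplicate is discarded and the varying attribute stays visible.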