Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix npo support #31976
base: master
Are you sure you want to change the base?
Fix npo support #31976
Changes from 12 commits
3b31478
b4776f2
fb2b4e2
9e1acb2
6328978
0c7261d
c409a8c
f76d58c
da3d1f4
5773681
29724e7
21eb451
0dc7d95
fb7b717
f9e59b0
8b1a7d9
34b5b20
4fc4238
28ba01f
eb6e396
d36d50f
d426a92
3b3d73c
4b24e5f
681b390
159f825
0cbcd1a
0ab79c3
c08f29f
28624cf
1ca4e68
4398f68
58d7a00
d4250c8
ad64f37
bc86c5f
4c90b2f
007bbea
a60972e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naïvely, I wonder any streams returned with this request are not WV-encrypted, and what happens if other or no values are passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this less ridiculous, it could be assigned to a variable at the start of the routine using this sort of formatting:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be re-worked like the previous
return
, with themerge_dicts(traverse_obj(metadata, dict_construction), dict_of_known_vars)
pattern.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using the returned urlhandle to track redirection or errors:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._sort_formats(...)
should be called and will raise if there aren't any. If there is a way to identify that a stream has DRM, and given that unlike yt-dlp we're going to skip DRM formats, one could, eg, return a tuple(formats, number_of formats_seen)
and then compare the sum of the total formats againstlen(formats)
. Ifnot formats
and the values differ,self.report_drm()
can be called.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only way is to load a random page, JSON parse the
__NEXT_DATA__
part and get thebuildId
prop from there. But then we might as well have that 'random page' be the actual video page and skip the/_next/data/
download part as that object already contains thepoms_mid
.It is not great, but I think the only stable option is parsing
__NEXT_DATA__
part for thepoms_mid
like we initially did for the NPO Start webui. I have worked with nextjs for quite a while and they have never changed the__NEXT_DATA__
part as far as I know so should be relatively safe.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the
_search_nextjs_data()
method if needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll look into that and use it if I can make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, all the information that you might have got with the JSON API is in the Next.js hydration JSON in the page, including the build ID that is no longer of interest. I have this working but will update once other issues have been cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use
utils.join_nonempty()
for this effect:'title'
,'subtitle'
->'title - subtitle'
'title'
,''
->'title'
''
,'subtitle'
->'subtitle'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a tight enough pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May this embedding occur in other pages (not vpro.nl)?
Are the second and up videos related (clips, trailers, etc), or is the case more like a series page with various episodes?
In the first case maybe skip the subsidiary videos; in the second normally return a
playlist
result whose entries are either theurl_result()
s of episode URLs constructed for each video, orinfo_dict
s extracted from the page.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the test video page, there is apparently a content video and a teaser video. The former can be detected because it's inside (preceded by)
<div class=grid>
.As far as I can see other pages that might list multiple videos are playlist pages like https://www.vpro.nl/programmas/tegenlicht/kijk/afleveringen.html or https://www.vpro.nl/programmas/tegenlicht/kijk/afleveringen.html or https://www.vpro.nl/programmas/tegenlicht/categorieen/wereld.html that don't include
data-media-id
s but just have links to programme episodes. Counterexamples welcome.