[youtube] fix: playlist #150

Draft: wants to merge 10 commits into base: master

Conversation

insaneracist
Contributor

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Fixes #148
Quick hack that needs testing

@someziggyman

OK, so the good news is that it seems to work with all playlist types.
Regular playlist ID:
PLszW2az_oxFd7dFeCb1FFhk7c_eEer5n1
Mix playlist ID:
RDKR9wGi7gVLQ
Search playlist:
https://www.youtube.com/results?search_query=linkin+park+numb
And channel playlist:
https://www.youtube.com/user/TheLinuxFoundation/playlists

There's also a new fix offered here:
#151
Will test it now. Hard to tell which one is better if that one works too.

@GitHildur GitHildur left a comment

This works for me 👍

@pukkandan
Contributor

pukkandan commented Nov 10, 2020

I can confirm that this works at least for normal playlists and channels

Edit: Never mind, I see that it has already been reviewed :)

@insaneracist
Contributor Author

insaneracist commented Nov 10, 2020

strange, this commit isn't showing up here. insaneracist@b2a462a

edit: that superfluous commit woke it up.

@blackjack4494
Owner

[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 87
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 88
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 89
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 90
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 91
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 92
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 93
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 94
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 95
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 96
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 97
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 98
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 99
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 100
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 101
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 102
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 103

it's never ending 🤣

@blackjack4494 blackjack4494 marked this pull request as draft November 10, 2020 22:28
@blackjack4494
Owner

converting this to draft for now.
As it turns out #151 works better. I experienced some issues here.
That does not mean this PR is obsolete.

@SoneeJohn

it's never ending 🤣

That's because those are mix playlists; their IDs start with a prefix of RD, UL or PU.

@someziggyman

it's never ending 🤣

it does end actually. Ran 3 tests and results are: 384, 402, 415

@SoneeJohn SoneeJohn left a comment

You might not need this. Playlists with certain prefixes (known as mix playlists) can sometimes contain a lot of pages. My suggestion would be to check whether it's a mix, fetch just the first page, and add an argument that sets the maximum number of page fetches for a mix playlist.

See

def _extract_mix(self, playlist_id):
    # The mixes are generated from a single video
    # the id of the playlist is just 'RD' + video_id
    ids = []
    yt_initial = None
    last_id = playlist_id[-11:]
    for n in itertools.count(1):
        url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
        webpage = self._download_webpage(
            url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
        new_ids = orderedSet(re.findall(
            r'''(?xs)data-video-username=".*?".*?
                href="/watch\?v=([0-9A-Za-z_-]{11})&[^"]*?list=%s''' % re.escape(playlist_id),
            webpage))
        # if no ids in html of page, try using embedded json
        if (len(new_ids) == 0):
            yt_initial = self._get_yt_initial_data(playlist_id, webpage)
            if yt_initial:
                new_ids = self._extract_mix_ids_from_yt_initial(yt_initial)
        # Fetch new pages until all the videos are repeated, it seems that
        # there are always 51 unique videos.
        new_ids = [_id for _id in new_ids if _id not in ids]
        if not new_ids:
            break
        ids.extend(new_ids)
        last_id = ids[-1]
    url_results = self._ids_to_results(ids)

if playlist_id.startswith(('RD', 'UL', 'PU')):
    if not playlist_id.startswith(self._YTM_PLAYLIST_PREFIX):
        # Mixes require a custom extraction process,
        # Youtube Music playlists act like normal playlists (with randomized order)
        return self._extract_mix(playlist_id)
has_videos, playlist = self._extract_playlist(playlist_id)
if has_videos or not video_id:
    return playlist
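
The suggestion above (fetch only a bounded number of pages for a mix) could look roughly like the sketch below. It is only an illustration, not code from this PR or from youtube-dl: `collect_mix_ids`, the `max_pages` argument and the `fetch_page` callable are hypothetical names, and `fetch_page` stands in for whatever actually downloads and parses one page.

```python
import itertools


def collect_mix_ids(fetch_page, first_id, max_pages=1):
    """Collect unique video IDs from a mix, fetching at most max_pages pages.

    fetch_page(last_id, page_number) is a hypothetical callable returning the
    video IDs found on one page. max_pages=None means "no cap": keep going
    until a page yields no new IDs, as _extract_mix does.
    """
    ids = []
    last_id = first_id
    for n in itertools.count(1):
        if max_pages is not None and n > max_pages:
            break
        # Keep only IDs not seen before, preserving page order.
        new_ids = []
        for video_id in fetch_page(last_id, n):
            if video_id not in ids and video_id not in new_ids:
                new_ids.append(video_id)
        if not new_ids:
            break
        ids.extend(new_ids)
        # The next page is requested starting from the last video collected.
        last_id = ids[-1]
    return ids
```

With the default `max_pages=1` only the first page is fetched; a caller that wants the current behaviour would pass `max_pages=None`.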

@blackjack4494
Owner

Just don't give up on this yet.
If implemented as in #151, you get proper downloading:

[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading webpage
[download] Downloading playlist: RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading continuation page #1
[youtube:playlist] playlist RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading 186 videos
[download] Downloading video 1 of 186

As I am super tired, will merge #151 now so that there is at least a working version out there. Will take a look tomorrow again.

@insaneracist
Contributor Author

insaneracist commented Nov 11, 2020

@blackjack4494, thanks, i was about to give up. the problem was that the request wasn't sending enough client information, so it kept returning the initial piece of the playlist (but only for some types).

@insaneracist
Contributor Author

insaneracist commented Nov 11, 2020

@SoneeJohn, playlists starting with RDCLAK5uy_ are special-cased because they come from YouTube Music and have a playlist url.
e.g. https://www.youtube.com/playlist?list=RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA
the otherwise dynamically generated RD mixes can't be accessed that way, and are fetched via video urls (they should be hitting a different function, YoutubePlaylistIE._extract_mix)
e.g. fails: https://www.youtube.com/playlist?list=RDG8sGmSEehi4
works: https://www.youtube.com/watch?v=G8sGmSEehi4&list=RDG8sGmSEehi4
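
As a rough illustration of that distinction, the sketch below picks the URL form for a given playlist ID. The function name `playlist_page_url` and the two module-level constants are made up for this example; the prefixes and the "last 11 characters are the video ID" rule are taken from the snippets quoted earlier in this thread, and applying that rule to UL and PU IDs is an assumption carried over from `_extract_mix`.

```python
# Assumed value of _YTM_PLAYLIST_PREFIX, based on the IDs discussed above.
YTM_PLAYLIST_PREFIX = 'RDCLAK5uy_'
MIX_PREFIXES = ('RD', 'UL', 'PU')


def playlist_page_url(playlist_id):
    """Return the URL form that should work for this playlist ID."""
    is_mix = (playlist_id.startswith(MIX_PREFIXES)
              and not playlist_id.startswith(YTM_PLAYLIST_PREFIX))
    if is_mix:
        # Dynamically generated mixes are only reachable through a watch URL;
        # the seed video ID is the last 11 characters of the playlist ID.
        video_id = playlist_id[-11:]
        return 'https://www.youtube.com/watch?v=%s&list=%s' % (video_id, playlist_id)
    # Regular and YouTube Music playlists have a normal playlist page.
    return 'https://www.youtube.com/playlist?list=%s' % playlist_id
```

For the two IDs above, `RDG8sGmSEehi4` maps to the watch URL and `RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA` maps to the playlist URL.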

@blackjack4494
Owner

so what's the state on this one @insaneracist ?
Haven't had time yet to look into it but it seems this should handle the missing title and other metadata?

Development

Successfully merging this pull request may close these issues.

youtube channel page downloads broken
6 participants