CBC News Videos Not Downloading Normally - Missing XML (404 Error) #32758

Open
5 tasks done
danepedersen opened this issue Mar 29, 2024 · 4 comments
Labels
broken-IE (problem with existing site extraction), yt-dlp (working or fix available in yt-dlp)

Comments


danepedersen commented Mar 29, 2024

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

Error: [debug] Command-line config: ['https://www.cbc.ca/player/play/1.7159486', '-o', 'D:\\Downloaded Audio-Video Tracks\\ViaYouTubeDL\\cbc.ca\\Custom\\%(title)s-%(id)s.%(ext)s', '-o', 'D:/TheBreakdowne.mp4', '--verbose']
Error: [debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error cp1252 (No VT), screen cp1252 (No VT)
Error: [debug] yt-dlp version stable@2024.03.10 from yt-dlp/yt-dlp [615a84447] (win_exe)
Error: [debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.22621-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
Error: [debug] exe versions: ffmpeg 6.0-essentials_build-www.gyan.dev (setts), ffprobe 6.0-essentials_build-www.gyan.dev
Error: [debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, sqlite3-3.35.5, urllib3-2.2.1, websockets-12.0
Error: [debug] Proxy map: {}
Error: [debug] Request Handlers: urllib, requests, websockets
Error: [debug] Loaded 1803 extractors
[cbc.ca:player] Extracting URL: https://www.cbc.ca/player/play/1.7159486
[ThePlatform] Extracting URL: http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/1?mbr=true&formats=MPEG4,FLV,MP3#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] 1: Downloading SMIL data
Error: ERROR: [ThePlatform] 1: Unable to download XML: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)
Error:   File "yt_dlp\extractor\common.py", line 732, in extract
Error:   File "yt_dlp\extractor\theplatform.py", line 315, in _real_extract
Error:   File "yt_dlp\extractor\theplatform.py", line 36, in _extract_theplatform_smil
Error:   File "yt_dlp\extractor\common.py", line 1086, in download_content
Error:   File "yt_dlp\extractor\common.py", line 1050, in download_handle
Error:   File "yt_dlp\extractor\adobepass.py", line 1366, in _download_webpage_handle
Error:   File "yt_dlp\extractor\common.py", line 920, in _download_webpage_handle
Error:   File "yt_dlp\extractor\common.py", line 877, in _request_webpage
Error:   File "yt_dlp\extractor\common.py", line 864, in _request_webpage
Error:   File "yt_dlp\YoutubeDL.py", line 4101, in urlopen
Error:   File "yt_dlp\networking\common.py", line 115, in send
Error:   File "yt_dlp\networking\_helper.py", line 204, in wrapper
Error:   File "yt_dlp\networking\common.py", line 326, in send
Error:   File "yt_dlp\networking\_requests.py", line 351, in _send
Error: yt_dlp.networking.exceptions.HTTPError: HTTP Error 404: Not Found
An error occured

Description

When I try to download a video from the CBC News site (NOT CBC Gem), it now fails with an HTTP 404 error saying the XML cannot be downloaded. It was working until recently; I have reverted to using the m3u8 method unless and until this issue is resolved.

dirkf (Contributor) commented Mar 30, 2024

This looks like, and is, the problem reported in yt-dlp/yt-dlp#9534.

#30839 "if you were actually using yt-dlp ..."

However, yt-dl also has the problem, and we should be able to use the same solution. Here is the result after revising an unmerged WIP PR from several years ago to include some of the changes from the yt-dlp PR:

python -m youtube_dl -vF 'https://www.cbc.ca/player/play/1.7159486'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-vF', u'https://www.cbc.ca/player/play/1.7159486']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 8cd66b76f
[debug] Python 2.7.15 (CPython i686 32bit) - Linux-6.1.0-18-686-pae-i686-with-debian-12.5 - OpenSSL 1.1.1a  20 Nov 2018 - glibc 2.1.3
[debug] exe versions: ffmpeg 5.1.4-0, ffprobe 5.1.4-0
[debug] Proxy map: {}
[cbc.ca:player] 1.7159486: Downloading webpage
[ThePlatform] 2324188227929: Downloading SMIL data
[ThePlatform] 2324188227929: Downloading m3u8 information
[ThePlatform] 2324188227929: Downloading JSON metadata
[info] Available formats for 2324188227929:
format code                extension  resolution note
hls-program_audio-English  mp4        audio only [eng] 
hls-422                    mp4        320x180     422k , avc1.4d400d, 30.0fps, video only
hls-580                    mp4        320x180     580k , avc1.640029, 30.0fps, video only
hls-910                    mp4        640x360     910k , avc1.640029, 30.0fps, video only
hls-1350                   mp4        864x486    1350k , avc1.640029, 30.0fps, video only
hls-2120                   mp4        864x486    2120k , avc1.640029, 30.0fps, video only
hls-5640                   mp4        1920x1080  5640k , avc1.4d4028, 30.0fps, video only (best)
$ 

Rather than passing the ID from the URL (here 1.7159486) directly to the video host, we have to extract a media ID (here 2324188227929) from the hydration JSON in the video page.
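A minimal Python sketch of that extraction step, under stated assumptions: it supposes the page assigns its hydration state to `window.__INITIAL_STATE__` and that the media ID sits at `video.currentClip.mediaId`. Both the variable name and the JSON path are illustrative guesses, not confirmed from the CBC page or from the yt-dl/yt-dlp patches, and `extract_media_id` is a hypothetical helper, not part of youtube-dl's API.

```python
import json
import re

# ASSUMPTION: the page assigns its hydration state to window.__INITIAL_STATE__;
# the real variable name on the CBC page may differ.
_STATE_RE = re.compile(r'window\.__INITIAL_STATE__\s*=\s*')


def extract_media_id(webpage):
    """Return the media ID found in the page's hydration JSON, or None."""
    m = _STATE_RE.search(webpage)
    if not m:
        return None
    # raw_decode parses a single JSON value starting at m.end() and ignores
    # whatever follows it (e.g. ';</script>').
    state, _ = json.JSONDecoder().raw_decode(webpage, m.end())
    # ASSUMPTION: hypothetical path to the media ID inside the state object.
    clip = state.get('video', {}).get('currentClip', {})
    media_id = clip.get('mediaId')
    return str(media_id) if media_id is not None else None
```

With a page embedding `{"video": {"currentClip": {"mediaId": "2324188227929"}}}`, the helper returns `'2324188227929'`, which would then replace the URL ID `1.7159486` in the request to the video host.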

dirkf added the broken-IE (problem with existing site extraction) and yt-dlp (working or fix available in yt-dlp) labels on Mar 30, 2024
trainman261 commented

Yes, this is the exact same problem. I'll be updating the PR at yt-dlp in a bit; the same changes should be usable here as well.

dirkf (Contributor) commented Mar 30, 2024

I noticed that the video host URLs used by the site contain the query parameter formats=M3U,MPEG4,MP3 (used in the above test patch, too) rather than the olde worlde formats=MPEG4,FLV,MP3 in the extractor(s).

Maybe this is ignored, or maybe it'll start to cause a problem later if ThePlatform withdraws some level of FLV support.
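To compare the two `formats` values, a sketch like the following builds the host URL with an ordered format list. The base path is copied from the ThePlatform URL in the reporter's verbose log, and `smil_url` is a hypothetical helper, not the extractor's actual code; whether the host honours or ignores the list is exactly the open question raised here.

```python
from urllib.parse import urlencode

# Base path copied from the ThePlatform URL in the reporter's verbose log;
# treat it as an example, not a guaranteed stable endpoint.
_BASE = 'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/'


def smil_url(media_id, formats=('MPEG4', 'FLV', 'MP3')):
    """Build a host URL whose 'formats' query lists formats in preference order."""
    # safe=',' keeps the commas literal instead of encoding them as %2C.
    query = urlencode({'mbr': 'true', 'formats': ','.join(formats)}, safe=',')
    return _BASE + media_id + '?' + query
```

For example, `smil_url('2324188227929', ('M3U', 'MPEG4', 'MP3'))` produces a URL ending in `?mbr=true&formats=M3U,MPEG4,MP3`, matching the parameter the site itself uses.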

trainman261 commented

IIRC, formats=M3U,MPEG4,MP3 preferentially downloads the M3U manifest if one is available (it contains the information for streaming the individual chunks), whereas formats=MPEG4,FLV,MP3 preferentially downloads the MP4 file as a whole. For streaming, fetching the media in chunks makes more sense; it also has the benefit that most of the M3U files list multiple streams at different bandwidths, allowing the media player to choose the appropriate stream based on the current resolution and available bandwidth. For downloading, however, it makes more sense to fetch the MP4 as a whole, since that generally performs better: it avoids all the overhead of downloading often 100+ chunks (with the associated I/O) and reassembling them afterwards.
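The preference behaviour recalled in this comment can be modelled as "serve the first listed format that the host actually has". This is a toy model of the comment, not ThePlatform's documented algorithm, and `served_format` is a hypothetical name.

```python
def served_format(preference, available):
    """Toy model: return the first format in `preference` present in `available`.

    Models the behaviour as recalled in the comment; it is not a documented
    ThePlatform algorithm.
    """
    for fmt in preference:
        if fmt in available:
            return fmt
    return None


# A clip that has both an HLS manifest and a progressive MP4.
available = {'M3U', 'MPEG4'}
print(served_format(['M3U', 'MPEG4', 'MP3'], available))  # M3U (streaming-friendly)
print(served_format(['MPEG4', 'FLV', 'MP3'], available))  # MPEG4 (whole-file download)
```

Under this model, the same clip yields the manifest or the single MP4 depending only on the order of the list, which is the trade-off between streaming and downloading described above.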


3 participants