BFM business video extraction fails #32608

theedge456 · 2023-10-18T16:37:57Z

Checklist

[x] I'm reporting a broken site support issue
[x] I've verified that I'm running youtube-dl version 2023.10.13
[x] I've checked that all provided URLs are alive and playable in a browser
[x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
[x] I've searched the bugtracker for similar bug reports including closed ones
[x] I've read bugs section in FAQ

Verbose log

[debug] Command-line config: ['https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.57-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.3-1 (setts), ffprobe 5.1.3-1
[debug] Optional libraries: Cryptodome-3.11.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, pyxattr-0.8.1, requests-2.28.1, sqlite3-3.40.1, urllib3-1.26.12
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1890 extractors
[bfmtv] Extracting URL: https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html
[bfmtv] 202310130321: Downloading webpage
ERROR: [bfmtv] 202310130321: Unable to extract video block; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/bfmtv.py", line 43, in _real_extract
    video_block = extract_attributes(self._search_regex(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 1263, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
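The id in the log (`202310130321`) is taken from the URL, not from the page. The sketch below applies the `_VALID_URL_TMPL` pattern from the patch later in this thread to the reported URL. It assumes that the video extractor fills the template's `%s` placeholder with `'V'`, so that slugs end in `_VN-<12 digits>.html`. `video_id_from_url` is a hypothetical helper name.

```python
import re

# Patterns copied from BFMTVBaseIE in the patch quoted in this thread.
_VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
_VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'

# Assumption: the video extractor instantiates the template with 'V'.
VALID_URL = _VALID_URL_TMPL % 'V'


def video_id_from_url(url):
    """Return the 12-digit id the extractor logs, or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


url = ('https://www.bfmtv.com/economie/replay-emissions/les-experts/'
       'les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html')
print(video_id_from_url(url))  # prints 202310130321
```

URL matching therefore succeeds. The failure happens afterwards, when the webpage is searched for the video block.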

Description

Extraction has been failing since October 17th, 2023.


dirkf · 2023-10-19T10:48:40Z

The pattern used by the extractor to find the Brightcove video id is too restrictive:

--- old/youtube-dl/youtube_dl/extractor/bfmtv.py
+++ new/youtube-dl/youtube_dl/extractor/bfmtv.py
@@ -10,7 +10,7 @@
 class BFMTVBaseIE(InfoExtractor):
     _VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
     _VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
-    _VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
+    _VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
 
     def _brightcove_url_result(self, video_id, video_block):
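The sketch below compares the old and new `_VIDEO_BLOCK_REGEX` patterns from the diff against hypothetical markup. It then fills `BRIGHTCOVE_URL_TEMPLATE` from the matched tag. The markup, the attribute names (`accountid`, `playerid`, `videoid`) and the helpers `tag_attributes` and `brightcove_url` are assumptions for illustration; `tag_attributes` stands in for youtube-dl's `extract_attributes` and uses the standard library's `HTMLParser`. As quoted, the group `(?:[\S]\s+)*` only skips one-character class tokens. A longer class name placed before `video_block` is therefore still rejected. `ALT_VIDEO_BLOCK_REGEX` is a hypothetical variant, not taken from the thread, that tolerates this case.

```python
import re
from html.parser import HTMLParser

# Patterns copied from the diff above (old = before the patch, new = after).
OLD_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
NEW_VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'
# Hypothetical variant (not from the thread): also skips multi-character class names.
ALT_VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[^\s"\']+\s+)*video_block\b[^>]+>)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'


class _StartTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)  # attribute names are lower-cased by HTMLParser


def tag_attributes(tag_html):
    """Stand-in for youtube-dl's extract_attributes(): tag string -> attribute dict."""
    parser = _StartTagAttrs()
    parser.feed(tag_html)
    parser.close()
    return parser.attrs or {}


def brightcove_url(webpage, pattern):
    m = re.search(pattern, webpage)
    if m is None:
        return None  # in the extractor, _search_regex raises "Unable to extract video block"
    attrs = tag_attributes(m.group(1))
    # Hypothetical attribute names; the real BFM markup may use different ones.
    return BRIGHTCOVE_URL_TEMPLATE % (
        attrs.get('accountid'), attrs.get('playerid'), attrs.get('videoid'))


# Hypothetical markup with an extra class after "video_block".
TRAILING = ('<div class="video_block video_block--replay" accountid="1234" '
            'playerid="abcd" videoid="5678"></div>')
# Hypothetical markup with a longer class before "video_block".
LEADING = ('<div class="player video_block" accountid="1234" '
           'playerid="abcd" videoid="5678"></div>')

print(brightcove_url(TRAILING, OLD_VIDEO_BLOCK_REGEX))  # None
print(brightcove_url(TRAILING, NEW_VIDEO_BLOCK_REGEX))
print(brightcove_url(LEADING, NEW_VIDEO_BLOCK_REGEX))   # None
print(brightcove_url(LEADING, ALT_VIDEO_BLOCK_REGEX))
```

The old pattern requires the attribute to be exactly `class="video_block"`. Any additional class therefore makes the search fail, which matches the "Unable to extract video block" error in the log.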

There are also some improvements from the yt-dlp version that should be added.

theedge456 · 2023-10-19T10:59:54Z

@dirkf
It's fixed now.
Thanks for the patch.

dirkf · 2023-10-19T11:29:35Z

Thanks. I'll keep it open until the patch is committed.

mycodedoesnotcompile2 · 2023-11-01T12:51:29Z

When will this patch be merged?

dirkf added the broken-IE (problem with existing site extraction) and patch-available labels Oct 19, 2023

theedge456 closed this as completed Oct 19, 2023

dirkf reopened this Oct 19, 2023
