BFM business video extraction fails #32608

Open
theedge456 opened this issue Oct 18, 2023 · 4 comments
Labels
broken-IE problem with existing site extraction patch-available

Comments


theedge456 commented Oct 18, 2023

Checklist

  • [x] I'm reporting a broken site support issue
  • [x] I've verified that I'm running youtube-dl version 2023.10.13
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar bug reports including closed ones
  • [x] I've read bugs section in FAQ

Verbose log

[debug] Command-line config: ['https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.57-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.3-1 (setts), ffprobe 5.1.3-1
[debug] Optional libraries: Cryptodome-3.11.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, pyxattr-0.8.1, requests-2.28.1, sqlite3-3.40.1, urllib3-1.26.12
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1890 extractors
[bfmtv] Extracting URL: https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html
[bfmtv] 202310130321: Downloading webpage
ERROR: [bfmtv] 202310130321: Unable to extract video block; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/bfmtv.py", line 43, in _real_extract
    video_block = extract_attributes(self._search_regex(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 1263, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

Description

Extraction has been failing since October 17th, 2023.

Contributor

dirkf commented Oct 19, 2023

The pattern used by the extractor to find the Brightcove video id is too restrictive:

--- old/youtube-dl/youtube_dl/extractor/bfmtv.py
+++ new/youtube-dl/youtube_dl/extractor/bfmtv.py
@@ -10,7 +10,7 @@
 class BFMTVBaseIE(InfoExtractor):
     _VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
     _VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
-    _VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
+    _VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
 
     def _brightcove_url_result(self, video_id, video_block):

Also there are some improvements from the yt-dlp version to add.
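
The difference between the two patterns in the diff above can be checked in isolation with Python's `re` module. This is only a sketch: the HTML snippets are hypothetical and not copied from bfmtv.com, and `RELAXED_VARIANT` is an illustrative assumption that is not part of the patch.

```python
import re

# Patterns copied verbatim from the diff above.
OLD_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
NEW_VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'

# Hypothetical variant (an assumption, not from the patch): \S+ instead of
# [\S] lets class names of any length precede video_block.
RELAXED_VARIANT = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:\S+\s+)*video_block\b[^>]+>)'

# Hypothetical markup used only to exercise the patterns.
SAMPLES = {
    'exact': '<div class="video_block" videoid="1" accountid="2">',
    'class_after': '<div class="video_block id-123" videoid="1" accountid="2">',
    'single_quotes': "<div class='video_block' videoid='1'>",
    'class_before': '<div class="foo video_block" videoid="1">',
}


def matches(pattern, html):
    """Return True if the pattern is found anywhere in html."""
    return re.search(pattern, html) is not None


if __name__ == '__main__':
    for name, html in SAMPLES.items():
        print(name,
              'old:', matches(OLD_VIDEO_BLOCK_REGEX, html),
              'new:', matches(NEW_VIDEO_BLOCK_REGEX, html),
              'relaxed:', matches(RELAXED_VARIANT, html))
```

The old pattern requires the attribute to be exactly `class="video_block"`, so an extra class name or single quotes defeat it. The new pattern tolerates both. As posted, `(?:[\S]\s+)*` only skips single-character tokens, so a longer class name placed before `video_block` would still not match; whether that case occurs on the site is not stated in this thread.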

@dirkf dirkf added broken-IE problem with existing site extraction patch-available labels Oct 19, 2023
@theedge456
Author

@dirkf
It's fixed now.
Thanks for the patch

Contributor

dirkf commented Oct 19, 2023

Thanks. I'll keep it open until the patch is committed.

@dirkf dirkf reopened this Oct 19, 2023
@mycodedoesnotcompile2

When will this patch be merged?
